WO2022160206A1 - System-on-chip exception handling method, system-on-chip, and device thereof - Google Patents
- Publication number: WO2022160206A1
- Application: PCT/CN2021/074235
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
Definitions
- the present application relates to the field of chip technology, and in particular, to a system-on-chip exception handling method, a system-on-chip and a device thereof.
- In a system on chip (SoC), a network on chip (NoC) interconnects master devices and slave devices.
- If a request from a master device is not processed normally, the request remains in a waiting state, causing the system to hang up.
- As a result, neither the request of that master device (master) nor the requests of other master devices can be processed properly, affecting the overall stability and reliability of the NoC.
- Embodiments of the present application provide a system-on-chip exception processing method, a system-on-chip, and a device thereof, which are used to solve the system-on-chip exception caused by data processing request timeout, so as to improve the stability and reliability of the system-on-chip.
- a system-on-chip exception handling method including:
- When the processing operation of the first data processing request times out, the request is sent to the virtual slave device for processing, and the virtual slave device returns a processing response, thereby ending the timed-out first data processing request in time.
- This prevents the timed-out first data processing request from occupying system resources for a long time and causing the system to hang up, thereby ensuring the stability and reliability of the system-on-chip.
- the method further includes: if a processing response from the target slave device is received before the processing response to the first data processing request returned by the virtual slave device is received, discarding the processing response from the target slave device.
- After the first data processing request is sent to the virtual slave device for processing, if a processing response returned by the target slave device is received before the processing response returned by the virtual slave device, the processing response returned by the target slave device is discarded, so as to avoid conflicts with the processing of the virtual slave device.
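The mechanism of this aspect can be sketched as a minimal software model. The class and field names below are illustrative assumptions, not terms from the patent, and the actual design is hardware:

```python
class VirtualSlave:
    """Answers on behalf of a target slave whose processing has timed out."""
    def handle(self, request):
        # Return a processing-failure response so the master is unblocked.
        return {"request_id": request["id"], "status": "fail"}


class IngressUnit:
    def __init__(self):
        self.virtual_slave = VirtualSlave()
        self.timed_out = set()  # ids of requests routed to the virtual slave

    def on_timeout(self, request):
        """Timeout condition satisfied: hand the request to the virtual slave."""
        self.timed_out.add(request["id"])
        return self.virtual_slave.handle(request)

    def on_target_response(self, response):
        """A late response from the real target slave is discarded to avoid
        conflicting with the virtual slave's response."""
        if response["request_id"] in self.timed_out:
            return None   # discard
        return response   # deliver normally
```

For example, after `on_timeout({"id": 1})`, a later `on_target_response({"request_id": 1, ...})` yields `None`: the tardy response of the real slave is dropped, while responses for requests that never timed out pass through unchanged.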
- the method further includes:
- After receiving the first data processing request from the master device, first cache information is generated, where the first cache information includes information of the first data processing request;
- the method further includes:
- the method further includes:
- the first cache information is retained, and the cache resources occupied by the second cache information are released; or the second cache information is retained, and the cache resources occupied by the first cache information are released.
- The cache resources occupied by the cache information corresponding to the first data processing request or the second data processing request are released, so that among the data processing requests from the same thread of the same master device, if the processing operation of one of the data processing requests satisfies the timeout condition, only one of the data processing requests is reserved for processing, thereby reducing the cache resources occupied by data processing requests from the same source.
- it also includes: sending the second data processing request to the virtual slave device.
- If a processing response from the target slave device of the second data processing request is received before the processing response of the second data processing request returned by the virtual slave device, the processing response from the target slave device of the second data processing request is discarded.
- the method further includes:
- third cache information is generated, and the third cache information includes the information of the third data processing request.
- the cache resources occupied by the third cache information are the same as the cache resources occupied by the first cache information.
- the method further includes:
- third cache information is generated, and the third cache information includes the information of the third data processing request.
- the cache resources occupied by the third cache information are the same as the cache resources occupied by the second cache information.
- The released cache resource caches the information of the third data processing request, so that when a data processing request from a source has timed out and its processing has not been completed (that is, the cache resource occupied by the corresponding cache information has not been released), a new data processing request from that source can still be handled without allocating new cache resources, so as to reduce the overhead of the cache resources of the on-chip system, thereby ensuring the reliability and stability of the on-chip system.
- the method further includes:
- After receiving a data processing request, incrementing or decrementing the count value of the common counter, wherein each time a data processing request is received, the count value of the common counter is increased or decreased, and when the count value of the common counter overflows, the common counter is reset;
- The method further includes: after all received data processing requests are processed, resetting the common counter. When all the received data processing requests have been processed and no cache information remains, the common counter is reset, so that data processing requests can be counted correctly after requests are received again.
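The counter behavior described above can be modeled in a few lines. The counter width and the keep-a-count-of-outstanding-requests bookkeeping are assumptions for illustration:

```python
class CommonCounter:
    """Counter shared by all requests; incremented per received request,
    wrapping on overflow and resetting once all requests are processed."""
    def __init__(self, width=8):
        self.limit = 1 << width   # counter overflows at this value
        self.value = 0            # current count
        self.outstanding = 0      # requests received but not yet processed

    def on_request(self):
        self.outstanding += 1
        self.value += 1
        if self.value >= self.limit:  # overflow -> reset the counter
            self.value = 0

    def on_processed(self):
        self.outstanding -= 1
        if self.outstanding == 0:     # all requests processed -> reset
            self.value = 0
```

With `width=2` the counter wraps back to 0 after the fourth request; with any width, it returns to 0 as soon as the last outstanding request completes, so counting starts cleanly when requests arrive again.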
- the method further includes: when monitoring that the processing operation of the first data processing request satisfies the timeout condition, performing at least one of the following processing operations: reporting a timeout interrupt event, where the timeout interrupt event carries the information of the first data processing request; save the information of the first data processing request in the system abnormal event log.
- the reported timeout interrupt event may include relevant information (such as context) of the data processing request that has timed out, so that the abnormal data processing request can be indicated and processed in a targeted manner.
- a system on a chip including: an ingress processing unit, a bus, and an egress processing unit, wherein the ingress processing unit includes a virtual slave device;
- the ingress processing unit is configured to: receive the first data processing request from the master device; monitor whether the processing operation of the first data processing request satisfies the timeout condition; and when it is detected that the processing operation of the first data processing request satisfies the timeout condition, send the first data processing request to the virtual slave device to trigger the virtual slave device to return a processing response to the first data processing request;
- the bus is configured to route the first data processing request received by the ingress processing unit to the egress processing unit;
- the exit processing unit is configured to send the first data processing request to the target slave device.
- the ingress processing unit is further configured to: if a processing response from the target slave device is received before the processing response of the first data processing request returned by the virtual slave device, discard the processing response from the target slave device.
- the ingress processing unit is further configured to:
- after receiving the first data processing request from the master device, generate first cache information, where the first cache information includes information of the first data processing request;
- the ingress processing unit is also used for:
- the ingress processing unit is also used for:
- the ingress processing unit is further configured to:
- the first cache information is retained, and the cache resources occupied by the second cache information are released; or the second cache information is retained, and the cache resources occupied by the first cache information are released.
- the ingress processing unit is further configured to: send the second data processing request to the virtual slave device.
- the ingress processing unit is further configured to:
- third cache information is generated, and the third cache information includes the information of the third data processing request.
- the cache resources occupied by the third cache information are the same as the cache resources occupied by the first cache information.
- the ingress processing unit is further configured to:
- third cache information is generated, and the third cache information includes the information of the third data processing request.
- the cache resources occupied by the third cache information are the same as the cache resources occupied by the second cache information.
- the ingress processing unit is further configured to:
- after receiving the first data processing request from the master device, increment or decrement the count value of the common counter, wherein each time a data processing request is received, the count value of the common counter is incremented or decremented, and when the count value of the common counter overflows, the common counter is reset;
- the ingress processing unit is further configured to reset the common counter after all received data processing requests are processed.
- the ingress processing unit is further configured to:
- report a timeout interrupt event, where the timeout interrupt event carries the information of the first data processing request;
- the information of the first data processing request is saved in the system abnormal event log.
- a chip is provided, the chip is coupled with a memory, and is used for reading and executing program instructions stored in the memory, so as to implement the method according to any one of the first aspects.
- a communication device comprising at least one processor, the at least one processor is connected to a memory, and the at least one processor is configured to read and execute a program stored in the memory, so as to enable the communication
- the apparatus performs the method of any one of the first aspects.
- a computer storage medium stores computer instructions that, when executed on a computer, cause the computer to perform the method according to any one of the first aspects.
- a computer program product is provided which, when invoked by a computer, causes the computer to execute the method according to any one of the first aspects.
- FIG. 1 is a schematic diagram of a system-on-chip architecture in an embodiment of the present application
- FIGS. 2a and 2b are schematic diagrams of connection of an ingress processing unit, a bus, and an egress processing unit in the system-on-chip according to the embodiment of the present application, respectively;
- FIG. 3 is a schematic structural diagram of an ingress processing unit in a system-on-chip according to an embodiment of the present application
- FIG. 4 is a schematic diagram of the principle of implementing timeout monitoring and exception handling by an ingress processing unit in an embodiment of the present application
- FIG. 5 is a schematic flowchart of a system-on-chip exception handling method provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of exception handling in a system-on-chip provided by another embodiment of the present application.
- FIG. 7 is a schematic structural diagram of a communication apparatus provided by an embodiment of the present application.
- “system” and “network” in the embodiments of the present application may be used interchangeably.
- “Plurality” refers to two or more than two, and in view of this, “plurality” may also be understood as “at least two” in the embodiments of the present application.
- “At least one” can be understood as one or more, such as one, two or more. For example, including at least one refers to including one, two or more, and does not limit which ones are included. For example, including at least one of A, B, and C, then including A, B, C, A and B, A and C, B and C, or A and B and C.
- ordinal numbers such as “first” and “second” mentioned in the embodiments of the present application are used to distinguish multiple objects, and are not used to limit the order, sequence, priority, or importance of multiple objects.
- FIG. 1 it is a schematic diagram of the architecture of a system-on-chip in an embodiment of the present application.
- the system-on-chip 100 is connected with one or more master devices (also called masters), such as master device 1, master device 2, and master device N as shown in the figure; the system-on-chip 100 is also connected with one or more slave devices (also called slaves), such as slave device 1, slave device 2, and slave device M as shown in the figure.
- the master device may include a processor, such as a CPU, a graphics processing unit (GPU), an image signal processor (ISP), and the like.
- a slave device may include a memory, a memory controller, and the like.
- the master device initiates a data processing request, such as a read data request or a write data request or other types of data processing requests, and the data processing request is transmitted to the target slave device through the on-chip system for processing.
- For example, the processor sends a read data request, the on-chip system transmits the read data request (including the address) to the memory controller, and then waits for the memory controller to respond to the read data request.
- The memory controller sends the data read from the memory chips according to the address to the system-on-chip, and the system-on-chip transmits the data to the processor. If the processor receives the data and verifies that the data is correct (for example, ECC or parity check is correct), the read operation is completed.
- For a write, the system-on-chip transmits the write data request (including the address) and the data to the memory controller, and the memory controller writes the data to the memory chips according to the address and replies with confirmation information after the write operation is completed.
- The confirmation information is returned to the processor by the SoC, and the write operation is completed.
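The read and write flows above can be condensed into a toy model. `memory` stands in for the memory chips, and the function names are illustrative, not from the patent:

```python
def read_flow(address, memory):
    """One read transaction end to end: the master's request is routed
    through the SoC (ingress unit -> bus -> egress unit) to the memory
    controller, and the data travels back to the master."""
    request = {"type": "read", "address": address}   # issued by the processor
    data = memory[request["address"]]                # controller reads by address
    return {"status": "ok", "data": data}            # response routed back


def write_flow(address, value, memory):
    """One write transaction: the data is written by address, then
    confirmation information is returned to the processor."""
    memory[address] = value
    return {"status": "ok"}                          # confirmation to the master
```

The timeout problem the patent addresses arises exactly when the response leg of either flow never completes, leaving the master waiting.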
- the system on chip 100 may include an ingress processing unit 200 , a bus 300 and an egress processing unit 400 .
- the ingress processing unit may also be referred to as an ingress bridge (initial bridge, IB), and the egress processing unit may also be referred to as an egress bridge (target bridge, TB).
- the ingress processing unit 200, the bus 300 and the egress processing unit 400 may be implemented by using a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).
- the entry processing unit 200 is used to receive a data processing request from the master device and further perform protocol conversion, so as to convert the received data processing request into a format that meets the internal processing requirements of the system-on-chip; it can also execute a security policy (that is, authentication) and other processing. Further, in this embodiment of the present application, the ingress processing unit 200 may implement one or more functions such as timeout monitoring, interrupt reporting, exception recording, and exception handling; for the specific implementation, refer to the following description.
- the export processing unit 400 is configured to send a data processing request to the slave device, and can further perform protocol conversion on the data processing request to be sent to the slave device, so as to convert the data processing request into a format that meets the requirements of the slave device.
- Bus 300 is used to route data processing requests from master devices to target slave devices.
- the bus 300 may also be referred to as a switched network.
- the bus 300 may include a plurality of routing units (routers) to form an interconnection structure of the plurality of routing units.
- the master device and the slave device may also be used as components of the system-on-chip, which is not limited in this embodiment of the present application.
- the timeout monitoring processing operation performed by the ingress processing unit may include:
- the data processing request may be any data processing request from the master device, such as the first data processing request, the second data processing request, or the third data processing request involved in the embodiments of the present application.
- the ingress processing unit may use a two-level counter to implement timeout monitoring, and the specific implementation process may refer to FIG. 4 and related descriptions.
- Other methods may also be used to monitor whether the processing operation of a data processing request times out.
- For example, a timer is used to time the processing operation of the data processing request; when the timed duration reaches a set threshold, it can be determined that the processing operation of the corresponding data processing request satisfies the timeout condition.
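The timer alternative can be sketched as a cycle-counting model, one timer per outstanding request. The threshold and per-id bookkeeping are illustrative assumptions:

```python
class TimeoutMonitor:
    """Per-request timer: reaching the threshold means the request's
    processing operation satisfies the timeout condition."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.timers = {}          # request id -> elapsed cycles

    def start(self, req_id):
        self.timers[req_id] = 0

    def complete(self, req_id):
        self.timers.pop(req_id, None)   # response arrived in time

    def tick(self):
        """Advance every timer one cycle; return ids that just timed out."""
        expired = []
        for req_id in list(self.timers):
            self.timers[req_id] += 1
            if self.timers[req_id] >= self.threshold:
                expired.append(req_id)
                del self.timers[req_id]
        return expired
```

A request that completes before its timer reaches the threshold is simply removed from monitoring; one that does not is reported by `tick()` and can then be handed to the virtual slave device.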
- the exception handling operation performed by the entry processing unit may include:
- The first data processing request is sent to the virtual slave device to trigger the virtual slave device to return a processing response to the first data processing request. Further, the processing response of the first data processing request returned by the virtual slave device may be sent to the sender master device of the first data processing request, so as to complete the processing operation of the first data processing request.
- The first data processing request may be marked with a timeout flag to indicate that it is a data processing request whose processing operation has timed out, and the marked first data processing request is sent to the virtual slave device for response.
- the virtual slave device may generate a processing failure response for the first data processing request whose processing operation times out, and the processing failure response may be sent to the master device that sends the data processing request.
- the virtual slave device may perform processing and generate a processing response that may be sent to the sender master device of the data processing request.
- the virtual slave device may return a data write failure response after receiving the write data request.
- the virtual slave device may return a data read failure response after receiving the read data request.
- the virtual slave device may carry the default read operation return value in the processing response, and return the processing response.
- The default read operation return value is different from any read operation return value that can be returned by the slave devices connected to the bus of the SoC, so that after receiving the processing response, the master device can determine that the read operation return value carried in it was not returned by the real target slave device but by the virtual slave device, and can thus determine that the read operation has failed.
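The default-return-value scheme amounts to reserving a value no real slave can produce. A toy model, in which the concrete reserved value is an assumption:

```python
# Assume every real slave on the bus returns values from its own address
# space, and one bit pattern is reserved exclusively for the virtual slave.
VIRTUAL_READ_VALUE = 0xDEADBEEF   # reserved; assumed unused by real slaves

def virtual_read_response():
    """Processing response the virtual slave returns for a timed-out read."""
    return {"type": "read_response", "value": VIRTUAL_READ_VALUE}

def master_sees_failure(response):
    """Master-side check: the reserved value reveals that the virtual slave,
    not the real target slave, answered, i.e. the read operation failed."""
    return response["value"] == VIRTUAL_READ_VALUE
```

The design choice here is that the master needs no extra signaling channel: the data word itself encodes "this read failed", provided the reserved value is genuinely outside every real slave's return range.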
- the virtual slave device may be set in the ingress processing unit, so that when the ingress processing unit detects that the processing operation of the data processing request satisfies the timeout condition, it can send the data processing request to the virtual slave device for response as soon as possible.
- The virtual slave device can respond to the first data processing request that satisfies the timeout condition and return a processing response, so as to end the timed-out first data processing request in time and prevent it from occupying system resources for a long time and causing the system to hang up. Data processing requests whose processing operations do not time out can still be processed in the conventional manner, thereby ensuring the reliability and stability of the on-chip system.
- If the ingress processing unit receives the processing response from the target slave device of the first data processing request after the timeout, it discards that processing response, so as to avoid a conflict between the processing response of the virtual slave device and the processing response of the target slave device.
- the ingress processing unit may generate first cache information, where the first cache information includes information of the first data processing request, such as the context of the first data processing request.
- the context of the first data processing request may include: the information and thread ID of the master device that sends the first data processing request, the address corresponding to the first data processing request (such as the memory address of the data in the read data request or the write data request) ), the type of the first data processing request (such as a read data request or a write data request), and the like.
- When performing the above exception processing, the ingress processing unit can send the first data processing request to the virtual slave device, and, after receiving the processing response of the first data processing request returned by the virtual slave device, delete the first cache information and release the cache resources occupied by it. It can further return the processing response to the sender master device of the first data processing request, so as to complete the processing operation of the first data processing request and prevent the timed-out data processing request from occupying system resources for a long time and causing the system to hang up.
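The cache-information lifecycle just described — created on receipt, released when the virtual slave's response ends the request — can be modeled as follows (names and context fields are assumptions):

```python
class CacheInfoTable:
    """Cache information per outstanding request: an entry is generated when
    the request is received and released when the virtual slave's processing
    response ends the timed-out request."""
    def __init__(self):
        self.entries = {}   # request id -> context (master, thread, address, type)

    def on_request(self, req_id, context):
        self.entries[req_id] = context          # generate cache information

    def on_virtual_response(self, req_id):
        """Delete the entry, freeing its cache resource; the caller then
        forwards the processing response to the sender master device."""
        return self.entries.pop(req_id, None)
```

Releasing the entry is what guarantees the timed-out request stops occupying cache resources, which is the stated goal of the exception processing.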
- The ingress processing unit may receive a second data processing request before monitoring that the processing operation of the first data processing request satisfies the timeout condition, where the second data processing request and the first data processing request come from the same thread of the same master device.
- After the ingress processing unit receives the second data processing request, it generates second cache information.
- the second cache information includes information of the second data processing request, such as the context of the second data processing request.
- When the ingress processing unit detects that the processing operation of the first data processing request satisfies the timeout condition, it can either retain the first cache information and release the cache resources occupied by the second cache information, or retain the second cache information and release the cache resources occupied by the first cache information.
- Since the source of the second data processing request is the same as that of the first data processing request (the same thread of the same master device), when the first data processing request is sent to the virtual slave device, the second data processing request with the same source may also be sent to the virtual slave device.
- If a processing response from the target slave device of the second data processing request is received before the processing response returned by the virtual slave device, the processing response from the target slave device of the second data processing request is discarded.
- Since the second data processing request and the first data processing request come from the same thread of the same master device, when the processing operation of the first data processing request satisfies the timeout condition, the cache resources occupied by the second cache information corresponding to the second data processing request can be released. In this way, among multiple data processing requests from the same thread of the same master device, if the processing operation of one of them satisfies the timeout condition, only the cache resources occupied by one of the data processing requests are retained, thereby reducing the cache resources occupied by data processing requests from the same source.
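Keeping a single cache entry per timed-out source can be sketched directly. The keep-the-oldest policy below is an assumption; the patent only requires that exactly one entry be retained:

```python
def release_same_source(entries, timed_out_id):
    """When the request `timed_out_id` times out, retain a single cache entry
    for its (master, thread) source and release all other entries from the
    same source. Entries from other sources are untouched."""
    source = entries[timed_out_id]["source"]
    same = sorted(rid for rid, e in entries.items() if e["source"] == source)
    keep = same[0]                  # policy: retain the oldest entry (assumed)
    for rid in same[1:]:
        del entries[rid]            # release the cache resource of the others
    return keep
```

Requests from other master/thread pairs keep their cache information, which matches the point made later that other sources' processing is unaffected.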
- The ingress processing unit may receive a third data processing request, where the third data processing request and the first data processing request come from the same thread of the same master device.
- The entry processing unit may not apply for new cache resources to cache the information of the third data processing request (such as its context); instead, when the cache resources occupied by the retained first cache information are released based on the processing response returned by the virtual slave device, the third cache information is generated to cache the information of the third data processing request (such as its context).
- The cache resources occupied by the third cache information are the same as the cache resources occupied by the first cache information; that is, instead of applying for new cache resources, the cache resources released from the first cache information are used to store the third cache information.
- the cache resources occupied by the reserved first cache information may be released after the virtual slave device returns at least one of a processing response corresponding to the first data processing request and a processing response corresponding to the second data processing request.
- Alternatively, the ingress processing unit does not apply for new cache resources to cache the information of the third data processing request (such as its context); instead, when the cache resources occupied by the retained second cache information are released based on the processing response returned by the virtual slave device, the third cache information is generated to cache the information of the third data processing request (such as its context). Further, the cache resources occupied by the third cache information are the same as the cache resources occupied by the second cache information.
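The slot-reuse behavior — a later request from a timed-out source waits for that source's existing cache resource instead of triggering a new allocation — can be modeled as below (a software sketch; slot indices and the pending map are assumptions):

```python
class SlotCache:
    """Fixed pool of cache slots: a source whose request timed out keeps
    holding its slot, and the next request from that source reuses the
    same slot once the virtual slave's response releases it."""
    def __init__(self):
        self.slot_of = {}    # source -> slot index currently held
        self.pending = {}    # source -> context waiting for the slot
        self.next_slot = 0

    def on_request(self, source, context):
        if source in self.slot_of:
            # Source still holds a slot (timed-out request not yet ended):
            # do not allocate; the new context waits for the same slot.
            self.pending[source] = context
            return self.slot_of[source]
        slot = self.next_slot
        self.next_slot += 1
        self.slot_of[source] = slot
        return slot

    def on_released(self, source):
        """The virtual slave's response freed the slot; reuse it for the
        pending request from the same source, if any."""
        slot = self.slot_of[source]
        if source in self.pending:
            del self.pending[source]
            return slot           # the new request now occupies the same slot
        del self.slot_of[source]
        return None
```

Because `next_slot` never advances for the waiting request, total cache-resource usage does not grow while a source's timed-out request is still being wound down.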
- The released cache resource caches the context of the third data processing request, thereby reducing the system resources occupied by subsequent data processing requests from a source (i.e., the same thread of the same master device) when a data processing request from that source satisfies the timeout condition.
- the processing operations of data processing requests from other sources may not be affected, thereby ensuring the overall reliability and stability of the system-on-chip.
- the ingress processing unit may also report a timeout interrupt event after monitoring that the processing operation of the first data processing request satisfies the timeout condition.
- the timeout interrupt event may carry information of the first data processing request, such as the context of the first data processing request, so as to further determine the cause of the timeout according to the above information included in the timeout interrupt event.
- the ingress processing unit may further save the information of the first data processing request in the system abnormal event log.
- the context of the first data processing request may be recorded in the system abnormal event log, so that the cause of the timeout can be subsequently analyzed according to the log.
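Recording the timed-out request's context into the abnormal-event log can be as simple as appending a structured record. The field names below are assumptions based on the context fields listed earlier (master device, thread ID, address, request type):

```python
def log_timeout(event_log, request):
    """Append the timed-out request's context to the system abnormal-event
    log so the cause of the timeout can be analyzed later."""
    event_log.append({
        "event": "processing_timeout",
        "master": request["master"],
        "thread_id": request["thread_id"],
        "address": hex(request["address"]),
        "type": request["type"],       # e.g. "read" or "write"
    })
```

The same record can double as the payload of the reported timeout interrupt event, so interrupt handling and offline log analysis work from identical context.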
- FIG. 3 it is a schematic structural diagram of an ingress processing unit in a system-on-chip provided by an embodiment of the present application.
- the entry processing unit 200 may include a timeout monitoring module 301 , an exception processing module 302 , a virtual slave device 303 , and further, at least one of an interrupt reporting module 304 and an interrupt recording module 305 .
- Each functional module in the ingress processing unit is described below by taking the ingress processing unit 200 receiving the first data processing request from the master device as an example. It should be noted that although the first data processing request is used as an example, any data processing request from the master device (such as the second data processing request or the third data processing request) can be processed as follows.
- the timeout monitoring module 301 is used to monitor whether the processing operation of the first data processing request from the master device satisfies the timeout condition.
- when it is monitored that the timeout condition is satisfied, the exception processing module 302 is triggered to perform exception processing.
- the interrupt reporting module 304 can also be triggered to report a timeout interrupt event.
- the interrupt recording module 305 can also be triggered to log the system abnormal event.
- the exception processing module 302 is configured to perform exception handling operations. Specifically, the exception processing module 302 sends the first data processing request to the virtual slave device 303, so that the virtual slave device 303 returns a processing response to the first data processing request. Further, the processing response can be returned to the master device that sent the first data processing request, so as to complete the processing of the first data processing request.
- the virtual slave device 303 is configured to respond to a first data processing request whose processing operation has timed out. For example, the virtual slave device may generate a processing failure response for the timed-out first data processing request, and the processing failure response may be sent to the master device that sent the request.
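As a concrete illustration of this behavior, the sketch below models the virtual slave device's response generation in Python. This is illustrative only: the patent describes a hardware unit, and the function name, response fields, and the sentinel read value are all invented here (the description elsewhere notes that a default read return value can be chosen to differ from anything a real slave on the bus would return).

```python
# Illustrative sentinel: a default read return value chosen so that no
# real slave device on the bus would ever return it, letting the master
# recognize the response as coming from the virtual slave device.
VIRTUAL_READ_VALUE = 0xDEADBEEFDEADBEEF


def virtual_slave_respond(request):
    """Generate the virtual slave device's processing response for a
    data processing request whose processing operation has timed out."""
    if request["type"] == "write":
        # Write request: report a processing (write) failure.
        return {"status": "FAIL"}
    # Read request: report a read failure, carrying the default read
    # return value so the master can tell the response apart from a
    # real target slave device's response.
    return {"status": "FAIL", "data": VIRTUAL_READ_VALUE}
```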
- the interrupt reporting module 304 is configured to report a timeout interrupt event.
- the timeout interrupt event may carry information of the first data processing request that satisfies the timeout condition, such as the context of the first data processing request.
- the interrupt recording module 305 is configured to save the information of the first data processing request into the system abnormal event log.
- the context of the first data processing request may be recorded in the system abnormal event log.
- the structure of the ingress processing unit shown in FIG. 3 is only an example, and the embodiments of the present application do not limit the structure division of the ingress processing unit.
- in some embodiments, the virtual slave device may be included in the exception processing module; in other embodiments, the function of the virtual slave device may be realized by the exception processing module itself, without a separately provided virtual slave device.
- a two-level counter may be used to monitor whether the processing operation of the data processing request times out.
- FIG. 4 is a schematic diagram of the principle by which the ingress processing unit implements timeout monitoring and exception handling in an embodiment of the present application.
- the ingress processing unit can cache the information of each data processing request from the master device, so as to wait for the target slave device to return a processing response.
- after receiving the first data processing request from the master device, the ingress processing unit generates a first cache (entry1), where the first cache includes information of the first data processing request.
- the information of the first data processing request may be the context of the first data processing request.
- after receiving the second data processing request from the master device, the ingress processing unit generates a second cache (entry2), which includes the context of the second data processing request; after receiving the third data processing request from the master device, it generates a third cache (entry3), which includes the context of the third data processing request, and so on.
- the cache information corresponding to these data processing requests forms a cache queue 410 .
- the ingress processing unit can transmit the data processing request corresponding to each piece of cache information in the cache queue 410 to the bus of the system-on-chip in first-in, first-out order, so that the data processing requests can be transmitted to the target slave devices via the bus.
- after a processing response is received, the processing response can be returned to the corresponding master device, and the cache information of the data processing request can be deleted from the cache queue 410 (that is, the cache resources occupied by the corresponding cache information can be released), so as to complete the processing operation of the data processing request.
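The queue behavior just described can be sketched as follows. This is a software model for illustration only (the patent's ingress processing unit is hardware), and the class and method names are invented.

```python
from collections import deque


class CacheQueue:
    """Minimal model of the cache queue 410: one cache entry per
    in-flight data processing request, issued to the bus in first-in,
    first-out order and released when a processing response arrives."""

    def __init__(self):
        self.entries = {}          # request_id -> cached context
        self.to_issue = deque()    # FIFO order of issue to the bus

    def add(self, request_id, context):
        # Generate cache information (an entry) for a received request.
        self.entries[request_id] = context
        self.to_issue.append(request_id)

    def issue(self):
        # Hand the oldest not-yet-issued request to the bus.
        return self.to_issue.popleft() if self.to_issue else None

    def complete(self, request_id):
        # A processing response arrived: delete the cache information,
        # releasing the cache resource it occupied.
        return self.entries.pop(request_id, None)


q = CacheQueue()
q.add("entry1", {"thread": "A", "type": "read"})
q.add("entry2", {"thread": "A", "type": "write"})
first = q.issue()          # entry1 is issued to the bus first
```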
- the ingress processing unit may also transmit the data processing request corresponding to each cache information in the cache queue 410 to the bus of the system-on-chip in other manners or in other sequences, which is not limited in this embodiment of the present application.
- a common counter 420 and private counters 421 can be set in the ingress processing unit.
- the common counter 420 is a globally shared counter.
- the number of private counters 421 is related to the number of pieces of cache information in the cache queue 410: one data processing request's cache information (entry) corresponds to one private counter.
- if the common counter counts by incrementing its count value, then each time a data processing request from a master device is received, the count value of the common counter is incremented (e.g., incremented by 1); when the count value overflows (e.g., reaches the maximum value), the common counter is reset (cleared) to restart counting.
- when the cache queue 410 is empty (i.e., all received data processing requests have been processed), the common counter is reset (cleared) and remains reset until a new data processing request is received.
- if the common counter counts by decrementing its count value, it is reset by being set to the maximum value to restart counting; likewise, when the cache queue 410 is empty, the common counter is reset.
- a private counter can be created and deleted at the following times: when a data processing request from a master device is received and the corresponding cache information is generated, a private counter corresponding to that cache information can be created; when the cache information corresponding to a data processing request in the cache queue 410 is deleted (i.e., the cache resources occupied by the corresponding cache information are released), the private counter corresponding to that cache information can be deleted.
- each time the count value of the common counter overflows, the count value of each private counter is triggered to change.
- if the private counters count by incrementing, the count value of each private counter is incremented (e.g., incremented by 1) on each overflow of the common counter.
- if the private counters count by decrementing, the count value of each private counter is decremented (e.g., decremented by 1) on each overflow of the common counter.
- when the count value of a private counter reaches a set threshold or overflows, it indicates that the processing operation of the data processing request corresponding to that cache information has timed out; the count value of the private counter can then remain unchanged. For example, if the private counter counts by incrementing, then when its count value reaches or exceeds the maximum value, or reaches a specified threshold, the processing operation of the corresponding data processing request has timed out; if the private counter counts by decrementing, then when its count value overflows (for example, decreases to 0), the processing operation of the corresponding data processing request has timed out.
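The two-level counting scheme can be sketched as follows (incrementing variant). The model is illustrative: counter widths, thresholds, and names are invented, and a hardware implementation would use overflow of fixed-width registers rather than Python comparisons.

```python
class TimeoutMonitor:
    """Two-level counter model: a shared common counter ticks once per
    received request; each time it overflows, every live private counter
    ticks; a private counter reaching its threshold means the matching
    request's processing operation satisfies the timeout condition."""

    def __init__(self, common_max, private_max):
        self.common_max = common_max      # overflow point of common counter
        self.private_max = private_max    # timeout threshold per entry
        self.common = 0
        self.private = {}                 # request_id -> count

    def on_request(self, request_id):
        self.common += 1                  # common counter counts requests
        if self.common >= self.common_max:
            self.common = 0               # overflow: reset, restart counting
            for rid in self.private:
                self.private[rid] += 1    # advance all private counters
        self.private[request_id] = 0      # create this request's counter

    def on_response(self, request_id):
        self.private.pop(request_id, None)  # entry freed: counter deleted
        if not self.private:
            self.common = 0               # queue empty: counter stays reset

    def timed_out(self, request_id):
        return self.private.get(request_id, 0) >= self.private_max


mon = TimeoutMonitor(common_max=2, private_max=2)
mon.on_request("r1")                      # r1 never gets a response
for i in range(6):                        # later traffic keeps arriving
    mon.on_request(f"x{i}")
```

With this traffic pattern, `r1`'s private counter advances on every overflow of the common counter and eventually crosses the threshold, while recently arrived requests remain below it.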
- the maximum value of the public counter and the maximum value of the private counter may be set according to factors such as performance requirements of the system-on-chip, which are not limited in this embodiment of the present application.
- when it is monitored that the processing operation of a data processing request satisfies the timeout condition, the exception processing flow can be entered; for example, the data processing request that satisfies the timeout condition can be sent to the virtual slave device for response.
- when it is detected that the processing operation of a data processing request satisfies the timeout condition, the data processing request may be marked as timed out. For example, for the first data processing request, when the private counter corresponding to the first cache (entry1) overflows, the processing operation of the first data processing request satisfies the timeout condition, so the first cache corresponding to the first data processing request is marked as timed out.
- further, the caches corresponding to data processing requests from the same source (for example, from the same thread of the same master device) may also be marked as timed out.
- the cache queue includes a first cache corresponding to the first data processing request and a second cache corresponding to the second data processing request.
- the first data processing request and the second data processing request are from the same thread of the same master device.
- both the first cache and the second cache may be marked as timeout.
- the number of same-origin data processing requests in the cache queue may be larger, in which case the cache information corresponding to all of the same-origin data processing requests can be marked as timed out.
- when the processing operation of the first data processing request satisfies the timeout condition and all cache information corresponding to data processing requests from the same source as the first data processing request is marked as timed out, only the cache information corresponding to one of these same-source, timed-out data processing requests may be kept. For example, the first cache information corresponding to the first data processing request is retained, and the cache resources occupied by the cache information of the other same-source data processing requests are released.
- if a third data processing request from the same source is subsequently received (that is, the third data processing request and the first data processing request originate from the same thread of the same master device), new cache resources are no longer requested to cache the context of the third data processing request; instead, after the cache resources occupied by the retained cache information are released, those cache resources are used to store the context of the third data processing request.
- for example, the cache resources occupied by the second cache information corresponding to the second data processing request can be released, while the first cache information corresponding to the first data processing request is retained and marked as locked.
- when the third data processing request is received, no new cache resources are requested for it; instead, after the cache resources occupied by the first cache information are released, those cache resources are used to store the context of the third data processing request.
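The same-source handling above can be sketched as follows. This is an invented software model: the patent does not specify these data structures, and the model simplifies the release condition to a single virtual-slave response (the text allows release after one or more responses are received).

```python
class SameSourcePolicy:
    """Sketch of the same-source handling: when one request from a
    (master, thread) source times out, the other cached entries from
    that source are released and a single entry is retained and marked
    locked; a later request from the same source is not given new cache
    resources until the locked entry is released."""

    def __init__(self):
        self.entries = {}   # request_id -> source (master, thread)
        self.locked = {}    # source -> retained request_id

    def add(self, request_id, source):
        if source in self.locked:
            return None     # source timed out: wait for locked entry
        self.entries[request_id] = source
        return request_id

    def on_timeout(self, request_id):
        source = self.entries[request_id]
        for rid in [r for r, s in self.entries.items() if s == source]:
            if rid != request_id:
                del self.entries[rid]     # release other same-source entries
        self.locked[source] = request_id  # retain one entry, mark locked

    def on_virtual_response(self, request_id):
        source = self.entries.pop(request_id)  # locked entry released
        self.locked.pop(source, None)          # source may cache again


p = SameSourcePolicy()
src = ("master0", "thread0")
p.add("r1", src)
p.add("r2", src)
p.on_timeout("r1")            # r2's resources released, r1 retained/locked
blocked = p.add("r3", src)    # no new cache resource while locked
p.on_virtual_response("r1")   # locked entry's resource is freed
reused = p.add("r3", src)     # r3 now uses the freed resource
```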
- FIG. 5 is a schematic flowchart of a system-on-chip exception handling method provided by an embodiment of the present application. As shown in the figure, the process may include:
- S501 Receive a first data processing request from a master device.
- the first data processing request may be a request for reading data or a request for writing data.
- S502 Send a first data processing request to the target slave device.
- when the ingress processing unit in the system-on-chip receives the first data processing request, it can perform protocol conversion and send the protocol-converted first data processing request to the bus, which transfers it to the target slave device.
- S503 Monitor whether the processing operation of the first data processing request satisfies the timeout condition, and when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, go to S504, otherwise, go to S505.
- one reason for the timeout of the first data processing request may be: when the bus routes the first data processing request, a failure of the interconnect path may prevent the first data processing request from being delivered to the target slave device, so that no processing response from the target slave device can be received, causing the processing operation to time out.
- another reason for the timeout of the first data processing request may be: a failure of the target slave device makes it unable to return a processing response, so the system-on-chip cannot receive the processing response returned by the target slave device, causing the processing operation to time out.
- when the timeout condition is satisfied, S504 may be entered to start the exception processing flow.
- the exception processing flow may include the following steps: sending the first data processing request to the virtual slave device to trigger the virtual slave device to return a processing response to the first data processing request.
- the virtual slave device may return a processing response, such as a processing failure response. Further, the processing response may be sent to the sender master device of the first data processing request.
- a timeout interrupt event may also be reported.
- the timeout interrupt event carries the information of the first data processing request, such as the context of the first data processing request.
- the information of the first data processing request may also be saved in the system abnormal event log.
- S505 Adopt the conventional processing operation, for example, waiting for the target slave device to return a processing response.
- if the processing response from the target slave device is received before the processing response to the first data processing request returned by the virtual slave device is received, the processing response from the target slave device is discarded.
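This response-arbitration rule can be sketched as a small dispatch function. The state layout, function name, and string return codes are all invented for illustration; the patent only specifies the discard behavior itself.

```python
def handle_response(pending, request_id, from_virtual):
    """Decide what to do with an incoming processing response.
    `pending` maps a request_id to a state dict whose 'redirected' flag
    is set once the request has been handed to the virtual slave device."""
    state = pending.get(request_id)
    if state is None:
        return "ignored"        # request already completed earlier
    if state["redirected"] and not from_virtual:
        # The real target slave device answered after the request was
        # redirected: discard its processing response.
        return "discarded"
    del pending[request_id]     # complete: release the cache entry
    return "forwarded_to_master"


pending = {
    "r1": {"redirected": True},    # timed out, sent to virtual slave
    "r2": {"redirected": False},   # normal in-flight request
}
late = handle_response(pending, "r1", from_virtual=False)
done = handle_response(pending, "r1", from_virtual=True)
```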
- optionally, after receiving the first data processing request, the method further includes: generating first cache information, where the first cache information includes information of the first data processing request, such as the context of the first data processing request. Before it is monitored that the processing operation of the first data processing request satisfies the timeout condition, the method further includes: receiving a second data processing request, where the second data processing request and the first data processing request come from the same thread of the same master device; and generating second cache information, where the second cache information includes information of the second data processing request.
- when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, the method further includes: retaining the first cache information and releasing the cache resources occupied by the second cache information; or retaining the second cache information and releasing the cache resources occupied by the first cache information.
- further, the second data processing request, which has the same source as the first data processing request, is also sent to the virtual slave device.
- optionally, the method further includes: receiving a third data processing request, where the third data processing request and the first data processing request come from the same thread of the same master device; and, after the cache resources occupied by the first cache information are released based on the processing response returned by the virtual slave device, generating third cache information, where the third cache information includes information of the third data processing request, and the cache resources occupied by the third cache information are the same as the cache resources previously occupied by the first cache information.
- optionally, the method further includes: receiving a third data processing request, where the third data processing request and the first data processing request come from the same thread of the same master device; and, after the cache resources occupied by the second cache information are released based on the processing response returned by the virtual slave device, generating third cache information, where the third cache information includes information of the third data processing request, and the cache resources occupied by the third cache information are the same as the cache resources previously occupied by the second cache information.
- FIG. 6 exemplarily shows a schematic flowchart of a method for handling exceptions in a system-on-chip in a specific application scenario.
- the process may include:
- S601 Receive a data processing request from a master device.
- S602 Cache the context of the data processing request, and obtain cache information corresponding to the data processing request.
- S603 The count value of the common counter is incremented by 1; when the count value of the common counter overflows, the count value of the private counter corresponding to each cached data processing request is incremented by 1.
- S604 Determine whether the count value of the private counter corresponding to the data processing request has reached the threshold or overflowed; if so, the processing operation of the data processing request has timed out, and the flow goes to S605; otherwise, go to S612.
- S605 Report a timeout interrupt event, and record the context of the data processing request in the abnormal event log.
- S606 Mark as timed out the cache information of the data processing request in the cache queue, as well as the cache information of requests originating from the same master device and the same thread as the data processing request.
- S607 Send the data processing request corresponding to the cached information marked as timeout to the virtual slave device for response.
- if multiple data processing requests are sent to the virtual slave device for response, and the cache information that is retained and marked as locked is the first cache information, then in S611, after the respective processing responses returned by the virtual slave device for the multiple data processing requests are received, the cache resources occupied by the first cache information are released.
- an embodiment of the present application further provides a communication apparatus, which may have the structure shown in FIG. 7. The communication apparatus may be a system-on-chip capable of implementing the above method, or a chip or chip system capable of implementing the above method.
- the communication apparatus 700 shown in FIG. 7 may include at least one processor 702, and the at least one processor 702 is configured to be coupled with a memory, and read and execute instructions in the memory to implement the method provided by the embodiments of the present application.
- the communication apparatus 700 may further include at least one interface 703 for providing program instructions or data for the at least one processor.
- the communication device 700 may perform the steps in the method as shown in FIG. 5 or FIG. 6 .
- the interface 703 may be used to support communication of the communication apparatus 700.
- the communication device 700 may further include a memory 704 in which computer programs and instructions are stored, and the memory 704 may be coupled with the processor 702 and/or the interface 703 for supporting the processor 702 to call the computer programs and instructions in the memory 704.
- the memory 704 may also be used to store data involved in the method embodiments of the present application, for example, to store the data and instructions involved, and/or to store configuration information necessary for the communication apparatus 700 to execute the methods described in the embodiments of the present application.
- the embodiments of the present application further provide a computer-readable storage medium on which instructions are stored.
- when the instructions are invoked and executed by a computer, the computer can perform the above method embodiments and the methods involved in their implementations.
- the type of the computer-readable storage medium is not limited; for example, it may be a RAM (random access memory), a ROM (read-only memory), etc.
- the present application further provides a computer program product which, when invoked and executed by a computer, causes the computer to perform the above method embodiments and the methods involved in any possible designs of the method embodiments.
- the present application further provides a chip, which may include a processor and an interface circuit, and is used to implement the above method embodiments and any possible implementation manners of the method embodiments.
- in the above method embodiments, "coupled" means that two components are joined to each other directly or indirectly; the joint may be fixed or movable, and may allow the flow of fluids, electricity, electrical signals, or other types of signals, or communication, between the two components.
- the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- when implemented by software, they may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner.
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that includes an integration of one or more available media.
- the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.
- a general-purpose processor may be a microprocessor, or alternatively, the general-purpose processor may be any conventional processor, controller, microcontroller, or state machine.
- a processor may also be implemented by a combination of computing devices, for example, a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors in combination with a digital signal processor core, or any other similar configuration.
- a software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
- a storage medium may be coupled to the processor such that the processor may read information from, and store information in, the storage medium.
- the storage medium can also be integrated into the processor.
- the processor and storage medium may be provided in the ASIC, and the ASIC may be provided in the terminal device. Alternatively, the processor and the storage medium may also be provided in different components in the terminal device.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computer Hardware Design (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A system-on-chip exception handling method, a system-on-chip, and an apparatus thereof. In the present application, a first data processing request from a master device is received; the first data processing request is sent to a target slave device, and whether the processing operation of the first data processing request satisfies a timeout condition is monitored; when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, the first data processing request is sent to a virtual slave device, so as to trigger the virtual slave device to return a processing response to the first data processing request.
Description
The present application relates to the field of chip technologies, and in particular, to a system-on-chip exception handling method, a system-on-chip, and an apparatus thereof.
With the continuous development of semiconductor technology, more and more processing units are integrated into a system-on-chip (SoC), and these processing units are interconnected using a network-on-chip (NoC) architecture. However, as the degree of SoC integration increases, the failure rate of processing units and interconnect paths rises, degrading the overall stability, reliability, and performance of the NoC.
In a NoC interconnect architecture, if an exception occurs, a request from a master device may fail to complete normally: the request remains in a waiting state, the system hangs, and neither that master device's requests nor those of other master devices can be processed normally, affecting the overall stability and reliability of the NoC.
Summary of the invention
Embodiments of the present application provide a system-on-chip exception handling method, a system-on-chip, and an apparatus thereof, which are used to resolve system-on-chip exceptions caused by data processing request timeouts, so as to improve the stability and reliability of the system-on-chip.
In a first aspect, a system-on-chip exception handling method is provided, including:
receiving a first data processing request from a master device;
sending the first data processing request to a target slave device, and monitoring whether the processing operation of the first data processing request satisfies a timeout condition;
when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, sending the first data processing request to a virtual slave device, so as to trigger the virtual slave device to return a processing response to the first data processing request.
In the above implementation, when the operation of the first data processing request times out, the request is sent to the virtual slave device for handling, and the virtual slave device returns the processing response, so that the timed-out first data processing request is ended promptly. This prevents a timed-out request from occupying system resources for a long time and hanging the system, thereby ensuring the stability and reliability of the system-on-chip.
In a possible implementation, the method further includes: if a processing response from the target slave device is received before the processing response to the first data processing request returned by the virtual slave device is received, discarding the processing response from the target slave device.
In the above implementation, after the first data processing request is handed to the virtual slave device, if a processing response from the target slave device arrives before the virtual slave device's response, that response from the target slave device is discarded to avoid conflicting with the virtual slave device's handling.
In a possible implementation, after receiving the first data processing request from the master device, the method further includes:
generating first cache information, where the first cache information includes information of the first data processing request;
before it is monitored that the processing operation of the first data processing request satisfies the timeout condition, the method further includes:
receiving a second data processing request, where the second data processing request and the first data processing request come from the same thread of the same master device;
generating second cache information, where the second cache information includes information of the second data processing request;
when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, the method further includes:
retaining the first cache information and releasing the cache resources occupied by the second cache information; or, retaining the second cache information and releasing the cache resources occupied by the first cache information.
In the above implementation, if the received first and second data processing requests come from the same thread of the same master device, then when it is monitored that the first data processing request satisfies the timeout condition, the cache resources occupied by the cache information of one of the two requests are released. Thus, among data processing requests from the same master device and the same thread, if the processing operation of one of them satisfies the timeout condition, only the cache resources occupied by one of these requests are retained, reducing the cache resources occupied by same-source data processing requests when such a request times out.
Further, the method includes: sending the second data processing request to the virtual slave device.
Further, if a processing response from the target slave device of the second data processing request is received before the processing response to the second data processing request returned by the virtual slave device is received, the processing response from the target slave device of the second data processing request is discarded.
In the above implementation, among multiple data processing requests with the same source (for example, the same thread of the same master device), when the processing operation of one of them times out, all of these same-source requests are sent to the virtual slave device. Considering that these same-source requests may target the same slave device, if the timeout was caused by a failure of that target slave device, this approach promptly ends the processing of these requests and releases the system resources they occupy, avoiding pointless waiting for the target slave device's responses, thereby ensuring the stability and reliability of the system-on-chip.
In a possible implementation, after retaining the first cache information and releasing the cache resources occupied by the second cache information, the method further includes:
receiving a third data processing request, where the third data processing request and the first data processing request come from the same thread of the same master device;
after the cache resources occupied by the first cache information are released based on the processing response returned by the virtual slave device, generating third cache information, where the third cache information includes information of the third data processing request, and the cache resources occupied by the third cache information are the same as the cache resources occupied by the first cache information.
In a possible implementation, after retaining the second cache information and releasing the cache resources occupied by the first cache information, the method further includes:
receiving a third data processing request, where the third data processing request and the first data processing request come from the same thread of the same master device;
after the cache resources occupied by the second cache information are released based on the processing response returned by the virtual slave device, generating third cache information, where the third cache information includes information of the third data processing request, and the cache resources occupied by the third cache information are the same as the cache resources occupied by the second cache information.
In the above implementation, when one of multiple same-source data processing requests (for example, from the same thread of the same master device) satisfies the timeout condition and only the cache resources of one of them are retained, a newly received same-source third data processing request is not allocated new cache resources; instead, after the retained cache resources are released, those resources are used to cache the information of the third data processing request. In this way, while a source's timed-out request has not yet finished (i.e., the cache resources occupied by the corresponding cache information have not yet been released), no new cache resources are allocated for new requests from that source, reducing the cache resource overhead of the system-on-chip, thereby ensuring its reliability and stability.
In a possible implementation, after receiving the first data processing request from the master device, the method further includes:
increasing or decreasing the count value of a common counter, where the count value of the common counter is increased or decreased each time a data processing request is received, and the common counter is reset when its count value overflows;
setting a private counter corresponding to the first data processing request, where the count value of the private counter corresponding to the first data processing request is increased or decreased when the count value of the common counter overflows, and the processing operation of the first data processing request satisfies the timeout condition when the count value of the private counter corresponding to the first data processing request overflows.
Further, the method includes: resetting the common counter after all received data processing requests have been processed. When all received data processing requests have been processed and no cache information remains, the common counter is reset so that data processing requests received later can be counted correctly.
In a possible implementation, the method further includes: when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, performing at least one of the following processing operations: reporting a timeout interrupt event, where the timeout interrupt event carries information of the first data processing request; and saving the information of the first data processing request into a system abnormal event log.
In the above implementation, the reported timeout interrupt event may contain information about the timed-out data processing request (such as its context), thereby indicating which data processing request encountered the exception so that it can be handled in a targeted manner.
In a second aspect, a system-on-chip is provided, including an ingress processing unit, a bus, and an egress processing unit, where the ingress processing unit includes a virtual slave device;
the ingress processing unit is configured to: receive a first data processing request from a master device; monitor whether the processing operation of the first data processing request satisfies a timeout condition; and, when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, send the first data processing request to the virtual slave device, so as to trigger the virtual slave device to return a processing response to the first data processing request;
the bus is configured to route the first data processing request received by the ingress processing unit to the egress processing unit;
the egress processing unit is configured to send the first data processing request to a target slave device.
In a possible implementation, the ingress processing unit is further configured to: if a processing response from the target slave device is received before the processing response to the first data processing request returned by the virtual slave device is received, discard the processing response from the target slave device.
In a possible implementation, the ingress processing unit is further configured to:
after receiving the first data processing request from the master device, generate first cache information, where the first cache information includes information of the first data processing request;
the ingress processing unit is further configured to:
before it is monitored that the processing operation of the first data processing request satisfies the timeout condition, receive a second data processing request, where the second data processing request and the first data processing request come from the same thread of the same master device;
generate second cache information, where the second cache information includes information of the second data processing request;
the ingress processing unit is further configured to:
when it is monitored that the processing operation of the first data processing request satisfies the timeout condition:
retain the first cache information and release the cache resources occupied by the second cache information; or, retain the second cache information and release the cache resources occupied by the first cache information.
Further, the ingress processing unit is further configured to: send the second data processing request to the virtual slave device.
In a possible implementation, the ingress processing unit is further configured to:
after retaining the first cache information and releasing the cache resources occupied by the second cache information, receive a third data processing request, where the third data processing request and the first data processing request come from the same thread of the same master device;
after the cache resources occupied by the first cache information are released based on the processing response returned by the virtual slave device, generate third cache information, where the third cache information includes information of the third data processing request, and the cache resources occupied by the third cache information are the same as the cache resources occupied by the first cache information.
In a possible implementation, the ingress processing unit is further configured to:
after retaining the second cache information and releasing the cache resources occupied by the first cache information, receive a third data processing request, where the third data processing request and the first data processing request come from the same thread of the same master device;
after the cache resources occupied by the second cache information are released based on the processing response returned by the virtual slave device, generate third cache information, where the third cache information includes information of the third data processing request, and the cache resources occupied by the third cache information are the same as the cache resources occupied by the second cache information.
In a possible implementation, the ingress processing unit is further configured to:
after receiving the first data processing request from the master device, increase or decrease the count value of a common counter, where the count value of the common counter is increased or decreased each time a data processing request is received, and the common counter is reset when its count value overflows;
set a private counter corresponding to the first data processing request, where the count value of the private counter corresponding to the first data processing request is increased or decreased when the count value of the common counter overflows, and the processing operation of the first data processing request satisfies the timeout condition when the count value of that private counter overflows.
Further, the ingress processing unit is further configured to: reset the common counter after all received data processing requests have been processed.
In a possible implementation, the ingress processing unit is further configured to:
when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, perform at least one of the following processing operations:
reporting a timeout interrupt event, where the timeout interrupt event carries information of the first data processing request;
saving the information of the first data processing request into a system abnormal event log.
In a third aspect, a chip is provided. The chip is coupled with a memory and is configured to read and execute program instructions stored in the memory to implement the method according to any one of the first aspect.
In a fourth aspect, a communication apparatus is provided, including at least one processor connected to a memory, where the at least one processor is configured to read and execute a program stored in the memory, so that the communication apparatus performs the method according to any one of the first aspect.
In a fifth aspect, a computer storage medium is provided. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspect.
In a sixth aspect, a computer program product is provided which, when invoked by a computer, causes the computer to perform the method according to any one of the first aspect.
FIG. 1 is a schematic architectural diagram of a system-on-chip in an embodiment of the present application;
FIG. 2a and FIG. 2b are schematic connection diagrams of the ingress processing unit, the bus, and the egress processing unit in a system-on-chip in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the ingress processing unit in a system-on-chip in an embodiment of the present application;
FIG. 4 is a schematic diagram of the principle by which the ingress processing unit implements timeout monitoring and exception handling in an embodiment of the present application;
FIG. 5 is a schematic flowchart of a system-on-chip exception handling method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of system-on-chip exception handling provided by another embodiment of the present application;
FIG. 7 is a schematic structural diagram of a communication apparatus provided by an embodiment of the present application.
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings.
Some terms used in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
In the embodiments of the present application, the terms "system" and "network" are used interchangeably. "Multiple" means two or more; accordingly, "multiple" may also be understood as "at least two". "At least one" may be understood as one or more, for example one, two, or more. For example, "including at least one" means including one, two, or more, without limiting which ones are included; for example, including at least one of A, B, and C may mean including A, B, C, A and B, A and C, B and C, or A, B, and C. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, unless otherwise specified, the character "/" generally indicates an "or" relationship between the preceding and following associated objects.
Unless stated otherwise, ordinal terms such as "first" and "second" mentioned in the embodiments of the present application are used to distinguish multiple objects, and are not intended to limit the order, timing, priority, or importance of those objects.
The specific operation methods in the method embodiments may also be applied to the apparatus embodiments or system embodiments.
Referring to FIG. 1, which is a schematic architectural diagram of a system-on-chip in an embodiment of the present application.
As shown in the figure, the system-on-chip 100 is connected to one or more master devices (also called masters), such as master device 1, master device 2, through master device N in the figure; the system-on-chip 100 is also connected to one or more slave devices (also called slaves), such as slave device 1, slave device 2, through slave device M in the figure.
A master device may include a processor, such as a CPU, a graphics processing unit (GPU), or an image signal processor (ISP). A slave device may include a memory or a storage controller, such as a memory controller. A master device initiates a data processing request, such as a read data request, a write data request, or another type of data processing request, and the request is transmitted through the system-on-chip to the target slave device for processing.
Taking a processor as the master device and a memory controller as the slave device as an example: the processor sends a read data request, the system-on-chip transmits the read data request (including an address) to the memory controller, and then waits for the memory controller to respond to the read data request. After some time, the memory controller sends the data read from the memory chips at that address to the system-on-chip, which transmits the data to the processor. If the processor receives the data and verifies it successfully (for example, no ECC or parity error), the read operation is complete. If the processor sends a write data request, the system-on-chip transmits the write data request (including an address) and the data to the memory controller; the memory controller writes the data to the memory chips at that address and, after completing the write operation, replies with a confirmation, which is returned to the processor through the system-on-chip, completing the write operation.
The system-on-chip 100 may include an ingress processing unit 200, a bus 300, and an egress processing unit 400. The ingress processing unit may also be called an initial bridge (IB), and the egress processing unit may also be called a target bridge (TB). The ingress processing unit 200, the bus 300, and the egress processing unit 400 may be implemented using a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
The ingress processing unit 200 is configured to receive data processing requests from master devices; it may further perform protocol conversion to convert received requests into a format that meets the internal processing requirements of the system-on-chip, and may additionally apply security policies (i.e., authentication). Further, in the embodiments of the present application, the ingress processing unit 200 may implement one or more functions such as timeout monitoring, interrupt reporting, exception logging, and exception handling; see the description below for specific implementations.
The egress processing unit 400 is configured to send data processing requests to slave devices, and may further perform protocol conversion on requests to be sent so as to convert them into a format required by the slave device.
The bus 300 is configured to route data processing requests from master devices to target slave devices. The bus 300 may also be called a switching network.
Referring to FIG. 2a and FIG. 2b, which are schematic connection diagrams of the ingress processing unit 200, the bus 300, and the egress processing unit 400 in the system-on-chip in an embodiment of the present application. As shown in the figures, the bus 300 may include multiple routing units (routers) forming an interconnect structure. Only several routing units are shown as examples; in an actual scenario, the bus 300 may contain fewer or more routing units, which is not limited in the embodiments of the present application.
It should be noted that, in some embodiments, the master devices and slave devices may also be regarded as components of the system-on-chip, which is not limited in the embodiments of the present application.
In the system-on-chip provided by the embodiments of the present application, the timeout monitoring operations performed by the ingress processing unit may include:
after receiving a data processing request from a master device, monitoring whether the processing operation of the data processing request satisfies a timeout condition. The data processing request may be any data processing request from a master device, such as the first data processing request, the second data processing request, or the third data processing request involved in the embodiments of the present application.
Optionally, the ingress processing unit may use a two-level counter to implement timeout monitoring; see FIG. 4 and its related description for the specific implementation. Of course, the embodiments of the present application may also monitor whether the processing operation of a data processing request times out in other ways, for example by using a timer to time the processing operation: when the timed duration reaches a set threshold, the processing operation of the corresponding data processing request may be determined to satisfy the timeout condition.
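The timer-based alternative just mentioned can be sketched as follows. This is illustrative only: a hardware implementation would use a hardware timer, the names are invented, and the current time can be injected explicitly so the behavior is deterministic.

```python
import time


class DeadlineMonitor:
    """Timer-based alternative to the two-level counter: each request's
    processing operation is timed, and it satisfies the timeout
    condition once the elapsed time reaches a set threshold."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.started = {}   # request_id -> start timestamp

    def on_request(self, request_id, now=None):
        # Record when the request's processing operation began.
        self.started[request_id] = time.monotonic() if now is None else now

    def timed_out(self, request_id, now=None):
        # The timeout condition is satisfied once the elapsed time
        # reaches the configured threshold.
        now = time.monotonic() if now is None else now
        return now - self.started[request_id] >= self.threshold_s


d = DeadlineMonitor(threshold_s=5.0)
d.on_request("r1", now=100.0)
```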
In the system-on-chip provided by the embodiments of the present application, taking the first data processing request as an example, the exception handling operations performed by the ingress processing unit may include:
when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, sending the first data processing request to the virtual slave device, so as to trigger the virtual slave device to return a processing response to the first data processing request. Further, the processing response returned by the virtual slave device may be sent to the master device that sent the first data processing request, so as to complete the processing operation of the first data processing request.
Optionally, when it is monitored that the processing operation of the first data processing request satisfies the timeout condition, the first data processing request may be marked as timed out to indicate that its processing operation has timed out, and the marked first data processing request is sent to the virtual slave device for response.
Optionally, the virtual slave device may generate a processing failure response for the timed-out first data processing request, and the processing failure response may be sent to the master device that sent the request. In other embodiments, the virtual slave device may perform processing and generate a processing response, which may be sent to the master device that sent the request.
For example, if the first data processing request is a write data request, the virtual slave device may return a write failure response after receiving it. As another example, if the first data processing request is a read data request, the virtual slave device may return a read failure response after receiving it. As yet another example, for a read data request, the virtual slave device may carry a default read return value in the processing response and return that response. The default read return value differs from any read return value that any slave device connected to the bus of the system-on-chip could return, so that upon receiving the response, the master device can determine that the carried read value was returned by the virtual slave device rather than by a real target slave device, and thus that the read operation failed.
Optionally, the virtual slave device may be provided in the ingress processing unit, so that when the ingress processing unit detects that the processing operation of a data processing request satisfies the timeout condition, it can send the request to the virtual slave device for response as soon as possible.
Through the above exception handling, the virtual slave device responds to a first data processing request that satisfies the timeout condition and returns a processing response, promptly ending the timed-out request and preventing it from occupying system resources for a long time and hanging the system; requests whose processing operations have not timed out can still be processed in the conventional manner, thereby ensuring the reliability and stability of the system-on-chip.
Optionally, if the ingress processing unit receives a processing response from the target slave device of the first data processing request before receiving the processing response returned by the virtual slave device, it discards the response from the target slave device to avoid a conflict between the virtual slave device's response and the target slave device's response.
Optionally, after receiving the first data processing request from the master device, the ingress processing unit may generate first cache information, which includes information of the first data processing request, such as its context. The context of the first data processing request may include: the information and thread ID of the master device that sent the request, the address corresponding to the request (such as the memory address of the data in a read or write request), and the type of the request (for example, whether it is a read or write request). When performing the above exception handling, the ingress processing unit may send the first data processing request to the virtual slave device and, after receiving the processing response returned by the virtual slave device, delete the first cache information, release the cache resources it occupies, and further return the processing response to the master device that sent the request, thereby completing the processing operation of the first data processing request and preventing a timed-out request from occupying system resources for a long time and hanging the system.
可选的,入口处理单元在监测到第一数据处理请求的处理操作满足超时条件之前,可能接收到第二数据处理请求,第二数据处理请求与第一数据处理请求来自于同一主设备的同一线程。当入口处理单元接收到第二数据处理请求后,生成第二缓存信息。其中,第二缓存信息包括第二数据处理请求的信息,比如第二数据处理请求的上下文。入口处理单元在监测到第一数据处理请求的处理操作满足超时条件时:还可保留第一缓存信息,释放第二缓存信息占用的缓存资源;或者,保留第二缓存信息,释放第一缓存信息占用的缓存资源。
进一步的,由于第二数据处理请求与第一数据处理请求的来源相同(同一主设备的同一线程),当第一数据处理请求的处理操作满足超时条件时,在将第一数据处理请求发送给虚拟从设备的基础上,还可将与其来源相同的第二数据处理请求发送给虚拟从设备。进一步的,若在接收到虚拟从设备返回的第二数据处理请求的处理响应之前,接收到来自于第二数据处理请求的目标从设备的处理响应,则丢弃所述来自于所述第二数据处理请求的目标从设备的处理响应。
由于第二数据处理请求与第一数据处理请求来自于同一主设备的同一线程,当第一数据处理请求的处理操作满足超时条件时,在对第一数据处理请求进行异常处理的基础上,释放第二数据处理请求对应的第二缓存信息占用的缓存资源,使得来自于同一主设备以及同一线程的多个数据处理请求中,若其中有一个数据处理请求的处理操作满足超时条件,则仅保留其中一个数据处理请求占用的缓存资源,从而减少同一来源的数据处理请求所占用的缓存资源。
进一步的,入口处理单元在保留第一数据处理请求对应的第一缓存信息,释放第二数据处理请求对应的第二缓存信息占用的缓存资源之后,可能接收到第三数据处理请求,第三数据处理请求与第一数据处理请求来自于同一主设备的同一线程。入口处理单元接收到第三数据处理请求后,由于此时来自于同一主设备的同一线程的第一数据处理请求已经超时,因此可不申请新的缓存资源来缓存第三数据处理请求的信息(如上下文),而是当基于虚拟从设备返回的处理响应,被保留的第一缓存信息占用的缓存资源被释放后,再生成第三缓存信息,用以缓存第三数据处理请求的信息(如上下文)。进一步的,第三缓存信息所占用的缓存资源与第一缓存信息占用的缓存资源相同,即,可不用申请新的缓存资源,而是当第一缓存信息占用的缓存资源被释放后,使用该被释放的缓存资源来存储第三缓存信息。
其中,被保留的第一缓存信息所占用的缓存资源,可在虚拟从设备返回第一数据处理请求对应的处理响应以及第二数据处理请求对应的处理响应中的至少一个处理响应之后被释放。
基于相同原理,入口处理单元在保留第二数据处理请求对应的第二缓存信息,释放第一数据处理请求对应的第一缓存信息占用的缓存资源之后,不申请新的缓存资源来缓存第三数据处理请求的信息(如上下文),而是当基于虚拟从设备返回的处理响应,被保留的第二缓存信息占用的缓存资源被释放后,再生成第三缓存信息,用以缓存第三数据处理请求的信息(如上下文)。进一步的,第三缓存信息所占用的缓存资源与第二缓存信息占用的缓存资源相同。
这样,当来自于同一主设备的同一线程的多个数据处理请求中有一个数据处理请求满足超时条件,并仅保留其中一个数据处理请求对应的缓存信息占用的缓存资源后,若再次接收到来自于同一主设备且同一线程的第三数据处理请求,则不再为新接收到的第三数据处理请求分配新的缓存资源,而是在被保留的缓存信息占用的缓存资源被释放后,使用该缓存资源缓存第三数据处理请求的上下文,从而可以在某个来源的数据处理请求满足超时条件时,减少该来源(即同一主设备的同一线程)后续的数据处理请求所占用的系统资源。对于其他来源的数据处理请求的处理操作,可以不受影响,从而保证片上系统整体的可靠性和稳定性。
可选的,入口处理单元在监测到第一数据处理请求的处理操作满足超时条件后,还可上报超时中断事件。可选的,该超时中断事件中可携带第一数据处理请求的信息,比如可携带第一数据处理请求的上下文,以便进一步根据超时中断事件中包含的上述信息确定导致超时的原因。
可选的,入口处理单元在监测到第一数据处理请求的处理操作满足超时条件后,还可将第一数据处理请求的信息保存到系统异常事件日志中。可选的,可将第一数据处理请求的上下文记录到系统异常事件日志中,以便后续根据该日志分析超时原因。
参见图3,为本申请实施例提供的片上系统中入口处理单元的结构示意图。
如图所示,入口处理单元200可包括超时监测模块301、异常处理模块302、虚拟从设备303,进一步的,还可包括中断上报模块304和中断记录模块305中的至少一项。
下面以入口处理单元200接收到来自于主设备的第一数据处理请求为例,描述入口处理单元中各功能模块的功能。需要说明的是,虽然是以第一数据处理请求为例描述,但应理解,对于来自于主设备的任意一个数据处理请求(比如第二数据处理请求、第三数据处理请求等),均可按照以下方式进行处理。
超时监测模块301,用于监测来自于主设备的第一数据处理请求的处理操作是否满足超时条件,当监测到第一数据处理请求的处理操作满足超时条件时,触发异常处理模块302进行异常处理。进一步的,还可触发中断上报模块304上报超时中断事件。进一步的,还可触发中断记录模块305进行系统异常事件日志记录。
异常处理模块302,用于执行异常处理操作。具体的,异常处理模块302将第一数据处理请求发送给虚拟从设备303,以使虚拟从设备303返回该第一数据处理请求的处理响应,进一步的,可将该处理响应返回给该第一数据处理请求的发送方主设备,以完成该第一数据处理请求的处理过程。
虚拟从设备303,用于响应处理操作超时的第一数据处理请求。比如,该虚拟从设备可针对处理操作超时的第一数据处理请求,生成处理失败响应,该处理失败响应可被发送给该数据处理请求的发送方主设备。
中断上报模块304,用于上报超时中断事件,可选的,该超时中断事件中可携带满足超时条件的第一数据处理请求的信息,比如可携带第一数据处理请求的上下文。
中断记录模块305,用于将第一数据处理请求的信息保存到系统异常事件日志中。可选的,可将第一数据处理请求的上下文记录到系统异常事件日志中。
上述各模块所实现的功能的具体实现方式,可参见前述实施例的相关内容。
需要说明的是,图3所示的入口处理单元的结构仅为一种示例,本申请实施例对入口处理单元的结构划分方式不做限制,比如,在其他一些实施例中,虚拟从设备可包含在异常处理模块中;在另一些实施例中,可由异常处理模块实现虚拟从设备的功能,而无需再单独设置虚拟从设备。
可选的,在一些实施例中,可通过两级计数器来实现对数据处理请求的处理操作是否超时进行监测。
参见图4,为本申请实施例中入口处理单元实现超时监测以及异常处理的原理示意图。
如图所示,入口处理单元可将来自于主设备的每个数据处理请求的信息进行缓存,以等待目标从设备返回处理响应。
具体的,入口处理单元接收到来自于主设备的第一数据处理请求后,生成第一缓存(entry1),该第一缓存中包括第一数据处理请求的信息。可选的,第一数据处理请求的信息可以是第一数据处理请求的上下文。入口处理单元接收到来自于主设备的第二数据处理请求后,生成第二缓存(entry2),第二缓存中包括第二数据处理请求的上下文。入口处理单元接收到来自主设备的第三数据处理请求后,生成第三缓存(entry3),第三缓存中包括第三数据处理请求的上下文,以此类推。这些数据处理请求对应的缓存信息形成缓存队列410。
入口处理单元可按照先入先出的顺序,将缓存队列410中各缓存信息对应的数据处理请求传输给片上系统的总线,使得数据处理请求可经总线传输给目标从设备。当接收到目标从设备返回的处理响应后,可将该处理响应返回给相应的主设备,并可将缓存队列410中该数据处理请求的缓存信息从队列中删除(即释放相应缓存信息占用的缓存资源),完成该数据处理请求的处理操作。当然,入口处理单元也可按照其他方式或其他顺序,将缓存队列410中各缓存信息对应的数据处理请求传输给片上系统的总线,本申请实施例对此不做限制。
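缓存队列410的入队、传输与释放过程,可用如下Python草图示意(类名与方法名均为示意性假设):

```python
from collections import OrderedDict

class EntryQueue:
    """按先入先出顺序维护数据处理请求缓存信息(entry)的示意队列。"""

    def __init__(self):
        self.entries = OrderedDict()  # 请求ID -> 上下文(缓存信息)

    def push(self, request_id, context):
        # 接收到数据处理请求后生成缓存信息,按到达顺序入队
        self.entries[request_id] = context

    def pop_oldest(self):
        # 按先入先出顺序取出最早的缓存信息对应的请求,发往总线;
        # 缓存信息本身仍保留在队列中,等待目标从设备返回处理响应
        request_id = next(iter(self.entries))
        return request_id, self.entries[request_id]

    def complete(self, request_id):
        # 接收到目标从设备返回的处理响应后,删除相应缓存信息,
        # 释放其占用的缓存资源
        return self.entries.pop(request_id, None)
```

如正文所述,按先入先出传输仅为一种可选顺序,也可按其他方式向总线传输各缓存信息对应的数据处理请求。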
入口处理单元中可设置公共计数器420和私有计数器421。其中,公共计数器420为全局共享计数器。私有计数器421的数量与缓存队列410中的缓存信息的数量相关,一个数据处理请求的缓存信息(entry)对应一个私有计数器。
可选的,若公共计数器采用计数值递增的方式计数,则每当接收到来自于主设备的一个数据处理请求时,将公共计数器的计数值递增(如加1)。当公共计数器的计数值发生溢出,即在已达到最大值的情况下还需要加1时,该公共计数器被复位(清零),以便重新开始计数。当缓存队列410为空时,将公共计数器复位(清零)。在缓存队列410为空的期间内,公共计数器保持复位状态。
可选的,若公共计数器采用计数值递减的方式计数,则每当接收到来自于主设备的一个数据处理请求,将公共计数器的计数值递减(如减1),当公共计数器的计数值发生溢出,即在计数值为零的情况下还需减1时,该公共计数器被复位(设置为最大值),以便重新开始计数。当缓存队列410为空时,将公共计数器复位。
可选的,私有计数器可分别在以下时机被创建以及被删除:当接收到来自主设备的一个数据处理请求,并生成对应的缓存信息后,可创建与该缓存信息对应的私有计数器;当缓存队列410中的一个数据处理请求对应的缓存信息被删除(即相应缓存信息占用的缓存资源被释放)后,可删除该缓存信息对应的私有计数器。
当公共计数器的计数值发生溢出时,触发各私有计数器的计数值发生变化。可选的,如果私有计数器采用计数值递增的方式计数,则当公共计数器的计数值发生溢出时,各私有计数器的计数值递增(如加1)。可选的,如果私有计数器采用计数值递减的方式计数,则当公共计数器的计数值发生溢出时,各私有计数器的计数值递减(如减1)。
当私有计数器的计数值达到设定门限值或发生溢出时,表明相应缓存信息所对应的数据处理请求的处理操作超时,进一步的,此时该私有计数器的计数值可保持不变。比如,如果私有计数器采用计数值递增的方式计数,则当私有计数器的计数值达到或超过最大值,或者达到指定的门限值时,表明相应数据处理请求的处理操作超时;如果私有计数器采用计数值递减的方式计数,则当私有计数器的计数值溢出(如计数值减小到0)时,表明相应数据处理请求的处理操作超时。
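上述两级计数器的配合关系,可用如下采用递增计数方式的Python草图示意。其中的类名以及最大值取值均为示意性假设,实际取值可根据片上系统的性能要求设置:

```python
class TwoLevelTimeoutCounter:
    """公共计数器溢出时驱动各私有计数器递增的两级超时监测示意。"""

    def __init__(self, public_max, private_max):
        self.public_max = public_max    # 公共计数器最大值(示意取值)
        self.private_max = private_max  # 私有计数器最大值,达到即视为超时
        self.public_count = 0
        self.private_counters = {}      # 缓存信息(entry)标识 -> 私有计数值

    def on_request(self, entry_id):
        # 生成缓存信息后,创建对应的私有计数器
        self.private_counters[entry_id] = 0
        # 每接收到一个数据处理请求,公共计数器加1;溢出时复位,
        # 并触发各私有计数器的计数值加1
        self.public_count += 1
        if self.public_count > self.public_max:
            self.public_count = 0
            for eid in self.private_counters:
                # 已超时的私有计数器保持计数值不变
                if self.private_counters[eid] < self.private_max:
                    self.private_counters[eid] += 1

    def on_response(self, entry_id):
        # 缓存信息被删除(缓存资源被释放)后,删除对应的私有计数器;
        # 缓存队列为空时,公共计数器复位
        self.private_counters.pop(entry_id, None)
        if not self.private_counters:
            self.public_count = 0

    def is_timeout(self, entry_id):
        # 私有计数器的计数值达到最大值(溢出),表明相应请求的处理操作超时
        return self.private_counters.get(entry_id, 0) >= self.private_max
```

该草图中,公共计数器每溢出一次,各未超时的私有计数器递增一次;采用递减计数方式时原理相同,仅计数方向相反。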
其中,公共计数器的最大值以及私有计数器的最大值,可根据片上系统的性能要求等因素设置,本申请实施例对此不做限制。
当判断一个数据处理请求的处理操作满足超时条件时,可进入异常处理流程。比如,可将满足超时条件的数据处理请求发送给虚拟从设备进行响应。
可选的,当监测到一个数据处理请求的处理操作满足超时条件时,可将该数据处理请求标记为超时。比如,对于第一数据处理请求,当其第一缓存(entry1)对应的私有计数器发生溢出时,则表明第一数据处理请求的处理操作满足超时条件,因此将第一数据处理请求对应的第一缓存标记为超时。
可选的,当某个数据处理请求的处理操作超时时,可将缓存队列中与该超时的数据处理请求具有相同来源(比如来源于同一主设备的同一线程)的数据处理请求所对应的缓存信息都标记为超时。比如,缓存队列中包括第一数据处理请求对应的第一缓存以及第二数据处理请求对应的第二缓存,第一数据处理请求和第二数据处理请求来自于同一主设备的同一线程,当监测到第一数据处理请求的处理操作满足超时条件时,可将第一缓存和第二缓存都标记为超时。以上是以两个数据处理请求来源于同一主设备的同一线程为例描述的,应理解,缓存队列中具有相同来源的数据处理请求的数量可以更多,此种情况下,当其中一个数据处理请求的处理操作满足超时条件时,可将这些所有具有相同来源的数据处理请求对应的缓存信息均标记为超时。
可选的,当第一数据处理请求的处理操作满足超时条件,并将所有与第一数据处理请求相同来源的数据处理请求对应的缓存信息标记为超时后,可仅保留这些来源相同且被标记为超时的数据处理请求中一个数据处理请求对应的缓存信息,比如保留第一数据处理请求对应的第一缓存信息,将这些相同来源的数据处理请求中其他数据处理请求对应的缓存信息占用的缓存资源进行释放。
进一步的,后续如果接收到相同来源的第三数据处理请求(即第三数据处理请求与第一数据处理请求来源于同一主设备的同一线程),则可不再为第三数据处理请求申请新的缓存资源来缓存该数据处理请求的上下文,而是当被保留的缓存信息所占用的缓存资源被释放后,使用该缓存资源来存储第三数据处理请求的上下文。
比如,仍以第一数据处理请求与第二数据处理请求来源于同一主设备的同一线程为例,当第一数据处理请求的处理操作满足超时条件时,可释放第二数据处理请求对应的第二缓存信息所占用的缓存资源,仅保留第一数据处理请求对应的第一缓存信息,并可将第一缓存信息标记为锁定。当接收到第三数据处理请求后,确定第三数据处理请求所来源的主设备和线程,与被标记为锁定的第一缓存信息中的主设备信息和线程ID相匹配,因此不再为第三数据处理请求申请新的缓存资源,而是当第一缓存信息占用的缓存资源被释放后,用该缓存资源存储第三数据处理请求的上下文。
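上述"仅保留并锁定同源请求的一个缓存信息,后续同源请求复用其缓存资源"的处理,可用如下Python草图示意。其中EntryTable等命名以及entry标识的生成方式均为假设,仅用于说明资源复用的思路:

```python
class EntryTable:
    """超时后仅保留并锁定同源请求的一个entry,
    后续同源请求复用该缓存资源的示意实现。"""

    def __init__(self):
        self.entries = {}         # entry标识 -> {"source": (主设备, 线程ID), "context": ...}
        self.locked_sources = {}  # 来源 -> 被锁定的entry标识
        self.next_id = 0

    def on_new_request(self, source, context):
        if source in self.locked_sources:
            # 同源请求:不申请新的缓存资源,等待被锁定的entry释放
            return None
        entry_id = f"entry-{self.next_id}"
        self.next_id += 1
        self.entries[entry_id] = {"source": source, "context": context}
        return entry_id

    def on_timeout(self, timeout_entry_id):
        source = self.entries[timeout_entry_id]["source"]
        # 释放与超时请求同源的其他entry占用的缓存资源,
        # 仅保留超时请求自身的entry并标记为锁定
        for eid in [e for e, v in self.entries.items()
                    if v["source"] == source and e != timeout_entry_id]:
            del self.entries[eid]
        self.locked_sources[source] = timeout_entry_id

    def on_virtual_response(self, source, pending_context):
        # 虚拟从设备返回处理响应后,解除锁定并释放该缓存资源,
        # 再用该缓存资源存储后续同源请求的上下文
        entry_id = self.locked_sources.pop(source)
        self.entries[entry_id] = {"source": source, "context": pending_context}
        return entry_id
```

该草图中,其他来源的数据处理请求仍可正常申请缓存资源,与正文所述"其他来源的处理操作不受影响"一致。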
参见图5,为本申请实施例提供的片上系统异常处理方法的流程示意图,如图所示,该流程可包括:
S501:接收来自于主设备的第一数据处理请求。
可选的,所述第一数据处理请求可以是读数据请求,也可以是写数据请求。
S502:向目标从设备发送第一数据处理请求。
可选的,该步骤中,基于图1所示的系统架构,当片上系统中的入口处理单元接收到第一数据处理请求后,可进行协议转换,并将协议转换后的第一数据处理请求发送到总线,以便由总线传输给目标从设备。
S503:监测第一数据处理请求的处理操作是否满足超时条件,当监测到第一数据处理请求的处理操作满足超时条件时,转入S504,否则转入S505。
导致第一数据处理请求超时的原因可能是:总线对该第一数据处理请求进行路由时,有可能由于互联通路的故障而导致无法将第一数据处理请求发送给目标从设备,进而无法接收目标从设备返回的处理响应,从而导致处理操作超时。导致第一数据处理请求超时的另一原因可能是:目标从设备的故障,导致无法返回处理响应,因而导致片上系统无法接收目标从设备返回的处理响应,从而导致处理操作超时。
本申请实施例中,可对第一数据处理请求的处理操作是否满足超时条件进行监测,超时监测的方法如前所述,在此不再重复。当监测到第一数据处理请求的处理操作超时时,可转入S504,以进入异常处理流程。
S504:执行异常处理流程。
异常处理流程中可包括以下步骤:将第一数据处理请求发送给虚拟从设备,以触发虚拟从设备返回第一数据处理请求的处理响应。
该步骤中,虚拟从设备接收到第一数据处理请求后,可返回处理响应,比如处理失败响应。进一步的,可将该处理响应发送给该第一数据处理请求的发送方主设备。
可选的,当监测到第一数据处理请求的处理操作满足超时条件时,还可上报超时中断事件。可选的,超时中断事件中携带所述第一数据处理请求的信息,比如第一数据处理请求的上下文。
可选的,当监测到第一数据处理请求的处理操作满足超时条件时,还可将第一数据处理请求的信息保存到系统异常事件日志中。
S505:采用常规处理操作。比如等待目标从设备返回处理响应。
上述流程中,第一数据处理请求的处理操作超时时,将其发送给虚拟从设备进行处理,由虚拟从设备返回处理响应,从而及时结束该处理操作超时的第一数据处理请求,以避免该处理操作超时的第一数据处理请求长时间占用系统资源而导致系统挂死,进而可保证片上系统的稳定性和可靠性。
可选的,若在接收到虚拟从设备返回的第一数据处理请求的处理响应之前,接收到来自于该目标从设备的处理响应,则丢弃该来自于所述目标从设备的处理响应。
可选的,接收来自于主设备的第一数据处理请求之后,还包括:生成第一缓存信息,第一缓存信息包括第一数据处理请求的信息,比如第一数据处理请求的上下文;在监测到第一数据处理请求的处理操作满足超时条件之前,还包括:接收第二数据处理请求,第二数据处理请求与第一数据处理请求来自于同一主设备的同一线程;生成第二缓存信息,第二缓存信息包括所述第二数据处理请求的信息。当监测到所述第一数据处理请求的处理操作满足超时条件时,还包括:保留第一缓存信息并释放第二缓存信息占用的缓存资源,或者保留第二缓存信息并释放第一缓存信息占用的缓存资源。
进一步的,当监测到第一数据处理请求的处理操作满足超时条件时,将与第一数据处理请求来源相同的第二数据处理请求发送给虚拟从设备。
可选的,在保留第一缓存信息并释放第二缓存信息占用的缓存资源之后,还包括:接收第三数据处理请求,第三数据处理请求与第一数据处理请求来自于相同主设备的相同线程;当基于虚拟从设备返回的处理响应,将第一缓存信息占用的缓存资源释放后,生成第三缓存信息,第三缓存信息包括第三数据处理请求的信息,第三缓存信息占用的缓存资源与第一缓存信息占用的缓存资源相同。
可选的,在保留第二缓存信息并释放第一缓存信息占用的缓存资源之后,还包括:接收第三数据处理请求,第三数据处理请求与第一数据处理请求来自于相同主设备的相同线程;当基于虚拟从设备返回的处理响应,将第二缓存信息占用的缓存资源释放后,生成第三缓存信息,第三缓存信息包括第三数据处理请求的信息,第三缓存信息占用的缓存资源与第二缓存信息占用的缓存资源相同。
上述流程中各步骤的具体实现方式可参见前述实施例。
基于图5所示的流程,结合图4所示的原理,图6示例性示出了一种具体应用场景下的片上系统异常处理方法的流程示意图,如图所示,该流程可包括:
S601:接收来自于主设备的数据处理请求。
S602:缓存该数据处理请求的上下文,得到该数据处理请求对应的缓存信息。
S603:公共计数器的计数值加1,该数据处理请求对应的私有计数器的计数值在公共计数器的计数值溢出时加1。
S604:该数据处理请求对应的私有计数器的计数值是否达到门限值,或者是否溢出,若是,表明该数据处理请求的处理操作超时,则转入S605,否则转入S612。
S605:上报超时中断事件,在异常事件日志中记录该数据处理请求的上下文。
S606:将缓存队列中该数据处理请求的缓存信息,以及与该数据处理请求来源于同一主设备和同一线程的缓存信息标记为超时。
S607:将标记为超时的缓存信息所对应的数据处理请求发送给虚拟从设备进行响应。
S608:对于S606中标记为超时的缓存信息,仅保留其中一个缓存信息(一个缓存信息是指一个数据处理请求对应的缓存信息,如一个entry),将其他缓存信息占用的缓存资源释放,并将被保留的缓存信息标记为锁定。
S609:如果接收到相同来源(即同一主设备的同一线程)的新的数据处理请求,则不再为该新的数据处理请求申请新的缓存资源。当上述保留的且被标记为锁定的缓存信息占用的缓存资源被释放后,使用该缓存资源(即该标记为锁定的缓存信息所占用的缓存资源)存储该新的数据处理请求的上下文,得到对应的缓存信息。
S610:如果接收到在S606中被标记为超时的缓存信息所对应的数据处理请求的目标从设备返回的处理响应,则直接丢弃。
S611:接收到虚拟从设备返回的处理响应后,释放对应的缓存信息所占用的缓存资源,并可进一步将该处理响应返回给发送方主设备,完成超时的异常处理操作。
可选的,如果在S606中,被标记为超时的数据处理请求有多个,在S607中,该多个数据处理请求均被发送给虚拟从设备进行响应,在S608中被保留以及被标记为锁定的缓存信息为第一缓存信息,则在S611中,当接收到虚拟从设备针对该多个数据处理请求分别返回的处理响应后,释放第一缓存信息所占用的缓存资源。
S612:按照常规流程进行处理。
需要说明的是,此处仅以计数器采用计数值递增的方式进行计数为例描述,若计数器的计数值采用递减方式计数,则处理过程的原理与上述流程相同。
上述流程中各步骤的具体实现方式可参见前述实施例的描述,在此不再重复。
基于相同的技术构思,本申请实施例还提供一种通信装置,该通信装置可以具有如图7所示的结构,所述通信装置可以实现上述方法的片上系统,也可以是能够实现上述方法的芯片或芯片系统。
如图7所示的通信装置700可以包括至少一个处理器702,所述至少一个处理器702用于与存储器耦合,读取并执行所述存储器中的指令以实现本申请实施例提供的方法涉及的步骤。可选的,该通信装置700还可以包括至少一个接口703,用于为所述至少一个处理器提供程序指令或者数据。通信装置700可执行如图5或图6所示的方法中的步骤。此外,接口703可用于支持通信装置700进行通信。可选的,通信装置700还可以包括存储器704,其中存储有计算机程序、指令,存储器704可以与处理器702和/或接口703耦合,用于支持处理器702调用存储器704中的计算机程序、指令以实现本申请实施例提供的方法涉及的步骤;另外,存储器704还可以用于存储本申请方法实施例所涉及的数据,例如,用于存储支持接口703实现交互所必须的数据、指令,和/或,用于存储通信装置700执行本申请实施例所述方法所必须的配置信息。
基于与上述方法实施例相同构思,本申请实施例还提供了一种计算机可读存储介质,其上存储有一些指令,这些指令被计算机调用执行时,可以使得计算机完成上述方法实施例、方法实施例的任意一种可能的设计中所涉及的方法。本申请实施例中,对计算机可读存储介质不做限定,例如,可以是RAM(random-access memory,随机存取存储器)、ROM(read-only memory,只读存储器)等。
基于与上述方法实施例相同构思,本申请还提供一种计算机程序产品,该计算机程序产品在被计算机调用执行时可以完成方法实施例以及上述方法实施例任意可能的设计中所涉及的方法。
基于与上述方法实施例相同构思,本申请还提供一种芯片,该芯片可以包括处理器以及接口电路,用于完成上述方法实施例、方法实施例的任意一种可能的实现方式中所涉及的方法,其中,"耦合"是指两个部件彼此直接或间接地结合,这种结合可以是固定的或可移动的,这种结合可以允许流体、电、电信号或其它类型信号在两个部件之间进行通信。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。
本申请实施例中所描述的各种说明性的逻辑单元和电路可以通过通用处理器,数字信号处理器,专用集成电路(ASIC),现场可编程门阵列(FPGA)或其它可编程逻辑装置,离散门或晶体管逻辑,离散硬件部件,或上述任何组合的设计来实现或操作所描述的功能。通用处理器可以为微处理器,可选地,该通用处理器也可以为任何传统的处理器、控制器、微控制器或状态机。处理器也可以通过计算装置的组合来实现,例如数字信号处理器和微处理器,多个微处理器,一个或多个微处理器联合一个数字信号处理器核,或任何其它类似的配置来实现。
本申请实施例中所描述的方法或算法的步骤可以直接嵌入硬件、处理器执行的软件单元、或者这两者的结合。软件单元可以存储于RAM存储器、闪存、ROM存储器、EPROM存储器、EEPROM存储器、寄存器、硬盘、可移动磁盘、CD-ROM或本领域中其它任意形式的存储媒介中。示例性地,存储媒介可以与处理器连接,以使得处理器可以从存储媒介中读取信息,并可以向存储媒介写入信息。可选地,存储媒介还可以集成到处理器中。处理器和存储媒介可以设置于ASIC中,ASIC可以设置于终端设备中。可选地,处理器和存储媒介也可以设置于终端设备中的不同的部件中。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管结合具体特征及其实施例对本发明进行了描述,显而易见的,在不脱离本发明的范围的情况下,可对其进行各种修改和组合。相应地,本说明书和附图仅仅是所附权利要求所界定的本发明的示例性说明,且视为已覆盖本发明范围内的任意和所有修改、变化、组合或等同物。显然,本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样,倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内,则本发明也意图包含这些改动和变型在内。
Claims (20)
- 一种片上系统异常处理方法,其特征在于,包括:接收来自于主设备的第一数据处理请求;向目标从设备发送所述第一数据处理请求,并监测所述第一数据处理请求的处理操作是否满足超时条件;当监测到所述第一数据处理请求的处理操作满足超时条件时,将所述第一数据处理请求发送给虚拟从设备,以触发所述虚拟从设备返回所述第一数据处理请求的处理响应。
- 如权利要求1所述的方法,其特征在于,还包括:若在接收到所述虚拟从设备返回的所述第一数据处理请求的处理响应之前,接收到来自于所述目标从设备的处理响应,则丢弃所述来自于所述目标从设备的处理响应。
- 如权利要求1或2所述的方法,其特征在于:接收来自于主设备的第一数据处理请求之后,还包括:生成第一缓存信息,所述第一缓存信息包括所述第一数据处理请求的信息;在监测到所述第一数据处理请求的处理操作满足超时条件之前,还包括:接收第二数据处理请求,所述第二数据处理请求与所述第一数据处理请求来自于所述主设备的同一线程;生成第二缓存信息,所述第二缓存信息包括所述第二数据处理请求的信息;当监测到所述第一数据处理请求的处理操作满足超时条件时,还包括:保留所述第一缓存信息,释放所述第二缓存信息占用的缓存资源;或者,保留所述第二缓存信息,释放所述第一缓存信息占用的缓存资源。
- 如权利要求3所述的方法,其特征在于,还包括:将所述第二数据处理请求发送给所述虚拟从设备。
- 如权利要求3所述的方法,其特征在于:在保留所述第一缓存信息,释放所述第二缓存信息占用的缓存资源之后,还包括:接收第三数据处理请求,所述第三数据处理请求与所述第一数据处理请求来自于所述主设备的同一线程;当基于所述虚拟从设备返回的处理响应,所述第一缓存信息占用的缓存资源被释放后,生成第三缓存信息,所述第三缓存信息包括所述第三数据处理请求的信息,所述第三缓存信息占用的缓存资源与所述第一缓存信息占用的缓存资源相同;或者,在保留所述第二缓存信息,释放所述第一缓存信息占用的缓存资源之后,还包括:接收第三数据处理请求,所述第三数据处理请求与所述第一数据处理请求来自于所述主设备的同一线程;当基于所述虚拟从设备返回的处理响应,所述第二缓存信息占用的缓存资源被释放后,生成第三缓存信息,所述第三缓存信息包括所述第三数据处理请求的信息,所述第三缓存信息占用的缓存资源与所述第二缓存信息占用的缓存资源相同。
- 如权利要求1-5任一项所述的方法,其特征在于:接收来自于主设备的第一数据处理请求之后,还包括:将公共计数器的计数值增加或减少,其中,每当接收到一个数据处理请求,所述公共计数器的计数值被增加或减少,所述公共计数器的计数值溢出时,所述公共计数器复位;设置所述第一数据处理请求对应的私有计数器,当所述公共计数器的计数值溢出时,所述第一数据处理请求对应的私有计数器的计数值被增加或减少,当所述第一数据处理请求对应的私有计数器的计数值溢出时,所述第一数据处理请求的处理操作满足超时条件。
- 如权利要求6所述的方法,其特征在于,还包括:当接收到的所有数据处理请求均被处理完成后,将所述公共计数器复位。
- 如权利要求1-7任一项所述的方法,其特征在于,还包括:当监测到所述第一数据处理请求的处理操作满足超时条件时,执行以下至少一项处理操作:上报超时中断事件,所述超时中断事件中携带所述第一数据处理请求的信息;将所述第一数据处理请求的信息保存到系统异常事件日志中。
- 一种片上系统,其特征在于,包括:入口处理单元、总线以及出口处理单元,所述入口处理单元中包括虚拟从设备;所述入口处理单元,用于接收来自于主设备的第一数据处理请求;监测所述第一数据处理请求的处理操作是否满足超时条件;当监测到所述第一数据处理请求的处理操作满足超时条件时,将所述第一数据处理请求发送给所述虚拟从设备,以触发所述虚拟从设备返回所述第一数据处理请求的处理响应;所述总线,用于将所述入口处理单元接收到的第一数据处理请求路由到所述出口处理单元;所述出口处理单元,用于向目标从设备发送所述第一数据处理请求。
- 如权利要求9所述的片上系统,其特征在于,所述入口处理单元还用于:若在接收到所述虚拟从设备返回的所述第一数据处理请求的处理响应之前,接收到来自于所述目标从设备的处理响应,则丢弃所述来自于所述目标从设备的处理响应。
- 如权利要求9或10所述的片上系统,其特征在于:所述入口处理单元还用于:接收来自于主设备的第一数据处理请求之后,生成第一缓存信息,所述第一缓存信息包括所述第一数据处理请求的信息;所述入口处理单元还用于:在监测到所述第一数据处理请求的处理操作满足超时条件之前,接收第二数据处理请求,所述第二数据处理请求与所述第一数据处理请求来自于所述主设备的同一线程;生成第二缓存信息,所述第二缓存信息包括所述第二数据处理请求的信息;所述入口处理单元还用于:当监测到所述第一数据处理请求的处理操作满足超时条件时,还包括:保留所述第一缓存信息,释放所述第二缓存信息占用的缓存资源;或者,保留所述第二缓存信息,释放所述第一缓存信息占用的缓存资源。
- 如权利要求11所述的片上系统,其特征在于,所述入口处理单元还用于:将所述第二数据处理请求发送给所述虚拟从设备。
- 如权利要求11所述的片上系统,其特征在于:所述入口处理单元,还用于:在保留所述第一缓存信息,释放所述第二缓存信息占用的缓存资源之后,接收第三数据处理请求,所述第三数据处理请求与所述第一数据处理请求来自于所述主设备的同一线程;当基于所述虚拟从设备返回的处理响应,所述第一缓存信息占用的缓存资源被释放后,生成第三缓存信息,所述第三缓存信息包括所述第三数据处理请求的信息,所述第三缓存信息占用的缓存资源与所述第一缓存信息占用的缓存资源相同;或者,所述入口处理单元,还用于:在保留所述第二缓存信息,释放所述第一缓存信息占用的缓存资源之后,接收第三数据处理请求,所述第三数据处理请求与所述第一数据处理请求来自于所述主设备的同一线程;当基于所述虚拟从设备返回的处理响应,所述第二缓存信息占用的缓存资源被释放后,生成第三缓存信息,所述第三缓存信息包括所述第三数据处理请求的信息,所述第三缓存信息占用的缓存资源与所述第二缓存信息占用的缓存资源相同。
- 如权利要求9-13任一项所述的片上系统,其特征在于,所述入口处理单元还用于:接收来自于主设备的第一数据处理请求之后,将公共计数器的计数值增加或减少,其中,每当接收到一个数据处理请求,所述公共计数器的计数值被增加或减少,所述公共计数器的计数值溢出时,所述公共计数器复位;设置所述第一数据处理请求对应的私有计数器,当所述公共计数器的计数值溢出时,所述第一数据处理请求对应的私有计数器的计数值被增加或减少,当所述第一数据处理请求对应的私有计数器的计数值溢出时,所述第一数据处理请求的处理操作满足超时条件。
- 如权利要求14所述的片上系统,其特征在于,所述入口处理单元还用于:当接收到的所有数据处理请求均被处理完成后,将所述公共计数器复位。
- 如权利要求9-15任一项所述的片上系统,其特征在于,所述入口处理单元还用于:当监测到所述第一数据处理请求的处理操作满足超时条件时,执行以下至少一项处理操作:上报超时中断事件,所述超时中断事件中携带所述第一数据处理请求的信息;将所述第一数据处理请求的信息保存到系统异常事件日志中。
- 一种芯片,其特征在于,所述芯片与存储器耦合,用于读取并执行所述存储器中存储的程序指令,以实现如权利要求1-8中任一项所述的方法。
- 一种通信装置,其特征在于,包括至少一个处理器,所述至少一个处理器与存储器相连,所述至少一个处理器用于读取并执行所述存储器中存储的程序,以使得所述通信装置执行如权利要求1-8中任一项所述的方法。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机指令,当所述指令在计算机上运行时,使得计算机执行如权利要求1-8中任一所述的方法。
- 一种计算机程序产品,其特征在于,所述计算机程序产品在被计算机调用时,使得计算机执行如权利要求1-8中任一所述的方法。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180091953.1A CN116830087A (zh) | 2021-01-28 | 2021-01-28 | 一种片上系统异常处理方法、片上系统及其装置 |
PCT/CN2021/074235 WO2022160206A1 (zh) | 2021-01-28 | 2021-01-28 | 一种片上系统异常处理方法、片上系统及其装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/074235 WO2022160206A1 (zh) | 2021-01-28 | 2021-01-28 | 一种片上系统异常处理方法、片上系统及其装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022160206A1 true WO2022160206A1 (zh) | 2022-08-04 |
Family
ID=82652878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/074235 WO2022160206A1 (zh) | 2021-01-28 | 2021-01-28 | 一种片上系统异常处理方法、片上系统及其装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116830087A (zh) |
WO (1) | WO2022160206A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912079A (zh) * | 2023-09-12 | 2023-10-20 | 北京象帝先计算技术有限公司 | 数据处理系统、电子组件、电子设备及数据处理方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103810074A (zh) * | 2012-11-14 | 2014-05-21 | 华为技术有限公司 | 一种片上系统芯片及相应的监控方法 |
US20150012679A1 (en) * | 2013-07-03 | 2015-01-08 | Iii Holdings 2, Llc | Implementing remote transaction functionalities between data processing nodes of a switched interconnect fabric |
CN106557446A (zh) * | 2015-09-28 | 2017-04-05 | 瑞萨电子株式会社 | 总线系统 |
CN108363670A (zh) * | 2017-01-26 | 2018-08-03 | 华为技术有限公司 | 一种数据传输的方法、装置、设备和系统 |
Also Published As
Publication number | Publication date |
---|---|
CN116830087A (zh) | 2023-09-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21921816 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180091953.1 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21921816 Country of ref document: EP Kind code of ref document: A1 |