CN116149897B - Chip functional safety fault processing method - Google Patents
- Publication number
- CN116149897B (CN202310417392.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a chip functional safety fault processing method, which comprises the following steps. Step S1: an operating system is provided on the chip, and the operating system creates tasks based on user requests. Step S2: one or more functional tasks are sequentially allocated to execution units in the chip that are capable of executing the corresponding function types. Step S3: functional tasks are allocated to the execution units, and it is determined whether to trigger the execution unit to enter a safety guarantee mode so as to insert redundant execution. The invention can cope with the combined influence of complex factors in a chip with a complex system structure and provides an effective safety guarantee.
Description
Technical Field
The invention belongs to the technical field of functional safety of electronic systems, and particularly relates to a chip functional safety fault processing method.
Background
With the rapid development of the electronics industry, digital chips are increasingly used in industry, communications, the military and consumer electronics. A digital chip is prone to failure as its operating time increases or its working environment changes, and a high fault rate has become one of the important factors restricting the development of digital chips. As process nodes shrink and chip area grows, chip capacity improves markedly, but chip yield also faces a great challenge.
Among the many consumer fields, chip adoption is growing especially quickly in vehicles as vehicle and chip prices fall. Vehicles are products closely tied to human life safety, so functional safety is critical for safety-related electronic and electrical systems in the automotive field. More and more automobiles are equipped with electronic and electrical systems that depend more heavily on chips and must meet higher safety levels.
Existing vehicle-mounted chips are concerned with functional safety, that is, with whether the safety risk caused by random hardware failures can be reduced as far as possible. Each functional unit IP on the chip also has a corresponding mechanism for handling its functional failures, so that various kinds of software and hardware reporting can be performed. It is well known that digital chip failures are jointly affected by many factors, including operating temperature, ambient temperature, operating voltage, ambient radiation, device aging, mechanical vibration and the like. As chip dimensions continue to shrink, the combined effect of these factors becomes harder to separate and harder to overcome factor by factor. In addition, an existing chip contains thousands to billions of memory cells of various types, and because of process inconsistencies and other external factors, some individual memory cells inevitably perform poorly or are even unusable. Focusing only on functional failures caused by a single IP is therefore clearly unreasonable: as IP integration levels rise, data communication and sharing make the influences more complex, and functional failures caused by shared storage cannot be ignored. How to cope with the combined influence of complex factors in a chip with a complex system structure, and provide an effective safety guarantee, is the technical problem to be solved.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a chip functional safety fault processing method, the method comprising:
step S1: an operating system is provided on the chip, and the operating system creates tasks based on user requests; specifically: the operating system creates one or more tasks based on the user request and divides the tasks into one or more functional tasks according to function type; the chip comprises a plurality of processing units, each processing unit comprising one or more execution units, wherein: an execution unit is a general kernel or a functional unit; each execution unit has a unique number;
step S2: sequentially allocating the one or more functional tasks to execution units in the chip capable of executing the corresponding function types; specifically: judging whether all functional tasks have been allocated; if so, returning to step S1; otherwise, pre-allocating the functional task to an execution unit and judging whether the pre-allocation falls into a fault unit combination table; if it does, recording the pre-allocation scheme, putting it into a pre-allocation scheme set, and returning to step S2 for re-pre-allocation, and, if re-pre-allocation cannot be performed, entering step S3; if it does not, allocating the functional task to the execution unit according to the pre-allocation scheme; continuing to execute step S2 until all functional tasks have been allocated;
the fault unit combination table stores one or more fault unit combinations together with their corresponding fault probabilities and fault codes; each fault unit combination comprises one or more execution unit numbers arranged in time order; an entry indicates the statistical fault code, and its fault probability, that applies when the execution sequence of the execution units listed in the fault unit combination occurs;
step S3: allocating functional tasks to the execution units, and determining whether to trigger the execution unit to enter a safety guarantee mode so as to insert redundant execution;
step S31: allocating the current functional task to be allocated to an available general kernel;
step S32: determining whether the execution unit is triggered to enter the safety guarantee mode; if so, entering step S33; otherwise, returning to step S2;
step S33: acquiring each pre-allocation scheme in the pre-allocation scheme set; acquiring all fault unit combinations in the fault unit combination table into which the pre-allocation schemes fall, to form a fault unit combination set; determining therefrom the most favorable fault unit combination; wherein: the most favorable fault unit combination is the fault unit combination with the highest dynamic execution efficiency;
step S34: inserting redundant execution; specifically: creating a functional task combination corresponding to the most favorable fault unit combination, and acquiring the context (field) data corresponding to the current execution unit as the context data of the created functional task combination; allocating and executing the functional task combination, and updating the fault handling mode, so that when a subsequent functional task fails, the updated fault handling mode can acquire the execution result of the inserted redundant execution.
Further, the chip is a vehicle-mounted chip.
Further, the partitioned functional tasks are ordered, the order being based on their control flow or data flow order.
Further, the processing unit is a heterogeneous processing unit.
Further, the kernels and functional units inside a processing unit share an external bus unit, or share a second-level cache and a memory; the first-level cache is exclusive to the kernels and functional units.
Further, the corresponding function types of each functional unit may be the same or different.
Further, after the allocation of the functional tasks is completed, the pre-allocation scheme set is emptied.
An execution device comprising a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the chip functional safety fault processing method.
A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the chip functional safety fault processing method.
A chip comprising processing circuitry configured to perform the chip functional safety fault processing method.
The beneficial effects of the invention include:
(1) For a complex chip system structure, a fault handling mode based on fault unit combinations is provided; on top of the existing single-function, single-IP, single-core fault handling mechanisms, it provides an effective safety guarantee against faults that span tasks, functional units and general kernels in a complex environment;
(2) Based on a dual guarantee mechanism of early fault avoidance and inserted redundant execution, when a fault combination cannot be avoided in the current allocation cycle, redundant execution is provided by combining detection of the subsequent task types with analysis of the fault combinations, and another avoidance opportunity is given in the next allocation cycle; thus, over the complete cycle of task execution, redundancy is provided precisely while potential faults are avoided where possible;
(3) Without disturbing the task allocation mode of the original operating system, when redundant execution is inserted, the most favorable fault combination is determined through comprehensive quantitative calculation and is occupied by the redundant execution, so that the redundant execution is meaningful and effective, and the overall response speed of the chip is ultimately maintained when a fault occurs.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this application; together with the description, they serve to explain the invention:
Fig. 1 is a schematic diagram of the chip functional safety fault processing method provided by the present invention.
Detailed Description
The present invention will now be described in detail with reference to the drawings and specific embodiments; the exemplary embodiments and their description serve only to illustrate the invention and are not to be construed as limiting it. The invention provides a chip functional safety fault processing method, which comprises the following steps:
step S1: an operating system is provided on the chip, and the operating system creates tasks based on user requests; specifically: the operating system creates one or more tasks based on a user request and divides the tasks into one or more functional tasks according to function type; each execution unit has a unique number;
preferably: the chip is a vehicle-mounted chip;
preferably: the functional tasks are ordered, the order being based on their control flow or data flow order;
an execution unit is a functional unit IP or a general kernel, and the chip comprises a plurality of kernels and a plurality of functional units; the function types of the functional units may be the same or different; specifically: the chip comprises a plurality of processing units, each comprising one or more kernels and functional units; the processing units are heterogeneous processing units; the kernels and functional units inside a processing unit share an external bus unit, or share a second-level cache, a memory and the like; the first-level cache is exclusive to the kernels and functional units;
preferably: the functions include function bodies, loop bodies, specific operations, special functions and the like; the functional tasks include tasks suited to a particular functional unit IP type as well as general tasks; a specific functional task achieves its highest execution efficiency when allocated to a functional unit of the matching function type, and its efficiency drops markedly when allocated to a general kernel; a functional unit, in turn, is only suitable for executing functional tasks of its corresponding type.
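The relationship between execution units, function types and general kernels described in step S1 can be sketched as a small data model. This is only an illustrative sketch: the class name `ExecutionUnit`, the helper `candidate_units`, the unit numbers `K0`/`K1` and the function types `"fft"` and `"crypto"` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutionUnit:
    number: str                          # unique number of the execution unit
    processing_unit: int                 # index of the enclosing processing unit
    function_type: Optional[str] = None  # None marks a general kernel (assumption)

    @property
    def is_general_kernel(self) -> bool:
        return self.function_type is None


def candidate_units(task_type, units):
    """List units able to run a task of `task_type`, most efficient first.

    Functional units of the matching type come first; general kernels follow
    as a slower fallback, mirroring the efficiency remark in step S1.
    """
    matching = [u for u in units
                if not u.is_general_kernel and u.function_type == task_type]
    kernels = [u for u in units if u.is_general_kernel]
    return matching + kernels


# Hypothetical chip layout used only for illustration.
UNITS = [
    ExecutionUnit("IP1", 0, "fft"),
    ExecutionUnit("IP2", 0, "crypto"),
    ExecutionUnit("IP3", 1, "fft"),
    ExecutionUnit("K0", 0),
    ExecutionUnit("K1", 1),
]

print([u.number for u in candidate_units("fft", UNITS)])
print([u.number for u in candidate_units(None, UNITS)])  # a general task
```

Ordering the candidates this way keeps the operating system's preference for type-matched functional units while leaving a general kernel available when no functional unit can be used.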
Step S2: on the basis of fault pre-avoidance, sequentially allocating the one or more functional tasks to execution units in the chip capable of executing the corresponding function types; specifically: judging whether all functional tasks have been allocated; if so, returning to step S1; otherwise, pre-allocating the functional task to an execution unit and judging whether the pre-allocation falls into a fault unit combination table; if it does, recording the pre-allocation scheme, putting it into a pre-allocation scheme set, and returning to step S2 for re-pre-allocation, and, if re-pre-allocation cannot be performed, entering step S3; if it does not, allocating the functional task to the execution unit according to the pre-allocation scheme; continuing to execute step S2 until all functional tasks have been allocated;
the pre-allocation of a functional task to an execution unit is specifically as follows: the functional task is pre-allocated to a corresponding execution unit according to its function type; on re-pre-allocation, another execution unit of the same function type is selected; when so many re-pre-allocations have been made that no allocatable execution unit remains, that is, when re-pre-allocation cannot be performed, step S3 is entered, because the possible fault can then no longer be avoided by re-pre-allocation without reducing execution efficiency; the pre-allocation of functional tasks does not disturb the task allocation mode of the original operating system, which the operating system can continue to use; when re-pre-allocation is needed, only the candidates that fall into the fault unit combination table are excluded, and the original allocation mode continues to be used for the rest;
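The pre-allocation loop of step S2 can be sketched as follows. This is a simplified illustration, not the patented implementation: the function name `allocate_with_avoidance`, the task types `"a"`, `"b"`, `"c"`, the unit `IP7` and the stand-in predicate `in_table` are all hypothetical, and the table check is abstracted into a predicate so that the loop structure stays visible.

```python
def allocate_with_avoidance(task_types, units_by_type, falls_into_table):
    """Sketch of step S2: allocate ordered functional tasks with fault pre-avoidance.

    Returns (allocated_units, scheme_set, needs_s3). `scheme_set` holds the
    rejected pre-allocation schemes of the task that could not be placed.
    """
    allocated = []                      # execution units chosen so far, in time order
    for task_type in task_types:
        scheme_set = []                 # emptied for every new functional task
        for unit in units_by_type[task_type]:
            scheme = tuple(allocated) + (unit,)
            if falls_into_table(scheme):
                scheme_set.append(scheme)   # record the scheme and re-pre-allocate
                continue
            allocated.append(unit)          # safe: allocate according to the scheme
            break
        else:
            # No unit of this function type is left to try: hand over to step S3.
            return allocated, scheme_set, True
    return allocated, [], False


# Hypothetical stand-in for the fault unit combination table lookup.
FORBIDDEN = {("IP1", "IP3")}


def in_table(scheme):
    return scheme in FORBIDDEN


ok = allocate_with_avoidance(
    ["a", "b", "c"], {"a": ["IP1"], "b": ["IP3", "IP7"], "c": ["IP4"]}, in_table)
stuck = allocate_with_avoidance(
    ["a", "b"], {"a": ["IP1"], "b": ["IP3"]}, in_table)
print(ok)
print(stuck)
```

In the first call the risky sequence (IP1, IP3) is avoided by switching to another unit of the same type; in the second call no alternative exists, so the sketch reports that step S3 must take over and returns the recorded pre-allocation scheme set.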
the prior art usually handles only single-IP faults; in fact, when multiple execution units execute consecutively, changes in many complex factors such as context sharing, data flow, IP resource sharing and clock asynchrony alter the execution environment, so that specific execution units show a distinct fault behavior when used in a particular sequence; in such a fault scenario, the fault behavior associated with the sequence of execution units is pronounced and can be discovered through big data or test collection; the invention therefore provides the fault unit combination table to cope with this complex situation;
the fault unit combination table stores one or more fault unit combinations together with their corresponding fault probabilities and fault codes; each fault unit combination comprises one or more execution unit numbers arranged in time order; an entry indicates the statistical fault code, and its fault probability, that applies when the execution sequence of the execution units listed in the fault unit combination occurs; for example, row 1 of Table 1, (IP1, IP3, IP4), indicates that when IP1, IP3 and IP4 execute functional tasks consecutively in that time order, the fault probability of the corresponding fault code T1 is 0.1%; the table is preset and can be collected during chip testing or obtained through big data statistics; of course, each IP unit executes a functional task corresponding to its function type; the function types of the IPs may be the same or different;
Table 1: Fault unit combination table
No. | Fault unit combination | Fault probability | Fault code |
1 | (IP1, IP3, IP4) | 0.1% | T1 |
2 | (IP1, IP2, IP4) | 0.5% | T1 |
3 | (IP1, IP3, IP5) | 0.1% | T3 |
4 | (IP2, IP4, IP6) | 0.01% | T4 |
5 | (IP5, IP1, IP3, IP4) | 0.02% | T2 |
Judging whether the pre-allocation falls into the fault unit combination table is specifically as follows: the execution unit selected by the pre-allocation is the current execution unit; the pre-allocation is determined to fall into the fault unit combination table if, were the pre-allocation scheme adopted, the execution unit sequence formed by the already-allocated preceding execution units followed by the current execution unit matches the leading execution unit sequence of one or more fault unit combinations in the table; wherein: a fault unit combination comprises a preamble part, the current execution unit and a subsequent part; the preamble part is everything that appears before the current execution unit, and the subsequent part is everything that appears after it; for example: if the currently pre-allocated execution unit is IP3 and execution unit IP1 has already been allocated before it, then (IP1, IP3) matches the fault unit combinations numbered 1 and 3, but does not match number 5.
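The matching rule and the worked example above can be sketched as a prefix comparison against Table 1. The names `FAULT_TABLE` and `matching_rows` are hypothetical, and treating the match as a plain prefix test over the whole allocated sequence is a simplifying assumption rather than the patent's exact rule.

```python
# Table 1 as data: number -> (fault unit combination, fault probability, fault code).
FAULT_TABLE = {
    1: (("IP1", "IP3", "IP4"), 0.001, "T1"),
    2: (("IP1", "IP2", "IP4"), 0.005, "T1"),
    3: (("IP1", "IP3", "IP5"), 0.001, "T3"),
    4: (("IP2", "IP4", "IP6"), 0.0001, "T4"),
    5: (("IP5", "IP1", "IP3", "IP4"), 0.0002, "T2"),
}


def matching_rows(preamble, current_unit, table=FAULT_TABLE):
    """Return the numbers of all combinations whose leading units equal
    the already-allocated preamble followed by the current execution unit."""
    sequence = tuple(preamble) + (current_unit,)
    return [
        number
        for number, (combo, _probability, _code) in sorted(table.items())
        if combo[:len(sequence)] == sequence
    ]


print(matching_rows(["IP1"], "IP3"))         # the example from the text
print(matching_rows(["IP5", "IP1"], "IP3"))
print(matching_rows(["IP1"], "IP6"))
```

A pre-allocation "falls into the table" whenever this list is non-empty; the returned numbers also identify which fault probabilities and fault codes are relevant to the later trigger decision.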
Step S3: allocating functional tasks to the execution units, and determining whether to trigger the execution unit to enter a safety guarantee mode so as to insert redundant execution;
step S31: selecting a first pre-allocation scheme from the pre-allocation schemes tried in step S2, allocating the current functional task to the execution unit given by that scheme, and deleting the first pre-allocation scheme;
alternatively: in step S31 the current functional task is allocated to a general kernel; further: the general kernel is located in the same processing unit as the preceding functional task (or the preceding several functional tasks);
step S32: determining whether the execution unit is triggered to enter the safety guarantee mode; if so, entering step S33; otherwise, returning to step S2;
whether the trigger condition is met is determined as follows: obtaining the fault probability and fault code of the fault unit combination in the fault unit combination table into which the allocation under the first pre-allocation scheme falls; the trigger condition is determined to be met when the fault code is of a first preset code type, or when the fault code is of a second preset code type and the fault probability is greater than a preset probability value;
preferably: the first preset code type and the second preset code type are preset types; wherein: the first preset code type indicates that the fault type is severe, and the second preset code type indicates that the fault type is not severe;
alternatively: triggering is determined when the safety level is high, and otherwise no triggering occurs; the safety level can be set by means of a hardware switch;
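The trigger condition of step S32 can be sketched as a small predicate. The assignment of fault codes to the severe and non-severe types and the threshold value are hypothetical; the patent only states that these are preset.

```python
# Hypothetical presets: which codes count as severe, and the probability threshold.
SEVERE_CODES = frozenset({"T1"})               # first preset code type (assumed)
MINOR_CODES = frozenset({"T2", "T3", "T4"})    # second preset code type (assumed)
PROBABILITY_THRESHOLD = 0.0005                 # preset probability value (assumed)


def should_trigger(matched, severe=SEVERE_CODES, minor=MINOR_CODES,
                   threshold=PROBABILITY_THRESHOLD):
    """Decide whether to enter the safety guarantee mode.

    `matched` is an iterable of (fault_probability, fault_code) pairs for the
    fault unit combinations that the chosen allocation falls into.
    """
    for probability, code in matched:
        if code in severe:
            return True                        # severe faults always trigger
        if code in minor and probability > threshold:
            return True                        # minor faults trigger only if likely
    return False


print(should_trigger([(0.001, "T1")]))    # severe code
print(should_trigger([(0.001, "T3")]))    # minor code above the threshold
print(should_trigger([(0.0001, "T4")]))   # minor code below the threshold
```

The alternative embodiment, in which a hardware switch sets the safety level, would replace this predicate with a single flag check.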
step S33: acquiring each pre-allocation scheme in the pre-allocation scheme set; acquiring the fault unit combinations in the fault unit combination table into which the pre-allocation schemes fall, to form a fault unit combination set; determining therefrom the most favorable fault unit combination;
determining the most favorable fault unit combination comprises the following steps:
step S33A1: looking up, in the queue of functional tasks to be allocated, the types of the functional tasks that follow the current functional task, to form a subsequent functional task type sequence; the length of the subsequent functional task type sequence is equal to the length of the subsequent part;
step S33A2: acquiring an unprocessed fault unit combination and its subsequent part;
step S33A3: comparing, in order, each first element of the subsequent part with the corresponding second element of the subsequent functional task type sequence; a first element matches its corresponding second element if the type of functional task the first element can execute matches the type of the second element; continuing until all elements have been compared; if all elements match, taking the unprocessed fault unit combination as a fault unit combination to be determined; otherwise, determining it to be an invalid fault unit combination;
step S33A4: acquiring the next unprocessed fault unit combination from the fault unit combination set and returning to step S33A2, until all fault unit combinations have been processed;
step S33A5: determining in turn the dynamic execution efficiency of each fault unit combination to be determined, and selecting the one with the highest execution efficiency as the most favorable fault unit combination;
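Steps S33A1 to S33A5 amount to a filter followed by a maximum search, which can be sketched as below. The function names, the function types `"dsp"` and `"crypto"`, and the efficiency scores are hypothetical; the efficiency measure is passed in as a callable because its formula is not given here.

```python
def subsequent_part(combo, current_unit):
    """Units of the combination that come after the current execution unit."""
    return combo[combo.index(current_unit) + 1:]


def combinations_to_be_determined(combos, current_unit, upcoming_types, unit_type):
    """Steps S33A1 to S33A4: keep combinations whose subsequent part can
    execute the upcoming functional task types, element by element."""
    kept = []
    for combo in combos:
        tail = subsequent_part(combo, current_unit)
        wanted = upcoming_types[:len(tail)]     # type sequence of matching length
        if len(wanted) == len(tail) and all(
                unit_type[unit] == task_type
                for unit, task_type in zip(tail, wanted)):
            kept.append(combo)                  # combination to be determined
    return kept


def most_favorable(candidates, efficiency):
    """Step S33A5: pick the candidate with the highest dynamic execution efficiency."""
    return max(candidates, key=efficiency) if candidates else None


# Hypothetical inputs: rows 1 and 3 of Table 1, with IP3 as the current unit.
COMBOS = [("IP1", "IP3", "IP4"), ("IP1", "IP3", "IP5")]
UNIT_TYPE = {"IP4": "dsp", "IP5": "crypto"}
kept = combinations_to_be_determined(COMBOS, "IP3", ["dsp", "crypto"], UNIT_TYPE)
best = most_favorable(kept, efficiency=lambda combo: {COMBOS[0]: 0.8}.get(combo, 0.0))
print(kept, best)
```

Here the combination ending in IP5 is discarded as invalid because the next queued task is of type `"dsp"`, which IP5 cannot execute in this hypothetical layout.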
determining the dynamic execution efficiency of each fault unit combination to be determined is specifically as follows: determining a first distance between the current execution unit and the preamble part of the fault unit combination to be determined, a second distance between the current execution unit and the subsequent part, and a third distance between the preamble part and the subsequent part; determining the execution speed of the fault unit combination to be determined; calculating the dynamic execution efficiency based on the execution speed, the first distance, the second distance and the third distance; wherein: the first, second and third distances indicate the magnitude of the communication overhead and the context-switching overhead; the larger the distance, the larger the overhead, and the smaller the distance, the smaller the overhead;
preferably: determining the first distance is specifically: calculating the first distance Ds1 by a formula [formula not reproduced];
preferably: determining the second distance is specifically: calculating the second distance Ds2 by a formula [formula not reproduced];
wherein the formula symbols (lost in extraction) denote, in order: the current execution unit; the j-th execution unit in the fault unit combination to be determined; a count value; and n, the length of the fault unit combination to be determined;
preferably: determining the third distance Ds3 is specifically: setting Ds3 equal to the accumulated number of pairs of elements, one in the preamble part and one in the subsequent part, that are located in the same processing unit and whose element spacing is smaller than a preset distance value; for example: in (IP1, IP2, IP3), where IP2 is the current execution unit, IP1 and IP3 are located in the same processing unit and the preset distance value is 5, IP1 and IP3 are separated by 1 element and 1 < 5, so one pair is counted; the preset distance value is related to the processing capacity of the processing unit and the size of the secondary storage space; the larger these are, the more contexts can be held, and the larger the preset value;
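The counting rule for the third distance can be sketched directly from its verbal definition. The function name `third_distance` and the processing unit indices are hypothetical, and the spacing between two elements is taken to be the number of elements lying between them, as in the example.

```python
def third_distance(combo, current_index, processing_unit, preset_distance):
    """Count pairs (one unit before, one unit after the current execution unit)
    that sit in the same processing unit and are separated by fewer than
    `preset_distance` elements."""
    count = 0
    for i in range(current_index):                       # preamble part
        for j in range(current_index + 1, len(combo)):   # subsequent part
            spacing = j - i - 1                          # elements between the pair
            same_unit = processing_unit[combo[i]] == processing_unit[combo[j]]
            if same_unit and spacing < preset_distance:
                count += 1
    return count


# The example from the text: IP2 is current, IP1 and IP3 share a processing unit.
COMBO = ("IP1", "IP2", "IP3")
PROCESSING_UNIT = {"IP1": 0, "IP2": 1, "IP3": 0}   # hypothetical placement
print(third_distance(COMBO, 1, PROCESSING_UNIT, preset_distance=5))
```

With a preset distance value of 1 the same pair would no longer be counted, since its spacing of 1 is not smaller than the preset value.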
preferably: the values of the first, second and third distances are determined in advance and stored in association with the fault unit combination, and are obtained by lookup when needed;
preferably: determining the execution speed is specifically: calculating the execution speed Sd by a formula [formula not reproduced];
calculating the dynamic execution efficiency is specifically: calculating the dynamic execution efficiency Ef by a formula [formula not reproduced];
step S34: inserting redundant execution; specifically: creating a functional task combination corresponding to the most favorable fault unit combination, and acquiring the context (field) data corresponding to the current execution unit as the context data of the created functional task combination; allocating and executing the functional task combination, and updating the fault handling mode;
step S34 specifically comprises the following steps:
step S341: acquiring the subsequent part of the most favorable fault unit combination; determining as many subsequent functional tasks as the length of the subsequent part; creating a functional task combination comprising the current functional task to be allocated and those subsequent functional tasks;
step S342: acquiring the context data corresponding to the current execution unit as the context data of the created functional task combination;
step S343: allocating the current functional task to the current execution unit of the most favorable fault unit combination, and allocating each of the subsequent functional tasks to its corresponding execution unit in the subsequent part;
step S344: updating the fault handling mode of each subsequent functional task, so that when a subsequent functional task fails, the updated fault handling mode acquires the execution result of the inserted redundant execution; specifically: updating the fault handling mode of each of the subsequent functional tasks so that, after a fault occurs in any of them, execution of the subsequent functional tasks is skipped and the execution result of the functional task combination is obtained directly, or waited for, as the execution result of the subsequent functional tasks; for example: the execution result is stored in a specific storage space after the functional task combination finishes executing; when the current functional task or any of the subsequent functional tasks fails, fault handling is performed by the corresponding interrupt routine, which accesses that storage location to obtain the execution result produced by the redundant execution; in this way, no matter which of the subsequent tasks fails, the redundant execution makes it possible to skip over these tasks without reducing the response speed to the user's request; of course, the number of redundant executions inserted at the same time can be increased to raise the degree of safety guarantee, and the added executions need not be limited to the most favorable fault unit combination, but this may waste resources, so the two goals cannot both be fully met;
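The effect of the updated fault handling mode can be illustrated in software with a thread standing in for the redundant execution units. This is an analogy only, under stated assumptions: the patent describes hardware execution units and interrupt routines, whereas the names `run_chain`, `run_with_redundancy`, `inc`, `dbl` and `faulty_inc`, and the use of `RuntimeError` to model a fault, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chain(tasks, value):
    """Run functional tasks in order, feeding each result to the next task."""
    for task in tasks:
        value = task(value)
    return value


def run_with_redundancy(primary_tasks, redundant_tasks, context):
    """Start the redundant task combination from the same context data, run the
    primary tasks, and on a fault skip the remaining primary tasks and take
    (or wait for) the redundant result instead."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        redundant = pool.submit(run_chain, redundant_tasks, context)
        try:
            return run_chain(primary_tasks, context), "primary"
        except RuntimeError:
            # Updated fault handling mode: fetch the stored redundant result.
            return redundant.result(), "redundant"


def inc(x):
    return x + 1


def dbl(x):
    return x * 2


def faulty_inc(x):
    raise RuntimeError("simulated fault in a subsequent functional task")


print(run_with_redundancy([inc, inc, dbl], [inc, inc, dbl], 3))
print(run_with_redundancy([inc, faulty_inc, dbl], [inc, inc, dbl], 3))
```

Because the redundant combination starts from the same context data as the primary path, its result can replace the primary result regardless of which subsequent task fails, at the cost of the extra execution resources mentioned above.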
without disturbing the task allocation mode of the original operating system, the invention makes the inserted redundant execution occupy the most favorable fault unit combination, so that the redundant execution is meaningful and effective; this may also prevent the subsequent functional tasks from falling into that fault unit combination again during task allocation because of dynamic changes in the execution units;
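The result handoff described in step S344 can be illustrated with a minimal sketch. It is not the patented implementation: `RedundantResultStore` stands in for the "specific storage space", `TaskFault` stands in for the interrupt raised on a fault, and all names, the `"combo-0"` identifier, and the toy arithmetic tasks are hypothetical. The sketch re-raises when no redundant result is available yet, whereas the text also allows waiting for it.

```python
from typing import Callable, Dict, List, Optional


class TaskFault(Exception):
    """Simulates a fault detected while a functional task executes."""


class RedundantResultStore:
    """Hypothetical stand-in for the 'specific storage space' of step S344."""

    def __init__(self) -> None:
        self._results: Dict[str, int] = {}

    def publish(self, combo_id: str, result: int) -> None:
        # Called once the redundant functional task combination has finished.
        self._results[combo_id] = result

    def fetch(self, combo_id: str) -> Optional[int]:
        return self._results.get(combo_id)


def run_tasks(tasks: List[Callable[[int], int]], value: int) -> int:
    """Run functional tasks in order with no fault handling."""
    for task in tasks:
        value = task(value)
    return value


def run_with_fallback(tasks: List[Callable[[int], int]], value: int,
                      store: RedundantResultStore, combo_id: str) -> int:
    """On a fault, skip the remaining tasks and use the stored redundant result."""
    try:
        return run_tasks(tasks, value)
    except TaskFault:
        result = store.fetch(combo_id)
        if result is None:
            # The text allows waiting for the result; this sketch just re-raises.
            raise
        return result


# Toy functional tasks (assumptions for illustration only).
def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


def faulty_double(x: int) -> int:
    raise TaskFault("execution unit reported a fault")


def minus_three(x: int) -> int:
    return x - 3


store = RedundantResultStore()
# Redundant execution of the whole combination on other execution units.
store.publish("combo-0", run_tasks([add_one, double, minus_three], 5))
# The primary path faults in its second task and falls back to the stored result.
fallback_result = run_with_fallback(
    [add_one, faulty_double, minus_three], 5, store, "combo-0")
print(fallback_result)
```

The fallback returns the same value the fault-free chain would have produced, which is the property the updated fault handling mode relies on.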
preferably: the fault probability and the fault code are obtained by detecting the fault indication signals sent by the functional units in the chip;
preferably: the fault code is determined by detecting the fault pin output of each functional unit; for example: a 128M waveform on the fault pin indicates normal execution, while a waveform at 1/8 of the normal-state waveform, a constant high level, a constant low level, and so on indicate different fault codes;
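As a rough illustration of decoding a fault pin, the sketch below classifies a window of sampled pin levels by counting rising edges. It assumes that "1/8 of the normal-state waveform" means one eighth of the normal frequency, which the text does not state explicitly. The code names (`OK`, `FAULT_SLOW_WAVE`, and so on), the tolerance, and the helper `square_wave` are hypothetical.

```python
from typing import List, Sequence


def square_wave(period: int, n: int) -> List[int]:
    """Generate n samples of a 0/1 square wave with the given period in samples."""
    half = period // 2
    return [(i // half) % 2 for i in range(n)]


def classify_fault_pin(samples: Sequence[int], normal_edges: int,
                       tolerance: int = 1) -> str:
    """Map a window of sampled pin levels to a (hypothetical) fault code.

    normal_edges is the rising-edge count expected in the same window
    when the functional unit is executing normally.
    """
    if not samples:
        raise ValueError("no samples captured")
    if all(s == 1 for s in samples):
        return "FAULT_STUCK_HIGH"
    if all(s == 0 for s in samples):
        return "FAULT_STUCK_LOW"
    # Count 0 -> 1 transitions as a cheap frequency estimate.
    edges = sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)
    if abs(edges - normal_edges) <= tolerance:
        return "OK"
    # Assumption: the degraded waveform runs at 1/8 of the normal frequency.
    if abs(edges - normal_edges / 8) <= tolerance:
        return "FAULT_SLOW_WAVE"
    return "FAULT_UNKNOWN"


normal = square_wave(period=2, n=64)    # 32 rising edges in the window
slow = square_wave(period=16, n=64)     # 4 rising edges, 1/8 of normal
print(classify_fault_pin(normal, normal_edges=32))
print(classify_fault_pin(slow, normal_edges=32))
print(classify_fault_pin([1] * 64, normal_edges=32))
```

Real hardware would more likely measure the pin with a timer or capture peripheral; edge counting over a fixed window is used here only because it is easy to follow.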
The invention is operational with numerous general purpose or special purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory structures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed. The embodiments of the invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be cross-referenced, and each embodiment mainly describes its differences from the other embodiments. In particular, because the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, see the corresponding parts of the description of the method embodiments.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object or other unit suitable for use in a computing environment. The computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, subroutines, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.
Claims (10)
1. A method for processing a chip functional safety failure, the method comprising:
step S1: an operating system is provided on the chip, and the operating system creates tasks based on user requests; specifically: the operating system creates one or more tasks based on the user request and divides the tasks into one or more functional tasks according to function type; the chip comprises a plurality of processing units, each processing unit comprising one or more execution units, wherein: each execution unit is a general kernel or a functional unit; each execution unit has a unique number;
step S2: sequentially allocating the one or more functional tasks to execution units in the chip capable of executing the corresponding function types; specifically: judging whether all functional tasks have been allocated; if so, returning to step S1; otherwise, pre-allocating a functional task to an execution unit and judging whether the pre-allocation falls into the fault unit combination table; if it does, recording the pre-allocation scheme, putting it into the pre-allocation scheme set, and then returning to step S2 for re-pre-allocation, wherein step S3 is entered if re-pre-allocation cannot be performed; if it does not, allocating the functional task to the execution unit according to the pre-allocation scheme; step S2 continues to be executed until all functional tasks are allocated;
the fault unit combination table stores one or more fault unit combinations together with their corresponding fault probabilities and fault codes; each fault unit combination comprises one or more execution unit numbers arranged in time order, and indicates the statistical fault code and its corresponding fault probability when the execution sequence of the execution units indicated in the fault unit combination occurs;
step S3: assigning functional tasks to the execution units; determining whether to trigger the execution unit to enter a safety guarantee mode so as to insert redundant execution;
step S31: distributing the current function task to be distributed to an available general kernel;
step S32: determining whether the execution unit is triggered to enter a safety guarantee mode, if so, entering a step S33, otherwise, returning to the step S2;
step S33: acquiring each pre-allocation scheme in the pre-allocation scheme set; acquiring all fault unit combinations in the fault unit combination table that each pre-allocation scheme falls into, to form a fault unit combination set; determining the most favorable fault unit combination from that set; wherein: the most favorable fault unit combination is the fault unit combination with the highest dynamic execution efficiency;
step S34: inserting redundancy execution; the method comprises the following steps: creating a function task combination corresponding to the most favorable fault unit combination, and acquiring field data corresponding to the current execution unit as field data of the created function task combination; and distributing and executing the functional task combination, and updating the fault handling mode, so that the updated fault handling mode can acquire an execution result of the inserted redundant execution when a subsequent functional task fails.
2. The chip functional safety failure processing method according to claim 1, wherein the chip is a vehicle-mounted chip.
3. The chip functional safety failure processing method according to claim 2, wherein the divided functional tasks are ordered, the order being based on their control flow or data flow order.
4. A chip functional safety failure processing method according to claim 3, wherein the processing unit is a heterogeneous processing unit.
5. The chip functional safety failure processing method according to claim 4, wherein the cores and the functional units inside the processing unit share an external bus unit or share a second-level cache and a memory; the first-level cache is shared exclusively by the cores and functional units.
6. The method of claim 4, wherein the function type of each functional unit is the same or different.
7. The method for processing the chip functional safety fault according to claim 4, wherein the pre-allocation scheme set is emptied after the functional tasks are uniformly allocated.
8. An execution device comprising a processor coupled to a memory, the memory storing program instructions that when executed by the processor implement the chip functional safety failure processing method of claim 1.
9. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the chip functional safety failure processing method of claim 1.
10. A chip comprising processing circuitry configured to perform the chip functional safety failure processing method of claim 1.
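The pre-allocation check of step S2 in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method itself: "falls into the fault unit combination table" is interpreted here as the tail of the allocation sequence matching a stored combination, candidate units are tried in list order, and the names `hits_fault_combo`, `allocate`, the unit numbers, and the fault codes are all hypothetical. Step S3 (fallback to a general kernel and optional redundant execution) is outside the sketch and is only signalled by the returned index.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Fault unit combination -> (fault probability, fault code).
FaultTable = Dict[Tuple[int, ...], Tuple[float, str]]


def hits_fault_combo(sequence: Sequence[int],
                     table: FaultTable) -> Optional[Tuple[int, ...]]:
    """Return the stored combination matched by the tail of `sequence`, if any."""
    for combo in table:
        if combo and len(combo) <= len(sequence) \
                and tuple(sequence[-len(combo):]) == combo:
            return combo
    return None


def allocate(task_types: List[str],
             capable_units: Dict[str, List[int]],
             table: FaultTable
             ) -> Tuple[List[int], List[Tuple[int, ...]], Optional[int]]:
    """Allocate tasks in order, avoiding sequences listed in the fault table.

    Returns (assigned units, rejected pre-allocation schemes, index of the
    first task that must be handed to step S3, or None if all were placed).
    """
    assigned: List[int] = []
    scheme_set: List[Tuple[int, ...]] = []  # the pre-allocation scheme set
    for idx, ftype in enumerate(task_types):
        chosen = None
        for unit in capable_units.get(ftype, []):
            scheme = assigned + [unit]
            if hits_fault_combo(scheme, table) is None:
                chosen = unit
                break
            scheme_set.append(tuple(scheme))  # record, then re-pre-allocate
        if chosen is None:
            return assigned, scheme_set, idx  # no clean option: go to step S3
        assigned.append(chosen)
    return assigned, scheme_set, None


table: FaultTable = {(1, 3): (0.2, "E07"), (2,): (0.5, "E01")}
units = {"dsp": [1, 4], "crypto": [3, 5]}
print(allocate(["dsp", "crypto"], units, table))
```

In this run, unit 3 is rejected for the second task because the sequence (1, 3) appears in the table, so the task is placed on unit 5 and the rejected scheme is kept for later use in step S33.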
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310417392.0A CN116149897B (en) | 2023-04-19 | 2023-04-19 | Chip functional safety fault processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116149897A | 2023-05-23 |
CN116149897B | 2023-07-04 |
Family
ID=86350975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310417392.0A Active CN116149897B (en) | 2023-04-19 | 2023-04-19 | Chip functional safety fault processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116149897B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115220974A (en) * | 2022-07-15 | 2022-10-21 | 苏州浪潮智能科技有限公司 | Dynamic checking system, method, device and medium for network information of operating system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112506718B (en) * | 2021-02-05 | 2021-05-11 | 浙江地芯引力科技有限公司 | Safety chip processor and processing method of fault redundancy mechanism |
CN114356534B (en) * | 2022-03-16 | 2022-06-03 | 苏州云途半导体有限公司 | Processing unit task scheduling method and device |
CN114860531B (en) * | 2022-07-06 | 2022-09-23 | 北京智芯半导体科技有限公司 | Fault detection method and device for security chip, electronic equipment and medium |
CN114968646A (en) * | 2022-07-27 | 2022-08-30 | 南京芯驰半导体科技有限公司 | Functional fault processing system and method |
Also Published As
Publication number | Publication date |
---|---|
CN116149897A (en) | 2023-05-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | Address after: 808, 8th Floor, Building A10, No. 777 Jianshe West Road, Binhu District, Wuxi City, Jiangsu Province, 214000; Patentee after: Jiangsu Yuntu Semiconductor Co.,Ltd.; Address before: 215500 room 805, No. 1, Southeast Avenue, Changshu high tech Industrial Development Zone, Suzhou City, Jiangsu Province; Patentee before: Suzhou yuntu Semiconductor Co.,Ltd. |