WO2021164679A1 - Fault management system for the functional safety of automotive-grade chips - Google Patents

Fault management system for the functional safety of automotive-grade chips

Info

Publication number
WO2021164679A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
chip
type
module
car
Prior art date
Application number
PCT/CN2021/076492
Other languages
English (en)
French (fr)
Inventor
魏斌
张力航
李斌
Original Assignee
南京芯驰半导体科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京芯驰半导体科技有限公司
Publication of WO2021164679A1
Priority to US17/891,501 (US20220392280A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C5/00 Registering or indicating the working of vehicles
    • G07C5/08 Registering or indicating performance data other than driving, working, idle, or waiting time, with or without registering driving, working, idle or waiting time
    • G07C5/0808 Diagnosing performance data
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60T VEHICLE BRAKE CONTROL SYSTEMS OR PARTS THEREOF; BRAKE CONTROL SYSTEMS OR PARTS THEREOF, IN GENERAL; ARRANGEMENT OF BRAKING ELEMENTS ON VEHICLES IN GENERAL; PORTABLE DEVICES FOR PREVENTING UNWANTED MOVEMENT OF VEHICLES; VEHICLE MODIFICATIONS TO FACILITATE COOLING OF BRAKES
    • B60T17/00 Component parts, details, or accessories of power brake systems not covered by groups B60T8/00, B60T13/00 or B60T15/00, or presenting other characteristic features
    • B60T17/18 Safety devices; Monitoring
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0733 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a data processing system embedded in an image processing device, e.g. printer, facsimile, scanner
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0736 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function
    • G06F11/0739 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function in a data processing system embedded in automotive or aircraft systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0769 Readable error formats, e.g. cross-platform generic formats, human understandable formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2284 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by power-on test, e.g. power-on self test [POST]

Definitions

  • This application relates to a system fault management system for passenger cars, and in particular to a system fault management system oriented to the functional safety of car-level chips.
  • Functional safety is essential for safety-related electrical and electronic systems (such as power control systems) in the automotive field.
  • Functional safety applications impose strict constraints on the system so that it executes safely and reliably in a complex system environment.
  • Many safety mechanisms (Safety Mechanism) are integrated inside the car-level chip.
  • The safety mechanisms can include safety mechanisms inside an IP (a designed module inside the chip) and safety mechanisms at the system level.
  • However, current vehicle-level chips bear a heavy load in fault identification, classification, and processing, and cannot take reasonable fault response measures effectively and in time, which reduces the availability of the system when a fault occurs.
  • the fault management further includes a fault injection module (Fault Injector), a static signal detection module (Static Signal Monitor), and a fault control module (Fault Controller).
  • the fault injection module (Fault Injector) is connected to all the functional modules (IP1...IPn) inside the chip through electrical connection, and each functional module (IP1...IPn) is equipped with a safety mechanism.
  • the fault control module (Fault Controller) is electrically connected to each IP (IP1...IPn), the static signal detection module (Static Signal Monitor), the processor (CPU), the system controller (System Controller), and the system outside the chip (out of chip).
  • the static signal detection module (Static Signal Monitor) is connected to the system configuration module (System Configure) inside the chip through an electrical connection.
  • the fault controller (Fault Controller) is responsible for aggregating the fault indication signals (Fault Indicated Signals) sent by the safety mechanisms.
  • the static signal detection module (Static Signal Monitor) monitors in real time the static signals generated by the system configuration module (System Configure) inside the chip, so as to avoid failures caused by stuck-at faults (Stuck-at Fault).
  • the fault indication signal generated by the static signal detection module is output to the fault controller (Fault Controller) for classification processing.
  • the fault control module (Fault Controller) has a built-in fault classification management model composed of four types of faults.
  • Type 1: faults that require assistance from an external system are configured as Fail Fatal; Type 2: faults that cause the main function to fail are configured as Fail Safe; Type 3: faults handled by automatic degraded operation are configured as Fail Operational; Type 4: faults handled by automatic error correction are configured as Fail Correctable.
  • the severity of the four fault types is configured as follows: Rule 1: Type 1 > Type 2 > {Type 3, Type 4}, where "{Type 3, Type 4}" denotes the set of Type 3 and Type 4; Rule 2: Type 3 > Type 4; Rule 3: Rule 1 > Rule 2, that is, Rule 1 takes precedence over Rule 2.
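As a minimal sketch of the severity rules above (not the patented hardware logic), the four types can be modeled in Python as an ordered enumeration. The names `FaultType` and `most_severe` are hypothetical and introduced only for illustration; the numeric values follow the type numbers, so a smaller value means a more severe fault.

```python
from enum import IntEnum


class FaultType(IntEnum):
    """Four fault types; a smaller value means a more severe fault."""
    FAIL_FATAL = 1        # Type 1: needs assistance from an external system
    FAIL_SAFE = 2         # Type 2: the main function fails
    FAIL_OPERATIONAL = 3  # Type 3: handled by automatic degraded operation
    FAIL_CORRECTABLE = 4  # Type 4: handled by automatic error correction


def most_severe(pending):
    """Return the most severe type among a non-empty list of pending faults.

    Rule 1 (Type 1 > Type 2 > {Type 3, Type 4}) combined with
    Rule 2 (Type 3 > Type 4) gives the total order 1 > 2 > 3 > 4,
    so taking the smallest type number is sufficient.
    """
    return min(pending)


pending = [FaultType.FAIL_CORRECTABLE, FaultType.FAIL_SAFE, FaultType.FAIL_OPERATIONAL]
print(most_severe(pending).name)  # FAIL_SAFE
```

Encoding the order in the enumeration values keeps the comparison a single integer comparison, which is one plausible way to express Rules 1 to 3 in software.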
  • the fault controller (Fault Controller) generates, according to its pre-configuration, fault information in a four-level structure composed of the four fault types, depending on the scenario in which the chip is applied and on the fault type.
  • the fault controller further includes four fault selection units (Fault Selection); through the configuration of the fault selection units, a variety of correspondences can be formed between the generated fault information and the input fault indication signals.
  • the correspondences include one-to-one (1 to 1), one-to-many (1 to N), and/or many-to-one (N to 1), to adapt to different application scenarios and different functional safety level requirements.
  • the fault management system oriented to the functional safety of vehicle-level chips provided by this application uses a fine-grained fault classification system to ensure that the system software accurately locates and responds to various faults and can take reasonable fault response measures effectively and in time, improving the availability of the system when a fault occurs; at the same time, it reduces the fault detection load on the system software, which is conducive to fast, high-coverage, individually configurable power-on and power-down self-checks of the chip.
  • FIG. 1 shows a schematic diagram of a four-level fault classification management model designed according to the severity of a chip function fault (Severity Level) in an embodiment of the present application;
  • Fig. 2 shows a logical application flow chart of a four-level fault classification management model (F4CM) according to an embodiment of the present application
  • FIG. 5 shows a logical structure diagram of a fault management system (Fault Management) oriented to the functional safety of a car-level chip according to an embodiment of the present application.
  • the hardware device may be specially designed and manufactured for the required purpose, or may also be a known device in a general-purpose computer or other known hardware devices.
  • the general-purpose computer has a program stored in it to be selectively activated or reconfigured.
  • Automotive functional safety (Functional Safety) design generally follows the ISO (International Organization for Standardization) 26262 standard for automobiles (first released in 2011, second edition in 2018). It is derived from IEC (International Electrotechnical Commission) 61508, the basic functional safety standard for electrical, electronic and programmable devices (first released in 1998, latest version released in 2010). ISO 26262 is an international standard aimed at improving the functional safety of automotive electrical and electronic products, and it addresses the electrical devices, electronic equipment, programmable electronic devices and other components used specifically in the automotive field.
  • the safety goal derives the system-level safety requirement, and then the safety requirement is allocated to the hardware and software.
  • the ASIL level determines the safety requirements on the system. The higher the ASIL level, the higher the safety requirements on the system and the higher the cost of achieving safety, which means higher diagnostic coverage of the hardware and a stricter development process.
  • accordingly, the development cost increases, the development cycle lengthens, and the technical requirements become stricter.
  • the ISO 26262 Functional Safety (Functional Safety) standard requires that the Single-Point Fault Metric (SPFM) be greater than or equal to 99% to achieve the highest safety integrity level ASIL D. Therefore, meeting functional safety can be complicated and difficult for real-time systems.
  • the safety mechanism can include the safety mechanism inside the IP (a designed module inside the chip) and the safety mechanism at the system level.
  • these safety mechanisms need to report the occurrence of the fault in time, so that the system can respond according to the type and severity of the fault and prevent the fault from becoming latent or directly causing a functional failure.
  • the lack of a centralized fault management module inside the chip places a heavy fault identification, classification, and processing load on the system software, and is also not conducive to fast, high-coverage, individually configurable power-on and power-down self-checks of the chip.
  • the faults are classified, but the classification is very coarse (faults are divided into only two categories, Fatal and Error), which leaves the system unable to take reasonable fault response measures effectively and in time and reduces the availability of the system when a fault occurs.
  • the embodiment of the present application provides a fault management system oriented to the functional safety of a vehicle-level chip.
  • the fault management system includes an out-of-chip system (out of chip) and a vehicle-level chip.
  • the vehicle-level chip includes a fault manager (Fault Management).
  • the fault manager (Fault Management) is configured with a fault classification management model.
  • the fault manager with the fault classification management model uses a fine-grained fault classification system to ensure that the system software accurately locates and responds to various faults, so that reasonable fault response measures are taken effectively and in time and the availability of the system when a fault occurs is improved.
  • the vehicle-level chip may also include a processor (CPU), a system controller (System Controller), a system configuration module (System Configure), in-chip functional modules (IP1...IPn), etc.
  • application scenario refers to an application scenario in a car to which a chip (vehicle-level chip) is applied, and mainly relates to an environment constituted by different systems or components in a car.
  • the car-level chip integrates safety mechanisms inside the IPs and safety mechanisms at the system level. When a fault occurs and is detected by the corresponding safety mechanism, that mechanism needs to report the fault in time, so that the system can respond according to the type and severity of the fault and prevent the fault from becoming latent or directly causing a functional failure.
  • random failures of the internal hardware of the chip can be distinguished according to the following dimensions (W1 to W3):
  • a failure that requires assistance from an external system is defined as a “fatal failure (Fail Fatal)";
  • the failures of all functional modules (IP1...IPn) in the vehicle-level chip can be divided into the four categories described in Table 1 (fault levels 1-4 correspond to types 1-4 in turn).
  • Table 1 can be used in engineering practice to classify and mark random hardware faults inside the chip so that the system can automatically determine the type of fault and accurately locate the fault location.
  • the fault classification is refined from the current common fatal (Fatal) and error (Error) faults into the above-mentioned four categories (Type 1 to Type 4), which improves the classification granularity.
  • Software or hardware can then handle each fault type directly, which improves the fault response speed.
  • the use scenario can be customized.
  • the fault classification method can be customized to meet different application scenarios and improve the flexibility of chip application.
  • in step S2-1, a functional failure of an IP inside the chip is detected, that is, a fault indication signal (Fault Indicated Signals) sent by the safety mechanism is received.
  • in step S2-3, if the judgment result is "Yes", the fault is determined to be Fail Safe, and the IP function failure signal (Fail Safe) information is output to the system controller (System Controller) inside the chip, which carries out necessary operations such as automatic reset so that the system enters a safe state or resumes operation; if the judgment result is "No", the next judgment step of the four-level fault classification management model (F4CM) is performed, namely whether, after the failure, the main functions of the chip's internal hardware or of the software system running on the chip need to run in degraded mode.
  • in step S2-4, if the judgment result is "Yes", the fault is determined to be Fail Operational, and the IP function failure signal (Fail Operational) information is output to the processor (CPU) inside the chip and handed over to the software running on the CPU for degraded operation processing; if the judgment result is "No", the fault is determined to be Fail Correctable, and the IP function failure signal (Fail Correctable) information is output to the processor (CPU) inside the chip and handed over to the software running on the CPU, which performs automatic error correction through a safety mechanism, or the safety mechanism in the IP performs self-correction.
  • the fault management system determines from low to high to which level the fault should be classified, and during execution, the fault is processed in the order from low to high.
  • the process of handling relatively serious faults can be accelerated, and the response time of fault handling can be shortened.
  • the high and low failure levels are defined by the numbers shown in Table 1: the highest failure level is the correctable fault, represented by the number 4, and the lowest failure level is the fatal fault, represented by the number 1. The smaller the number of the fault level, the greater the severity of the fault.
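The level-by-level judgment of Fig. 2 can be sketched in Python as a short cascade that tests the most severe level first. This is only an illustration under stated assumptions: `FaultAttributes`, its field names, and `classify_cascade` are hypothetical, and the first two questions are inferred from the definitions of Type 1 and Type 2, since the text above quotes only the degraded-operation question explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultAttributes:
    """Yes/no answers for one reported fault (hypothetical field names)."""
    needs_external_assistance: bool   # assumed question for level 1
    main_function_failed: bool        # assumed question for level 2 (step S2-3)
    needs_degraded_operation: bool    # question for level 3 (step S2-4)


def classify_cascade(attrs):
    """Walk the levels from 1 (most severe) to 4 and stop at the first "Yes"."""
    if attrs.needs_external_assistance:
        return "Fail Fatal"
    if attrs.main_function_failed:
        return "Fail Safe"
    if attrs.needs_degraded_operation:
        return "Fail Operational"
    # Every earlier answer was "No": the fault can be corrected automatically.
    return "Fail Correctable"


report = FaultAttributes(
    needs_external_assistance=False,
    main_function_failed=True,
    needs_degraded_operation=True,
)
print(classify_cascade(report))  # Fail Safe
```

Because the cascade returns at the first "Yes", a serious fault is classified after the fewest comparisons, which matches the stated goal of shortening the response time for severe faults.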
  • Fig. 3 shows a logic application flowchart of a four-level fault classification management model (F4CM) according to another embodiment of the present application.
  • the fault manager may further include a classifier, which is used to receive a signal of a functional failure of each functional module inside the chip and determine the type of the functional failure. Using the classifier to pre-judge the type of functional failure can reduce the steps of logical judgment, simplify calculations, and improve processing efficiency.
  • the fault manager including the classifier may execute the following steps S3-1 to S3-3. The embodiment in FIG. 3 differs from the embodiment in FIG. 2 in that the judgment logic of the four-level fault has been changed.
  • a classifier is used to receive the functional fault signals from the internal IP1...IPn of the chip, and simultaneously determine which type of fault the functional fault belongs to based on the four different types of fault attributes.
  • the four-level fault classification management model (F4CM) is configured in the classifier.
  • in step S3-1, a functional failure of an IP inside the chip is detected, that is, a fault indication signal (Fault Indicated Signals) sent by the safety mechanism is received.
  • in step S3-2, according to the four-level fault classification management model (F4CM), it is judged which of the four types the functional fault of the IP belongs to: fatal fault (Fail Fatal), fail safe (Fail Safe), fail operational (Fail Operational), or correctable fault (Fail Correctable).
  • in step S3-3, when the type of the functional failure is a fatal fault (Fail Fatal), the IP functional failure signal (Fail Fatal) information is output to the out-of-chip system (out of chip), and the external system assists with reset, power-off, or other necessary operations.
  • in step S3-3, when the type of the functional failure is fail safe (Fail Safe), the IP functional failure signal (Fail Safe) information is output to the system controller (System Controller) inside the chip, which performs necessary operations such as automatic reset so that the system enters a safe state or resumes operation.
  • in step S3-3, when the type of the functional failure is a correctable fault (Fail Correctable), the IP functional failure signal (Fail Correctable) information is output to the processor (CPU) inside the chip, and the software running on the CPU performs automatic error correction through the safety mechanism, or the safety mechanism in the IP performs automatic error correction.
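Once the classifier has determined the fault type, step S3-3 reduces to a table lookup from type to destination. The Python sketch below illustrates that dispatch; `ROUTES` and `route` are hypothetical names, and the destination for Fail Operational (the CPU) is taken from the description of the Fig. 2 embodiment, because the S3-3 items above do not list that case.

```python
# Destination of the fault information for each fault type.
ROUTES = {
    "Fail Fatal": "out of chip",        # external system resets or powers off
    "Fail Safe": "System Controller",   # automatic reset to a safe state
    "Fail Operational": "CPU",          # software performs degraded operation
    "Fail Correctable": "CPU",          # software or the IP corrects the error
}


def route(fault_type):
    """Return the module that receives the fault information for this type."""
    try:
        return ROUTES[fault_type]
    except KeyError:
        raise ValueError(f"unknown fault type: {fault_type!r}") from None


print(route("Fail Fatal"))  # out of chip
```

A single lookup replaces the chain of yes/no judgments, which is the kind of simplification the classifier embodiment aims for.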
  • the logic application embodiment of the four-level fault classification management model (F4CM) of the present application provides a low-cost, high-efficiency fault management system for the functional safety of car-level chips. Through a centralized, hierarchical, fine-grained chip function fault management system, it can effectively detect the faults inside the chip and classify them by severity, so as to provide the system with accurate fault information, ensure that the system software accurately locates and responds to various faults, reduce the fault detection load on the system software, and take reasonable fault response measures effectively and in time, improving the availability of the system when a fault occurs.
  • the fault controller (Fault Controller) is responsible for aggregating the fault indication signals (Fault Indicated Signals) sent by each IP (IP1...IPn) inside the chip and by all safety mechanisms in the chip system, and, depending on the scenario in which the chip is used and on the fault type, for generating according to the pre-configuration the fault information corresponding to the four-level fault classification management model (F4CM) shown in Figure 1.
  • the fault controller can further aggregate the fault indication signals (Fault Indicated Signals) sent by the static signal detection module (Static Signal Monitor), by each IP inside the chip, and by all safety mechanisms in the chip system.
  • the fault controller may include 4 fault selection units (Fault Selection).
  • Various correspondences can be formed between the generated fault information and the input fault indication signal through the configuration of the fault selection unit (Fault Selection).
  • multiple correspondence relationships include: one-to-one (1 to 1), one-to-many (1 to N), and/or many-to-one (N to 1), where N is a positive integer not less than 2.
  • the fault management system with the controller in this embodiment can adapt to different application scenarios and different functional safety level requirements.
  • four fault selection units are provided in the fault controller (Fault Controller), and the four fault selection units correspond respectively to fatal faults (Fail Fatal), fail-safe faults (Fail Safe), fail-operational faults (Fail Operational), and correctable faults (Fail Correctable).
  • Each IP (IP1...IPn) inside the chip is connected to the fault selection unit (Fault Selection) through electrical signals, so that the fault selection unit (Fault Selection) can receive the fault indication signals (Fault Indicated Signals) sent by each IP inside the chip.
  • when a fault selection unit (for example, fault selection unit 1) is signal-connected with multiple functional modules IP1 to IPn, the correspondence established is the above-mentioned many-to-one;
  • when a functional module (for example, IP1) is signal-connected with multiple fault selection units 1 to 4, the correspondence established is the above-mentioned one-to-many;
  • the correspondence established by the signal connection between one fault selection unit (for example, fault selection unit 1) and one functional module (for example, IP1) is the above-mentioned one-to-one.
  • the one-to-one, one-to-many, and many-to-one correspondence can exist independently or coexist as shown in FIG. 4, which can be specifically designed according to actual needs. There is no restriction here.
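The configurable correspondences can be illustrated with bit masks, one mask per fault selection unit. This is a sketch under assumptions, not the circuit of Fig. 4: the bit layout (bit 0 for IP1, bit 1 for IP2, and so on), the example masks, and the name `fault_information` are all hypothetical.

```python
FAULT_TYPES = ("Fail Fatal", "Fail Safe", "Fail Operational", "Fail Correctable")


def fault_information(signals, masks):
    """Combine fault indication signals into four fault information flags.

    signals: integer bit field, bit i is 1 when IP(i+1) reports a fault.
    masks:   four integer bit fields, one per fault selection unit, giving
             the IPs whose signals that unit listens to.
    """
    return {name: bool(signals & mask) for name, mask in zip(FAULT_TYPES, masks)}


# Example configuration for four IPs:
#   IP1 -> Fail Fatal only                  (one-to-one)
#   IP2, IP3 -> Fail Safe                   (many-to-one)
#   IP3 -> Fail Safe and Fail Operational   (one-to-many)
#   IP4 -> Fail Correctable only
masks = (0b0001, 0b0110, 0b0100, 0b1000)

info = fault_information(0b0100, masks)  # only IP3 reports a fault
print(info)
# {'Fail Fatal': False, 'Fail Safe': True, 'Fail Operational': True, 'Fail Correctable': False}
```

Changing the masks changes the mapping without touching the logic, which mirrors how a pre-configuration can adapt the same controller to different scenarios.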
  • a software configuration module may also be provided outside the fault controller (Fault Controller).
  • the software configuration module (Software Configuration) is connected to the four fault selection units (Fault Selection) through electrical signals and pre-configures them according to the scenario in which the chip is used and the fault types, so that the fault selection units can receive the fault indication signals (Fault Indicated Signals) sent by each IP in the chip.
  • the software configuration module (Software Configuration) can also be used to monitor the working status of the fault selection unit (Fault Selection) in real time. When a fault or logic error occurs in the fault selection unit (Fault Selection), it can perform external monitoring and correction in time. After the software configuration module (Software Configuration) collects and judges the fault indication signals (Fault Indicated Signals), the fault information (Fault Information) is generated.
  • the generated fault information can be sent to modules inside the chip and to the outside of the chip (external systems, such as the software configuration module) for the following processing: 1) fail-operational (Fail Operational) and correctable (Fail Correctable) fault information is output to the processor (CPU) inside the chip for processing by the software running on the CPU; 2) fail-safe (Fail Safe) fault information is output to the system controller (System Controller) inside the chip, which performs necessary operations such as automatic reset so that the system enters a safe state or resumes operation; 3) fatal fault (Fail Fatal) information is output to the outside of the chip (out of chip), and the external system assists with reset, power-off, or other necessary operations.
  • Fig. 5 shows a logical structure diagram of a fault management system according to an embodiment of the present application.
  • the fault management system (Fault Management) in Fig. 5 is configured with: a fault controller (Fault Controller), a static signal detection module (Static Signal Monitor), and a fault injection module (Fault Injector) as shown in Fig. 4.
  • the static signal detection module (Static Signal Monitor) is responsible for monitoring in real time, according to the pre-configuration, the static signals generated by the system configuration module (System Configure) inside the chip, and for detecting failures caused by stuck-at faults (Stuck-at Fault).
  • the stuck-at fault (Stuck-at Fault) is the stuck-at 0 or stuck-at 1 type of fault known in the art, in which a signal or pin in a circuit is unexpectedly fixed at logic 0 (stuck-at 0) or logic 1 (stuck-at 1) and cannot be changed.
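One simple way to picture the monitoring of static signals is to compare the observed value of each configured bit with the value it is supposed to hold. The Python sketch below does this for an integer bit field; the function name `check_static_signals` and the comparison scheme are assumptions for illustration, and a mismatch is only consistent with a stuck-at fault rather than proof of one.

```python
def check_static_signals(expected, observed, width):
    """Report every bit whose observed value differs from the expected one.

    expected: value the static configuration signals should hold.
    observed: value actually sampled from the signals.
    width:    number of signal bits to check.
    Returns a list of (bit_index, suspected_fault) tuples.
    """
    suspects = []
    diff = expected ^ observed  # 1 wherever the two values disagree
    for bit in range(width):
        if (diff >> bit) & 1:
            # The observed level tells which stuck-at value is suspected.
            kind = "stuck-at 1" if (observed >> bit) & 1 else "stuck-at 0"
            suspects.append((bit, kind))
    return suspects


print(check_static_signals(expected=0b1010, observed=0b1110, width=4))
# [(2, 'stuck-at 1')]
```

In a design like the one described, each reported mismatch would raise a fault indication signal that is passed to the fault controller for classification.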
  • the fault controller can be configured with a fault classification management model using the four-level fault classification management model (F4CM) designed in this application.
  • the fault management system for the functional safety of car-level chips provided by this application can ensure that the system software accurately locates and responds to various faults through a fine-grained fault classification system, effectively and timely Take reasonable fault response measures to improve the availability of the system when a fault occurs; at the same time, reduce the system software fault detection load, which is conducive to fast, high coverage, and individually configurable power-on and power-off of the chip. (Power-down) self-check.
  • Table 2 The corresponding relationship between the functional effects and technical means of the fault management system provided by the embodiments of the present application can be referred to in Table 2 below.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Debugging And Monitoring (AREA)

Abstract

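The present application provides a fault management system for the functional safety of automotive-grade chips, comprising an off-chip external system and an automotive-grade chip. The automotive-grade chip includes a processor, a system controller, a system configuration module, a fault manager, and on-chip functional modules, and the fault manager is configured with a fault classification management model.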

Description

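Fault management system for functional safety of automotive-grade chips
This application claims priority to Chinese Patent Application No. 202010103727.8, filed on February 20, 2020, the disclosure of which is incorporated herein by reference in its entirety as part of this application.
Technical Field
This application relates to a fault management system for passenger vehicle systems, and in particular to a system fault management system for the functional safety of automotive-grade chips.
Background
Functional safety (Functional Safety) is critical for safety-related electrical and electronic systems in the automotive field, such as powertrain control systems. Functional safety applications can impose strict constraints on a system so that it operates safely and reliably in a complex system environment.
An automotive-grade chip integrates many safety mechanisms (Safety Mechanism), which may include safety mechanisms inside an IP (a pre-designed module within the chip) as well as system-level safety mechanisms. However, current automotive-grade chips carry a heavy load in fault identification, classification, and handling, and cannot take reasonable fault response measures effectively and promptly, which reduces the availability of the system when a fault occurs.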
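Summary
In view of this, the present application provides a fault management system for the functional safety of automotive-grade chips. Through a centralized, hierarchical, fine-grained chip functional fault management scheme, the system detects and classifies faults inside the chip according to their severity and so provides the system with precise fault information. This ensures that system software can accurately locate and respond to each fault, reduces the fault detection load on system software, allows reasonable fault response measures to be taken effectively and promptly, and improves the availability of the system when a fault occurs.
A first aspect of the present application provides a fault management system for the functional safety of automotive-grade chips. The fault management system includes an off-chip external system (out of chip) and an automotive-grade chip. The automotive-grade chip further includes a processor (CPU), a system controller (System Controller), a system configuration module (System Configure), a fault manager (Fault Management), and on-chip functional modules (IP1 ... IPn). The fault manager (Fault Management) is configured with a fault classification management model.
In the first aspect of the present application, further, the fault manager (Fault Management) includes a fault injection module (Fault Injector), a static signal detection module (Static Signal Monitor), and a fault control module (Fault Controller).
The fault injection module (Fault Injector) is electrically connected to all functional modules (IP1 ... IPn) inside the chip, and each functional module (IP1 ... IPn) is configured with a safety mechanism.
The fault control module (Fault Controller) is electrically connected to each IP (IP1 ... IPn), the static signal detection module (Static Signal Monitor), the processor (CPU), the system controller (System Controller), and the off-chip external system (out of chip).
The static signal detection module (Static Signal Monitor) is electrically connected to the system configuration module (System Configure) inside the chip.
In the first aspect of the present application, further, the fault injection module (Fault Injector) uses error test signals to inject faults into the safety mechanisms of all functional modules (IP1 ... IPn) or of the system, checks the corresponding fault indication signals, and determines whether the safety mechanism itself has failed.
In the first aspect of the present application, further, the fault controller (Fault Controller) aggregates the fault indication signals (Fault Indicated Signals) sent by its own static signal detection module (Static Signal Monitor), by each IP inside the chip, and by all safety mechanisms in the chip system.
In the first aspect of the present application, further, the static signal detection module (Static Signal Monitor) monitors in real time the static signals generated by the system configuration module (System Configure) inside the chip, to avoid failures caused by stuck-at faults (Stuck-at Fault).
In the first aspect of the present application, further, the fault indication signals generated by the static signal detection module (Static Signal Monitor) are output to the fault controller (Fault Controller) for classification.
A second aspect of the present application provides a fault manager (Fault Management) for the functional safety of automotive-grade chips. The fault manager includes a fault injection module (Fault Injector), a static signal detection module (Static Signal Monitor), and a fault control module (Fault Controller).
The fault injection module (Fault Injector) is electrically connected to all functional modules (IP1 ... IPn) inside the chip, and each functional module (IP1 ... IPn) is configured with a safety mechanism.
The fault control module (Fault Controller) is electrically connected to each IP (IP1 ... IPn), the static signal detection module (Static Signal Monitor), the processor (CPU), the system controller (System Controller), and the off-chip external system (out of chip). The fault control module (Fault Controller) has a built-in fault classification management model consisting of four fault types.
The static signal detection module (Static Signal Monitor) is electrically connected to the system configuration module (System Configure) inside the chip.
In the second aspect of the present application, further, the four fault types are configured as follows. Type 1: a fault that requires the external system to assist in handling is configured as a fatal fault (Fail Fatal). Type 2: a fault in which the main function fails is configured as fail safe (Fail Safe). Type 3: a fault handled by automatic degraded operation is configured as fail operational (Fail Operational). Type 4: a fault handled by automatic error correction is configured as a correctable fault (Fail Correctable).
In the second aspect of the present application, further, the severity (Severity Level) of the four fault types is configured as follows. Rule 1: Type 1 > Type 2 > {Type 3, Type 4}, where "{Type 3, Type 4}" denotes the set of Type 3 and Type 4. Rule 2: Type 3 > Type 4. Rule 3: Rule 1 > Rule 2.
In the second aspect of the present application, further, the fault controller (Fault Controller) generates, according to its pre-configuration, the scenario in which the chip is used, and the fault type, fault information in a four-level structure made up of the four fault types.
In the second aspect of the present application, further, the fault controller (Fault Controller) also includes four fault selection units (Fault Selection). By configuring the fault selection units (Fault Selection), multiple correspondences can be formed between the generated fault information and the input fault indication signals.
In the second aspect of the present application, further, the multiple correspondences include one-to-one (1 to 1), one-to-many (1 to N), and/or many-to-one (N to 1), to suit different application scenarios and different functional safety level requirements.
Through a fine-grained fault classification scheme, the system fault management system for the functional safety of automotive-grade chips provided by this application ensures that system software can accurately locate and respond to each fault and can take reasonable fault response measures effectively and promptly, which improves the availability of the system when a fault occurs. It also reduces the fault detection load on system software, which helps the chip implement fast, high-coverage, individually configurable power-on (Power-on) and power-down (Power-down) self-tests.
Additional aspects and advantages of the present application are given in part in the description below; they will become apparent from that description or be learned through practice of the present application.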
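Brief Description of the Drawings
Fig. 1 is a schematic diagram of the four-level fault classification management model designed according to the severity (Severity Level) of chip functional faults in an embodiment of the present application;
Fig. 2 is a logic application flowchart of the four-level fault classification management model (F4CM) according to an embodiment of the present application;
Fig. 3 is a logic application flowchart of the four-level fault classification management model (F4CM) according to another embodiment of the present application;
Fig. 4 is a logical structure diagram of the fault controller (Fault Controller) according to an embodiment of the present application;
Fig. 5 is a logical structure diagram of the fault management system (Fault Management) for the functional safety of automotive-grade chips according to an embodiment of the present application.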
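Detailed Description
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present application and are not to be construed as limiting it.
Those skilled in the art will understand that the modules mentioned in this application are hardware devices for carrying out one or more of the steps, measures, and schemes in the operations, methods, and flows described herein. Such hardware devices may be specially designed and manufactured for the required purpose, or may use known devices in a general-purpose computer or other known hardware devices. The general-purpose computer is selectively activated or reconfigured by a program stored in it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used here may also include the plural. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used here may include wireless connection or coupling. The expression "and/or" as used here includes any one and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used here (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as they are here, will not be interpreted in an idealized or overly formal sense.
Automotive functional safety (Functional Safety) design generally follows the ISO (International Organization for Standardization) 26262 standard (specific to automobiles; first published in 2011, second edition published in 2018). It is derived from IEC (International Electrotechnical Commission) 61508, the basic functional safety standard for electrical, electronic, and programmable devices (first published in 1998, latest edition published in 2010). ISO 26262 targets components used specifically in the automotive field, such as particular electrical devices, electronic equipment, and programmable electronic devices, and is an international standard intended to improve the functional safety of automotive electronic and electrical products.
Through hazard analysis and risk assessment (Hazard Analysis and Risk Assessment, HARA) and a V-model design architecture, the ISO 26262 standard yields consistent analysis results for the required level of functional safety, which is then realized through capability maturity model integration processes such as design and development, verification (Verification), and validation (Validation). According to the degree of safety risk, the standard assigns a required Automotive Safety Integrity Level (ASIL) to a system or to a component of a system, so that the functional safety of the product meets automotive safety requirements. There are four ASILs: A, B, C and D, from lowest to highest. At least one safety goal is determined for each hazard. A safety goal is the top-level safety requirement of the system; system-level safety requirements are derived from the safety goals and are then allocated to hardware and software. The ASIL determines the safety requirements placed on the system: the higher the ASIL, the stricter the safety requirements and the higher the cost of achieving safety, which means higher hardware diagnostic coverage, a stricter development process, and correspondingly higher development cost, longer development cycles, and stricter technical requirements. For example, the ISO 26262 functional safety standard requires a single-point fault metric (Single-Point Fault Metric, SPFM) of at least 99% to reach the highest safety integrity level, ASIL D. Meeting functional safety requirements can therefore be complex and difficult for real-time systems.
To meet ASIL requirements, an automotive-grade chip integrates many safety mechanisms (Safety Mechanism), which may include safety mechanisms inside an IP (a pre-designed module within the chip) as well as system-level safety mechanisms. When a fault occurs and is detected by the corresponding safety mechanism, the mechanism must report the fault promptly so that the system can respond according to the type and severity of the fault, thereby preventing the fault from remaining latent or directly causing a functional failure.
However, current designs of automotive-grade chips with functional safety requirements usually have some problems, as follows.
For example, when the chip lacks a centralized fault management module, fault identification, classification, and handling place a heavy load on system software, and the chip cannot easily implement fast, high-coverage, individually configurable power-on (Power-on) and power-down (Power-down) self-tests.
For example, when a fault management module is integrated in the chip, faults are classified, but the classification is very coarse (faults fall into two classes, Fatal and Error). As a result, the system cannot take reasonable fault response measures effectively and promptly, which reduces the availability of the system when a fault occurs.
The existing functional safety fault management systems of automotive-grade chips therefore need to be optimized to solve the two kinds of problems described above.
An embodiment of the present application provides a fault management system for the functional safety of automotive-grade chips. The fault management system includes an off-chip external system (out of chip) and an automotive-grade chip, and the automotive-grade chip includes a fault manager (Fault Management). The fault manager (Fault Management) is configured with a fault classification management model. In this system, the fault manager with its fault classification management model provides a fine-grained fault classification scheme that ensures system software can accurately locate and respond to each fault, so that reasonable fault response measures are taken effectively and promptly and the availability of the system when a fault occurs is improved.
For example, in the embodiments of the present application, the automotive-grade chip may also include a processor (CPU), a system controller (System Controller), a system configuration module (System Configure), on-chip functional modules (IP1 ... IPn), and so on.
The fault management system for the functional safety of automotive-grade chips according to at least one embodiment of the present application is described in detail below with reference to the drawings.
Note that in some embodiments of the present application, "application scenario" refers to the scenario inside the vehicle in which the (automotive-grade) chip is used, mainly the environment formed by the different systems or components in the vehicle. An automotive-grade chip integrates safety mechanisms inside its IPs as well as system-level safety mechanisms. When a fault occurs and is detected by the corresponding safety mechanism, the mechanism must report the fault promptly so that the system can respond according to the type and severity of the fault, thereby preventing the fault from remaining latent or directly causing a functional failure.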
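In the embodiments of the present application, random faults of the hardware inside the chip can be distinguished along the following dimensions (W1 to W3):
W1, external assistance: after the fault occurs, does an external system need to assist in handling it?
W2, main function: after the fault occurs, has the main function of the chip's internal hardware, or of the software system running on the chip, failed?
W3, self-handling: after the fault occurs, can the chip's internal hardware, or the software system running on the chip, handle the fault on its own? This dimension is further divided into degraded operation and automatic error correction.
Based on the above analysis, the embodiments of the present application adopt the following definitions (Definition 1 to Definition 4).
Definition 1: a fault that requires the external system to assist in handling is defined as a "fatal fault (Fail Fatal)";
Definition 2: a fault in which the main function fails is defined as "fail safe (Fail Safe)";
Definition 3: a fault handled by automatic degraded operation is defined as "fail operational (Fail Operational)";
Definition 4: a fault handled by automatic error correction is defined as a "correctable fault (Fail Correctable)".
Following the dimensions and reasoning above, at least one embodiment of the present application establishes the fault classification management scheme detailed in Table 1 below.
Table 1: Fault classification management scheme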
Figure PCTCN2021076492-appb-000001
Figure PCTCN2021076492-appb-000002
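For example, in the embodiments of the present application, the faults of all functional modules (IP1 ... IPn) in the automotive-grade chip can be divided into the four classes described in Table 1 (fault levels 1 to 4 correspond to Types 1 to 4 in order). In engineering practice, Table 1 can be used to label random hardware faults inside the chip by class, so that the system can automatically determine the fault type and precisely locate the fault.
In the embodiments of the present application, as is known from engineering practice in the art, an analysis by the severity (Severity Level) of chip functional faults gives the following rules (Rule 1 to Rule 3).
Rule 1: external assistance (Type 1) > loss of main function (Type 2) > self-handling {Type 3, Type 4}, where "{Type 3, Type 4}" denotes the set of Type 3 and Type 4.
Rule 2: degraded operation (Type 3) > automatic error correction (Type 4).
Rule 3: Rule 1 > Rule 2.
Under Rule 3, Type 1 > Type 2 > Type 3, and Type 1 > Type 2 > Type 4.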
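The ordering defined by Rules 1 to 3 can be sketched in Python as follows. This is only an illustration: the names `FaultType`, `more_severe`, and `most_severe` are hypothetical and do not appear in the application, and the sketch assumes that the three rules together give the total order Type 1 > Type 2 > Type 3 > Type 4.

```python
from enum import IntEnum


class FaultType(IntEnum):
    """Four fault types; a smaller number means a more severe fault."""
    FAIL_FATAL = 1        # needs assistance from the off-chip external system
    FAIL_SAFE = 2         # main function has failed
    FAIL_OPERATIONAL = 3  # handled by automatic degraded operation
    FAIL_CORRECTABLE = 4  # handled by automatic error correction


def more_severe(a: FaultType, b: FaultType) -> bool:
    """Return True if fault type a is strictly more severe than b."""
    return a.value < b.value


def most_severe(faults):
    """Return the most severe fault type in a non-empty iterable."""
    return min(faults)
```

For instance, if Fail Operational, Fail Correctable, and Fail Safe faults are pending at the same time, `most_severe` returns `FaultType.FAIL_SAFE`.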
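Compared with current chip functional fault classification models that meet the ASIL standard, the fault classification proposed in the embodiments of the present application has at least the following main advantages (Advantage 1 to Advantage 5).
Advantage 1: a centralized fault classification scheme. Every case of chip functional fault is covered by these four types, so that subsequent fault handling can respond quickly according to type, which improves the efficiency of fault response.
Advantage 2: a fine-grained fault classification scheme. The two classes in common use today, Fatal and Error, are refined into the four classes above (Type 1 to Type 4). The finer classification lets software or hardware handle each fault directly, which speeds up fault response.
Advantage 3: a hierarchical fault classification scheme. The four levels of the fault classification fit well with functional safety requirements (for example, the four levels A, B, C and D mentioned above), which helps functional-safety-related system development.
Advantage 4: a lower fault detection load on system software. The finer classification lets software or hardware handle each fault directly, which speeds up fault response, and because the classification is done directly in hardware, the burden on software is reduced.
Advantage 5: individually configurable usage scenarios. The way faults are classified can be configured individually to suit different application scenarios, which makes the chip more flexible to deploy.
Fig. 2 shows a logic application flowchart of the four-level fault classification management model (F4CM) according to an embodiment of the present application.
In some embodiments of the present application, as shown in Fig. 2, the fault manager may perform steps S2-1 to S2-4 below.
In step S2-1, a functional fault of an IP inside the chip is detected, that is, the fault indication signals (Fault Indicated Signals) sent by a safety mechanism are received.
In step S2-2, according to the four-level fault classification management model (F4CM), it is determined whether an external system needs to assist in handling the fault after the IP functional fault occurs. If the result is "yes", the fault is determined to be a fatal fault (Fail Fatal), and the IP functional fault signal (Fail Fatal) information is output off-chip (out of chip), where the external system assists with a reset, power-off, or other necessary operations. If the result is "no", the next decision under the F4CM is made, namely whether the main function of the chip's internal hardware, or of the software system running on the chip, has failed after the fault occurred.
In step S2-3, if the result is "yes", the fault is determined to be fail safe (Fail Safe), and the IP functional fault signal (Fail Safe) information is output to the system controller (System Controller) inside the chip, which performs an automatic reset or other necessary operations so that the system enters a safe state or resumes operation. If the result is "no", the next decision under the F4CM is made, namely whether the main function of the chip's internal hardware, or of the software system running on the chip, needs to run in degraded mode after the fault occurred.
In step S2-4, if the result is "yes", the fault is determined to be fail operational (Fail Operational), and the IP functional fault signal (Fail Operational) information is output to the processor (CPU) inside the chip, where the software running on the CPU performs degraded operation. If the result is "no", the fault is determined to be a correctable fault (Fail Correctable), and the IP functional fault signal (Fail Correctable) information is output to the processor (CPU) inside the chip, where the software running on the CPU performs automatic error correction through the safety mechanism, or the safety mechanism inside the IP corrects the error itself.
For example, in the embodiments of the present application, the fault is checked against the four levels of the fault management scheme in order from low to high to determine which level it belongs to, and at execution time faults are handled in that same low-to-high order. This speeds up the handling of the more severe faults and shortens the fault response time. Note that "high" and "low" here follow the numbers shown in Table 1: the highest fault level is the correctable fault, numbered 4, and the lowest fault level is the fatal fault, numbered 1; the smaller the number of a fault level, the more severe the fault.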
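Steps S2-2 to S2-4 form a short decision cascade, which can be sketched in Python as follows. This is a minimal sketch and not the hardware logic of the application: `FaultAttributes`, its three fields, and `classify_sequential` are hypothetical names standing for the answers to the three questions of Fig. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultAttributes:
    """Hypothetical answers to the three questions asked in steps S2-2 to S2-4."""
    needs_external_help: bool  # S2-2: must an external system assist?
    main_function_lost: bool   # S2-3: has the main function failed?
    needs_degradation: bool    # S2-4: is degraded operation required?


def classify_sequential(fault: FaultAttributes) -> str:
    """Walk the questions from most to least severe and stop at the first 'yes'."""
    if fault.needs_external_help:
        return "Fail Fatal"
    if fault.main_function_lost:
        return "Fail Safe"
    if fault.needs_degradation:
        return "Fail Operational"
    return "Fail Correctable"
```

Because the most severe question is asked first, a fault that answers "yes" to several questions is always reported at its most severe level.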
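Fig. 3 shows a logic application flowchart of the four-level fault classification management model (F4CM) according to another embodiment of the present application.
In other embodiments of the present application, the fault manager may also include a classifier, which receives the functional fault signals from the functional modules inside the chip and determines the type of each functional fault. Having the classifier determine the fault type up front reduces the number of logical decision steps, simplifies computation, and improves processing efficiency. As an example, as shown in Fig. 3, a fault manager that includes a classifier may perform steps S3-1 to S3-3 below. The embodiment of Fig. 3 differs from that of Fig. 2 in the decision logic for the four fault levels: the classifier receives the functional fault signals from IP1 ... IPn inside the chip and determines, against the attributes of the four fault types simultaneously, which type the functional fault belongs to. The four-level fault classification management model (F4CM) is configured in the classifier.
In step S3-1, a functional fault of an IP inside the chip is detected, that is, the fault indication signals (Fault Indicated Signals) sent by a safety mechanism are received.
In step S3-2, according to the four-level fault classification management model (F4CM), it is determined which of the four types the IP functional fault belongs to: fatal fault (Fail Fatal), fail safe (Fail Safe), fail operational (Fail Operational), or correctable fault (Fail Correctable).
In step S3-3, when the functional fault is a fatal fault (Fail Fatal), the IP functional fault signal (Fail Fatal) information is output to the off-chip external system (out of chip), which assists with a reset, power-off, or other necessary operations.
In step S3-3, when the functional fault is fail safe (Fail Safe), the IP functional fault signal (Fail Safe) information is output to the system controller (System Controller) inside the chip, which performs an automatic reset or other necessary operations so that the system enters a safe state or resumes operation.
In step S3-3, when the functional fault is fail operational (Fail Operational), the IP functional fault signal (Fail Operational) information is output to the processor (CPU) inside the chip, where the software running on the CPU performs degraded operation.
In step S3-3, when the functional fault is a correctable fault (Fail Correctable), the IP functional fault signal (Fail Correctable) information is output to the processor (CPU) inside the chip, where the software running on the CPU performs automatic error correction through the safety mechanism, or the safety mechanism inside the IP corrects the error automatically.
For example, in at least one embodiment of the present application, when the fault manager includes a classifier, the classifier may be a software program written according to the logic application flow of the four-level fault classification management model (F4CM). The classifier therefore adds no cost for extra chips or other hardware.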
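A software classifier of this kind could take the form of a pre-configured lookup followed by a routing step, as in the Python sketch below. The table contents, the `(IP name, signal index)` key, and the function name `classify_and_route` are assumptions made for illustration; only the four fault types and their destinations come from step S3-3.

```python
# Hypothetical pre-configured table: (IP name, signal index) -> fault type.
CLASSIFIER_TABLE = {
    ("IP1", 0): "Fail Fatal",
    ("IP1", 1): "Fail Correctable",
    ("IP2", 0): "Fail Safe",
    ("IP3", 0): "Fail Operational",
}

# Destination of each fault type, as described in step S3-3.
ROUTES = {
    "Fail Fatal": "out of chip",
    "Fail Safe": "System Controller",
    "Fail Operational": "CPU",
    "Fail Correctable": "CPU",
}


def classify_and_route(ip: str, signal: int):
    """Determine the fault type in a single lookup, then return (type, destination)."""
    try:
        fault_type = CLASSIFIER_TABLE[(ip, signal)]
    except KeyError:
        raise ValueError(f"unconfigured fault source: {ip}[{signal}]") from None
    return fault_type, ROUTES[fault_type]
```

Compared with the cascade of Fig. 2, the type is found in one step instead of up to three successive decisions, which is the simplification this embodiment describes.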
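As described above, the logic application embodiments of the four-level fault classification management model (F4CM) of the present application give a low-cost, high-efficiency system fault management system for the functional safety of automotive-grade chips. Through a centralized, hierarchical, fine-grained chip functional fault management scheme, the system detects and classifies faults inside the chip according to their severity and so provides the system with precise fault information. This ensures that system software can accurately locate and respond to each fault, reduces the fault detection load on system software, allows reasonable fault response measures to be taken effectively and promptly, and improves the availability of the system when a fault occurs.
Fig. 4 shows a logical structure diagram of the fault controller (Fault Controller) according to an embodiment of the present application. The logical structure of the fault controller (Fault Controller) in Fig. 4 is derived from the logic application flow of the four-level fault classification management model (F4CM) in Fig. 3.
For example, in at least one embodiment of the present application, the fault controller (Fault Controller) aggregates the fault indication signals (Fault Indicated Signals) sent by each IP (IP1 ... IPn) inside the chip and by all safety mechanisms in the chip system, and generates, according to its pre-configuration, the scenario in which the chip is used, and the fault type, fault information corresponding to the four-level fault classification management model (F4CM) shown in Fig. 1.
For example, in at least one embodiment of the present application, the fault controller (Fault Controller) may further aggregate the fault indication signals (Fault Indicated Signals) sent by its own static signal detection module (Static Signal Monitor), by each IP inside the chip, and by all safety mechanisms in the chip system.
For example, in at least one embodiment of the present application, the fault controller (Fault Controller) may include four fault selection units (Fault Selection). By configuring the fault selection units (Fault Selection), multiple correspondences can be formed between the generated fault information and the input fault indication signals. As shown in Fig. 4, the correspondences include one-to-one (1 to 1), one-to-many (1 to N), and/or many-to-one (N to 1), where N is a positive integer not less than 2. A fault management system with the controller of this embodiment can therefore suit different application scenarios and different functional safety level requirements.
As shown in Fig. 4, in one embodiment of the connections, four fault selection units (Fault Selection) are provided inside the fault controller (Fault Controller). The four fault selection units correspond to the four fault types, fatal fault (Fail Fatal), fail safe (Fail Safe), fail operational (Fail Operational), and correctable fault (Fail Correctable), and each selectively receives the fault indication signals (Fault Indicated Signals) sent by the IPs (IP1 ... IPn) inside the chip. Each IP (IP1 ... IPn) inside the chip is connected by electrical signal to the fault selection units (Fault Selection), so that the fault selection units can receive the fault indication signals sent by each IP inside the chip.
In this embodiment, as shown in Fig. 4, each fault selection unit (for example, fault selection unit 1) has signal connections to several functional modules IP1 to IPn, which establishes the many-to-one correspondence described above. Each functional module (for example, IP1) has signal connections to several of the fault selection units 1 to 4, which establishes the one-to-many correspondence. A signal connection between one fault selection unit (for example, fault selection unit 1) and one functional module (for example, IP1) establishes the one-to-one correspondence. Note that in the embodiments of the present application, the one-to-one, one-to-many, and many-to-one correspondences may exist independently or together as shown in Fig. 4; the design can follow actual needs and is not limited here.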
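One way to picture the configurable correspondences is to model each fault selection unit as an enable mask over the incoming fault indication signals. The Python sketch below assumes this bitmask model; the application does not specify how the selection units are implemented, and `select_faults` and the mask layout are hypothetical.

```python
from typing import Dict

FAULT_TYPES = ("Fail Fatal", "Fail Safe", "Fail Operational", "Fail Correctable")


def select_faults(signals: int, masks: Dict[str, int]) -> Dict[str, bool]:
    """Evaluate the four selection units.

    signals: bit i is 1 when fault indication signal i is asserted.
    masks:   per fault type, bit i is 1 when that unit listens to signal i.
    A unit reports a fault when any signal it listens to is asserted.
    """
    return {t: bool(signals & masks.get(t, 0)) for t in FAULT_TYPES}


# Example configuration (assumed): bit 0 = IP1, bit 1 = IP2, bit 2 = IP3, bit 3 = IP4.
EXAMPLE_MASKS = {
    "Fail Fatal": 0b0001,        # IP1 -> Fail Fatal
    "Fail Operational": 0b0001,  # IP1 also -> Fail Operational (1 to N)
    "Fail Safe": 0b0110,         # IP2 and IP3 -> Fail Safe (N to 1)
    "Fail Correctable": 0b1000,  # IP4 -> Fail Correctable (1 to 1)
}
```

With this configuration, asserting only the IP3 signal (`signals=0b0100`) raises Fail Safe alone, while asserting the IP1 signal raises both Fail Fatal and Fail Operational.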
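For example, in at least one embodiment of the present application, a software configuration module (Software Configuration) may also be provided outside the fault controller (Fault Controller). The software configuration module (Software Configuration) is connected by electrical signal to each of the four fault selection units (Fault Selection) and pre-configures them according to the scenario in which the chip is used and the fault types, so that the fault selection units can receive the fault indication signals (Fault Indicated Signals) sent by each IP inside the chip. The software configuration module (Software Configuration) can also monitor the working state of the fault selection units (Fault Selection) in real time, so that a failure or logic error in a fault selection unit can be monitored and corrected externally in time. After the software configuration module (Software Configuration) has collected and evaluated the fault indication signals (Fault Indicated Signals), fault information (Fault Information) is generated.
At run time, the generated fault information (Fault Information) can be sent to modules inside the chip and to external systems (for example, the software configuration module) for the following processing: 1) Fail Operational and Fail Correctable information is output to the processor (CPU) inside the chip and handled by the software running on the CPU; 2) Fail Safe information is output to the system controller (System Controller) inside the chip, which performs an automatic reset or other necessary operations so that the system enters a safe state or resumes operation; 3) Fail Fatal information is output off-chip (out of chip), where the external system assists with a reset, power-off, or other necessary operations.
Fig. 5 shows a logical structure diagram of a fault management system according to an embodiment of the present application. The fault manager (Fault Management) in Fig. 5 comprises the fault controller (Fault Controller) shown in Fig. 4, a static signal detection module (Static Signal Monitor), and a fault injection module (Fault Injector). The structure, functions, and logic flow of the fault controller (Fault Controller) are as described in the preceding embodiments and are not repeated here.
The structure, functions, and logic flow of the static signal detection module (Static Signal Monitor), the fault injection module (Fault Injector), and the fault manager (Fault Management) are described in detail below.
As shown in Fig. 5, according to its pre-configuration, the static signal detection module (Static Signal Monitor) monitors in real time the static signals generated by the system configuration module (System Configure) inside the chip and detects failures caused by stuck-at faults (Stuck-at Fault). For example, the stuck-at fault is the stuck-at-0 or stuck-at-1 fault well known in the art: a signal or pin in a circuit is unexpectedly fixed at logic 0 (stuck-at 0) or logic 1 (stuck-at 1) and cannot change. For details, see http://web.stanford.edu/class/ee386/public/stuck_at_fault_6per_page. The fault indication signals generated by the static signal detection module are also output to the fault controller (Fault Controller) for classification and handling.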
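The check performed by such a monitor can be illustrated with a bitwise comparison between the configured value of the static signals and the value actually observed. The Python sketch below is a software analogy under assumed names (`stuck_bits`, `expected`, `observed`, `monitor_mask`); the application does not describe the monitor's internal circuit.

```python
from typing import List


def stuck_bits(expected: int, observed: int, monitor_mask: int) -> List[int]:
    """Return the positions of monitored static-signal bits that differ from
    their configured value, for example because a line is stuck at 0 or 1.

    expected:     configured value of the static signals
    observed:     value currently seen on the signal lines
    monitor_mask: bit i is 1 when signal i is monitored (the pre-configuration)
    """
    diff = (expected ^ observed) & monitor_mask
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]
```

A non-empty result would correspond to the monitor raising a fault indication signal towards the fault controller.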
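As shown in Fig. 5, functional safety requires not only safety mechanisms that monitor the faults a functional circuit may produce, but also checks on the safety mechanisms themselves to avoid latent faults (Latent Fault). The fault injection module (Fault Injector) uses error test signals (Error Injection Signals) to inject faults into the safety mechanisms of an IP or of the system and checks the corresponding fault indication signals to determine whether the safety mechanism itself has failed. Fault injection falls into two kinds, automatic hardware fault injection and software-controlled fault injection: 1) automatic hardware fault injection can be used during chip power-on (Power-on), when the CPU software has not yet started, so that automatic hardware injection and detection ensure that the system runs in a safe environment after start-up; 2) software-controlled fault injection can be used during chip power-on (Power-on), power-down (Power-down), or normal operation, when the system can apply different fault injection strategies to different safety mechanisms according to the chip's application scenario and the fault tolerant time interval (FTTI), which makes the chip more flexible to deploy.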
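The idea of testing a safety mechanism by injection can be sketched in Python with a toy parity checker standing in for the mechanism. `ParityChecker`, its `broken` flag, and `self_test` are hypothetical and chosen only for illustration; the application does not say which safety mechanisms are used or how the injection signals are encoded.

```python
class ParityChecker:
    """Toy safety mechanism: indicates a fault when even parity is violated."""

    def __init__(self, broken: bool = False):
        self.broken = broken  # models a mechanism that has itself failed

    def check(self, word: int, parity_bit: int) -> bool:
        """Return True when a fault is indicated for (word, parity_bit)."""
        if self.broken:
            return False  # a failed mechanism never reports anything
        return (bin(word).count("1") + parity_bit) % 2 != 0


def self_test(mechanism: ParityChecker, word: int = 0b1011) -> bool:
    """Inject a single-bit error and check that the mechanism reports it.

    Returns True when the mechanism is working, False when it has failed.
    """
    good_parity = bin(word).count("1") % 2
    if mechanism.check(word, good_parity):
        return False                   # false alarm on correct data
    corrupted = word ^ 0b1             # the injected error: flip bit 0
    return mechanism.check(corrupted, good_parity)
```

A mechanism that stays silent after the injection, as `ParityChecker(broken=True)` does, is exactly the latent fault this self-test is meant to expose.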
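As shown in Fig. 5, an embodiment of the present application provides a fault manager (Fault Management), which may include a fault injection module (Fault Injector), a static signal detection module (Static Signal Monitor), and a fault controller (Fault Controller). For example, the fault injection module (Fault Injector) may be electrically connected to each IP (IP1 ... IPn) inside the chip, and each IP (IP1 ... IPn) is configured with a safety mechanism (Safety Mechanism). The fault injection module (Fault Injector) uses fault injection signals (Fault Injection Signals) to inject faults into the safety mechanisms of an IP or of the system and checks the corresponding fault indication signals to determine whether the safety mechanism itself has failed. For example, the fault controller (Fault Controller) is electrically connected to each IP (IP1 ... IPn), the static signal detection module (Static Signal Monitor), the processor (CPU), the system controller (System Controller), and the off-chip external system (out of chip). For example, the fault controller (Fault Controller) is configured with a fault classification management model; the static signal detection module (Static Signal Monitor) is electrically connected to the system configuration module (System Configure) inside the chip, receives the static signals (Static Signals) generated by the system configuration module, monitors them in real time, and detects failures caused by stuck-at faults (stuck-at 0 or stuck-at 1).
In at least one embodiment of the present application, the fault classification management model configured in the fault controller (Fault Controller) may be the four-level fault classification management model (F4CM) designed in this application.
In at least one embodiment of the present application, the four-level fault classification management model (F4CM) may be implemented as four fault selection units (Fault Selection), corresponding to the four fault types, fatal fault (Fail Fatal), fail safe (Fail Safe), fail operational (Fail Operational), and correctable fault (Fail Correctable), each of which selectively receives the fault indication signals (Fault Indicated Signals) sent by the IPs (IP1 ... IPn) inside the chip.
According to the preceding embodiments, the fault management system (Fault Management) for the functional safety of automotive-grade chips provided by this application uses a fine-grained fault classification scheme to ensure that system software can accurately locate and respond to each fault and can take reasonable fault response measures effectively and promptly, which improves the availability of the system when a fault occurs. It also reduces the fault detection load on system software, which helps the chip implement fast, high-coverage, individually configurable power-on (Power-on) and power-down (Power-down) self-tests. The correspondence between the functional effects and the technical means of the fault management system provided by the embodiments of the present application is shown in Table 2 below.
Table 2: Correspondence between functional effects and technical means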
Figure PCTCN2021076492-appb-000003
Figure PCTCN2021076492-appb-000004
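The above are only several preferred embodiments of the present application. The letters in parentheses in the text and the letters in the drawings merely give the name or symbol of the module or step; their specific meaning is governed by the description of the embodiments and by the Chinese text. It should be noted that a person of ordinary skill in the art may make various improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present application.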

Claims (20)

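  1. A fault management system for the functional safety of automotive-grade chips, characterized by comprising an off-chip external system and an automotive-grade chip, wherein
    the automotive-grade chip includes a fault manager, and the fault manager is configured with a fault classification management model.
  2. The fault management system for the functional safety of automotive-grade chips according to claim 1, characterized in that
    the fault manager has built in the fault classification management model, which consists of four fault types ranked by fault level from high to low.
  3. The fault management system for the functional safety of automotive-grade chips according to claim 1 or 2, characterized in that the four fault types are configured as:
    Type 1: a fault that requires the off-chip external system to assist in handling is configured as a fatal fault;
    Type 2: a fault in which the main function fails is configured as fail safe;
    Type 3: a fault handled by automatic degraded operation is configured as fail operational; and
    Type 4: a fault handled by automatic error correction is configured as a correctable fault.
  4. The fault management system for the functional safety of automotive-grade chips according to claim 3, characterized in that the four fault types are further configured as:
    Rule 1: Type 1 > Type 2 > {Type 3, Type 4}, where "{Type 3, Type 4}" denotes the set of Type 3 and Type 4;
    Rule 2: Type 3 > Type 4; and
    Rule 3: Rule 1 > Rule 2.
  5. The fault management system for the functional safety of automotive-grade chips according to claim 3 or 4, characterized in that
    the automotive-grade chip includes a processor, a system controller, a system configuration module, and at least one functional module located in the automotive-grade chip.
  6. The fault management system for the functional safety of automotive-grade chips according to claim 5, characterized in that the fault manager further includes a fault injection module, a static signal detection module, and a fault control module, wherein
    the fault injection module is electrically connected to each functional module of the at least one functional module inside the chip, and each functional module is configured with a safety mechanism;
    the fault control module is electrically connected to each functional module, the static signal detection module, the processor, the system controller, and the off-chip external system, and the fault control module has the fault classification management model built in; and
    the static signal detection module is electrically connected to the system configuration module inside the chip.
  7. The fault management system for the functional safety of automotive-grade chips according to claim 6, characterized in that
    the fault injection module injects faults into the safety mechanism by means of fault injection signals, checks the corresponding fault indication signals, and determines whether the safety mechanism itself has failed.
  8. The fault management system for the functional safety of automotive-grade chips according to claim 6 or 7, characterized in that
    the fault control module aggregates the fault indication signals sent by its own static signal detection module and by the safety mechanism.
  9. The fault management system for the functional safety of automotive-grade chips according to claim 8, characterized in that the fault control module sends the generated fault information to the functional module or to the off-chip external system, including:
    outputting information classified as fail operational or as a correctable fault to the processor for handling;
    outputting information classified as fail safe to the system controller, which performs an automatic reset so that the system enters a safe state or resumes operation; and
    outputting information classified as a fatal fault to the off-chip external system, which assists with reset and power-off operations.
  10. The fault management system for the functional safety of automotive-grade chips according to claim 9, characterized in that the steps performed by the fault manager include:
    step S2-1, receiving the fault indication signals sent by a safety mechanism;
    step S2-2, determining whether the off-chip external system needs to assist in handling the fault, including:
    if the result is "yes", determining that the fault is a fatal fault, the off-chip external system assisting with reset and power-off operations;
    if the result is "no", performing step S2-3;
    step S2-3, determining whether the main function of the hardware inside the chip, or of the software system running on the chip, has failed, including:
    if the result is "yes", determining that the fault is fail safe, and outputting the fault indication signal to the system controller, which performs an automatic reset so that the hardware or the software system enters a safe state or resumes operation;
    if the result is "no", performing step S2-4;
    step S2-4, determining whether the main function of the hardware or of the software system needs to run in degraded mode, including:
    if the result is "yes", determining that the fault is fail operational, and outputting the fault indication signal to the processor for degraded operation;
    if the result is "no", determining that the fault is a correctable fault, and outputting the fault indication signal to the processor for automatic error correction through the safety mechanism.
  11. The fault management system for the functional safety of automotive-grade chips according to any one of claims 6 to 10, characterized in that
    the static signal detection module monitors in real time the static signals generated by the system configuration module inside the chip and detects failures caused by stuck-at faults.
  12. The fault management system for the functional safety of automotive-grade chips according to claim 11, characterized in that
    the fault indication signals generated by the static signal detection module are output to the fault control module for classification.
  13. A fault manager for the functional safety of automotive-grade chips, the fault manager being applied to a fault management system that includes an off-chip external system and an automotive-grade chip, characterized in that the fault manager is configured with a fault classification management model.
  14. The fault manager for the functional safety of automotive-grade chips according to claim 13, characterized in that the fault control module has built in a fault classification management model consisting of four fault types ranked by fault level from high to low.
  15. The fault manager for the functional safety of automotive-grade chips according to claim 14, characterized in that the four fault types are configured as:
    Type 1: a fault that requires the off-chip external system to assist in handling is configured as a fatal fault;
    Type 2: a fault in which the main function fails is configured as fail safe;
    Type 3: a fault handled by automatic degraded operation is configured as fail operational; and
    Type 4: a fault handled by automatic error correction is configured as a correctable fault.
  16. The fault manager for the functional safety of automotive-grade chips according to claim 14, characterized in that the four fault types are further configured as:
    Rule 1: Type 1 > Type 2 > {Type 3, Type 4}, where "{Type 3, Type 4}" denotes the set of Type 3 and Type 4;
    Rule 2: Type 3 > Type 4; and
    Rule 3: Rule 1 > Rule 2.
  17. The fault manager for the functional safety of automotive-grade chips according to any one of claims 14 to 16, characterized in that the fault manager includes a fault injection module, a static signal detection module, and a fault control module, wherein:
    the fault injection module is electrically connected to each functional module of the at least one functional module inside the chip, and each functional module is configured with a safety mechanism;
    the fault control module is electrically connected to each functional module, the static signal detection module, the processor, the system controller, and the off-chip external system, and the fault control module has the fault classification management model built in; and
    the static signal detection module is electrically connected to the system configuration module inside the chip.
  18. The fault manager for the functional safety of automotive-grade chips according to any one of claims 14 to 17, characterized in that the fault control module generates fault information according to the scenario in which the chip is used and the type of the fault.
  19. The fault manager for the functional safety of automotive-grade chips according to claim 7, characterized in that the fault injection module generates fault indication signals for input to the fault control module, the fault control module further includes four fault selection units, and multiple correspondences can be formed between the fault information and the fault indication signals by configuring the fault selection units.
  20. The fault manager for the functional safety of automotive-grade chips according to claim 11, characterized in that the multiple correspondences include one-to-one, one-to-many, and/or many-to-one, to suit different application scenarios and different functional safety level requirements.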
PCT/CN2021/076492 2020-02-20 2021-02-10 Fault management system for functional safety of automotive-grade chips WO2021164679A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/891,501 US20220392280A1 (en) 2020-02-20 2022-08-19 Fault management system for functional safety of automotive grade chip

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010103727.8A CN110955571B (zh) 2020-02-20 2020-02-20 Fault management system for functional safety of automotive-grade chips
CN202010103727.8 2020-02-20

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/891,501 Continuation US20220392280A1 (en) 2020-02-20 2022-08-19 Fault management system for functional safety of automotive grade chip

Publications (1)

Publication Number Publication Date
WO2021164679A1 true WO2021164679A1 (zh) 2021-08-26

Family

ID=69985704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076492 WO2021164679A1 (zh) 2020-02-20 2021-02-10 Fault management system for functional safety of automotive-grade chips

Country Status (3)

Country Link
US (1) US20220392280A1 (zh)
CN (1) CN110955571B (zh)
WO (1) WO2021164679A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
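CN110955571B (zh) * 2020-02-20 2020-07-03 南京芯驰半导体科技有限公司 Fault management system for functional safety of automotive-grade chips
CN114968646A (zh) * 2022-07-27 2022-08-30 南京芯驰半导体科技有限公司 Functional fault processing system and method thereof
CN115792583B (zh) * 2023-02-06 2023-05-12 中国第一汽车股份有限公司 Test method, apparatus, device and medium for automotive-grade chip
CN116501008B (zh) * 2023-03-31 2024-03-05 北京辉羲智能信息技术有限公司 Fault management system for autonomous driving control chip
CN116681015B (zh) * 2023-08-03 2023-12-22 苏州国芯科技股份有限公司 Chip design method, apparatus, device and storage medium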

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140201583A1 (en) * 2013-01-15 2014-07-17 Scaleo Chip System and Method For Non-Intrusive Random Failure Emulation Within an Integrated Circuit
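CN105365712A (zh) * 2015-11-05 2016-03-02 东风汽车公司 Functional safety circuit and control method for vehicle body control system
CN109308367A (zh) * 2017-07-26 2019-02-05 台湾积体电路制造股份有限公司 Method for simulating a safety circuit of an electronic device
CN109709849A (zh) * 2018-12-20 2019-05-03 浙江吉利汽车研究院有限公司 Method and apparatus for controlling safe operation of a single-chip microcomputer
CN109709963A (zh) * 2018-12-29 2019-05-03 百度在线网络技术(北京)有限公司 Unmanned driving controller and unmanned vehicle
CN110658807A (zh) * 2019-10-16 2020-01-07 上海仁童电子科技有限公司 Vehicle fault diagnosis method, apparatus and system
CN110955571A (zh) * 2020-02-20 2020-04-03 南京芯驰半导体科技有限公司 Fault management system for functional safety of automotive-grade chips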

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104360868B (zh) * 2014-11-29 2017-10-24 中国航空工业集团公司第六三一研究所 Multi-level fault management method in an integrated processing platform for large aircraft
US10685159B2 (en) * 2018-06-27 2020-06-16 Intel Corporation Analog functional safety with anomaly detection
CN109484474B (zh) * 2018-09-19 2021-06-08 上海汽车工业(集团)总公司 EPS control module, and control system and control method thereof

Also Published As

Publication number Publication date
US20220392280A1 (en) 2022-12-08
CN110955571B (zh) 2020-07-03
CN110955571A (zh) 2020-04-03

Similar Documents

Publication Publication Date Title
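WO2021164679A1 (zh) Fault management system for functional safety of automotive-grade chips
US20180111626A1 (en) Method and device for handling safety critical errors
US8732522B2 (en) System on chip fault detection
US10649487B2 (en) Fail-safe clock monitor with fault injection
US11774487B2 (en) Electrical and logic isolation for systems on a chip
CN107193680A (zh) Heartbeat detection method, device and system
CN116049249A (zh) Error information processing method, apparatus, system, device and storage medium
US8255769B2 (en) Control apparatus and control method
CN114968646A (zh) Functional fault processing system and method thereof
US10467889B2 (en) Alarm handling circuitry and method of handling an alarm
CN108254670A (zh) Health monitoring circuit structure for high-speed switching SoC
JP7012915B2 (ja) Controller
US8478478B2 (en) Processor system and fault managing unit thereof
CN104050051B (zh) Fault diagnosis method for a spaceborne computer
US20210397502A1 (en) Method and system for fault collection and reaction in system-on-chip
CN107179911A (zh) Method and device for restarting a management engine
US9164852B2 (en) System on chip fault detection
CN110991673A (zh) Fault isolation and localization method for complex systems
JP5337661B2 (ja) Memory control device and control method of memory control device
CN111859843B (zh) Method and apparatus for detecting circuit faults
CN103391207B (zh) Heterogeneous fault management system
CN109885450B (zh) Active method and system for monitoring and optimizing the health status of a spaceborne computer
Pandya et al. Software Validation for Safety System based on IEC61508
JP5151216B2 (ja) Design method for an integrated circuit comprising a logic function circuit and a self-diagnosis circuit
CN111061243B (zh) Electronic controller program flow monitoring system and method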

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21757528

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21757528

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.07.2023)
