CN116775436A - Chip fault prediction method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116775436A
Authority
CN
China
Prior art keywords
area
functional
chip
determining
shadow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310789808.1A
Other languages
Chinese (zh)
Inventor
孔庆宇
林湖
黄歆
韩超
徐康
白文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Automotive Innovation Corp
Original Assignee
China Automotive Innovation Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Automotive Innovation Corp
Priority to CN202310789808.1A
Publication of CN116775436A
Legal status: Pending

Landscapes

  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The present application relates to the field of communications technologies, and in particular, to a method and apparatus for predicting a chip failure, a computer device, and a storage medium. The method comprises the following steps: acquiring a functional area for storing data in a chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested; determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period; the test erasing times are larger than the data erasing times; based on the test erasing times, carrying out data erasing test on the shadow area to obtain erasing performance parameters corresponding to the shadow area; and determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area. The application can accurately predict the faults of the chip.

Description

Chip fault prediction method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and apparatus for predicting a chip failure, a computer device, and a storage medium.
Background
With the development of unmanned driving, software is becoming more and more complex (log recording, process data storage, program calls, over-the-air (OTA) upgrades, etc.), and accordingly the read/write operations on Flash (Flash Memory) chips are becoming more and more frequent. Frequent erasing of a FLASH chip leads to the failure of single or multiple data bits, or even of whole data blocks (BLOCKs). When data bits of a FLASH chip fail, the FLASH chip controller marks the bad block so that it is not erased again. However, for data bits or data blocks that have already failed, there is no guarantee that the stored data can be transferred smoothly, so there is a considerable risk of data loss.
In the conventional technology, fault monitoring of a FLASH chip is generally performed in the start-up stage, where several groups of read/write operations are used to judge whether a read/write abnormality exists. In addition, during operation, read/write verification checks for single-bit or multi-bit failures, and abnormalities are also monitored by means of backup area coverage.
However, the above schemes can generally only raise an alarm for and handle a FLASH chip fault after it occurs, and it is difficult for them to predict the fault trend of the FLASH chip. Improvements are therefore needed.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a chip failure prediction method, apparatus, computer device, and storage medium that can accurately predict a failure of a chip.
In a first aspect, the present application provides a method for predicting a chip failure, the method comprising:
acquiring a functional area for storing data in a chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested;
determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period; the test erasing times are larger than the data erasing times;
based on the test erasing times, carrying out data erasing test on the shadow area to obtain erasing performance parameters corresponding to the shadow area;
and determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
In one embodiment, determining a shadow area corresponding to the functional area in the chip to be tested includes:
acquiring size information and position information of a functional area;
according to the preset scaling, the size information of the functional area is adjusted to obtain the size information of the shadow area corresponding to the functional area;
determining an associated area of the functional area in the chip to be tested according to the position information of the functional area; wherein, the data erasing performance of the related area is similar to the data erasing performance of the functional area;
and determining the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area.
In one embodiment, if the number of the functional areas is at least two, determining the associated area of the functional area in the chip to be tested according to the position information of the functional areas includes:
grouping the functional areas to obtain at least one functional group; each function group comprises two function areas which are in a main-standby relation with each other;
and determining the associated area of each functional area in each functional group in the chip to be tested according to the position information of each functional area in each functional group.
In one embodiment, determining, in a chip to be tested, an associated area of each functional area in each functional group according to position information of each functional area in each functional group includes:
for each functional group, determining the associated area of each functional area in the functional group in the chip to be tested according to the position information of each functional area in the functional group;
determining the coincidence area between the associated areas of the functional areas in the functional group;
and updating the associated area of each functional area in the functional group by adopting the overlapped area.
In one embodiment, determining the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area includes:
determining the size information of the physical isolation corresponding to the chip to be tested according to the type of the chip to be tested;
determining a shadow area and a physical isolation area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area and the size information of the physical isolation corresponding to the chip to be tested; wherein the physical isolation region is located between the functional region and the shadow region.
In one embodiment, determining the failure prediction result of the functional area based on the erasure performance parameter corresponding to the shadow area includes:
obtaining a target fault prediction model corresponding to a chip to be tested;
and inputting the erasure performance parameters corresponding to the shadow area into the target fault prediction model to obtain a fault prediction result of the functional area.
In one embodiment, the method further comprises:
acquiring erasure performance parameters and fault prediction results corresponding to the sample chip; the sample chip and the chip to be tested are chips of the same model;
training an initial fault prediction model corresponding to the chip to be tested based on the erasure performance parameters and the fault prediction results corresponding to the sample chip to obtain a target fault prediction model corresponding to the chip to be tested.
In a second aspect, the present application further provides a chip failure prediction apparatus, including:
the acquisition module is used for acquiring a functional area for storing data in the chip to be tested and determining a shadow area corresponding to the functional area in the chip to be tested;
the analysis module is used for determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period; the test erasing times are larger than the data erasing times;
the test module is used for carrying out data erasing test on the shadow area based on the test erasing times to obtain erasing performance parameters corresponding to the shadow area;
and the prediction module is used for determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a functional area for storing data in a chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested;
determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period; the test erasing times are larger than the data erasing times;
based on the test erasing times, carrying out data erasing test on the shadow area to obtain erasing performance parameters corresponding to the shadow area;
and determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a functional area for storing data in a chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested;
determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period; the test erasing times are larger than the data erasing times;
based on the test erasing times, carrying out data erasing test on the shadow area to obtain erasing performance parameters corresponding to the shadow area;
and determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
With the chip fault prediction method, device, computer equipment and storage medium described above, a shadow area corresponding to the functional area is determined in the chip to be tested, and the erasing test is carried out on the shadow area. On the one hand, the probability of a fault occurring in the functional area can be predicted indirectly, and the test process does not consume the service life of the functional area. On the other hand, the test erasing times of the shadow area are made larger than the data erasing times of the functional area (i.e., its actual usage), so that the wear of the shadow area runs ahead of that of the functional area: if the shadow area is damaged while the test erasing times are being executed, it can be judged that the functional area is likely to be damaged when its data erasing times reach the test erasing times. In this way, fault prediction can be carried out for any area (functional area) on the chip.
Drawings
FIG. 1 is a diagram of an application environment for a method of chip failure prediction in one embodiment;
FIG. 2 is a flow chart of a method of predicting chip failure in one embodiment;
FIG. 3 is a schematic diagram of a shadow region in one embodiment;
FIG. 4 is a flowchart illustrating determining a shadow region corresponding to a functional region according to an embodiment;
FIG. 5 is a flow diagram of determining a physical isolation region in one embodiment;
FIG. 6 is a schematic diagram of the location of physically isolated regions in one embodiment;
FIG. 7 is a flow diagram of determining an associated region of a functional region in one embodiment;
FIG. 8 is a schematic diagram of an execution body as a CPU central computing layer in one embodiment;
FIG. 9 is a flow diagram of determining a failure prediction result in one embodiment;
FIG. 10 is a block diagram of a chip failure prediction apparatus in one embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The chip fault prediction method provided by the embodiment of the application can be applied to the scene of data processing, and particularly can be applied to the scene of chip fault detection. Alternatively, the method may be performed by a computer device, which may be a server, and the method may be performed in particular by a processor in the server, as shown in fig. 1, where the processor communicates with the respective chips to implement the detection of the chip failure.
In one embodiment, as shown in fig. 2, a chip failure prediction method is provided, which includes the following steps:
S201, acquiring a functional area for storing data in the chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested.
The functional areas are areas for storing data in the chip to be tested, and the number of the functional areas is at least one. Taking Flash as an example of the chip to be tested, Flash is composed of a plurality of blocks (namely data blocks), and a corresponding functional area may comprise a plurality of data blocks.
It can be understood that a Flash write must be performed in a blank area; if the area to be written already holds data, it must be erased before it is written, so erasing is a basic Flash operation. However, after a data block has been erased many times, its capacity to store charge gradually weakens; with continued erasing the block wears out, errors appear, and the block can no longer be used reliably, i.e., it becomes a bad block. In practice, the occurrence of bad blocks is complex and closely tied to how each chip, and each area within a chip, is used. Bad blocks are also regional: once a bad block appears in a region, the probability that adjacent blocks in the same region become bad blocks also increases. Therefore, in order to predict the health of each data block in the functional area, the shadow area selected in this embodiment has data erasing performance similar to that of the functional area.
Erasing performance is one of the important reliability indicators of a flash memory product. In this embodiment, the data erasing performance can be characterized by two parameters: (1) the number of erase cycles that the Flash can withstand, where one erase-write-read cycle is referred to as one P/E cycle (program/erase cycle); and (2) the erasing time.
In one implementation manner, any number of data blocks on the chip to be tested other than the functional area may be used as the shadow area. Because the shadow area and the functional area are located on the same chip, the data erasing performance of the shadow area is considered similar to that of the functional area; if the shadow area fails the reliability test, the chip to be tested is considered likely to be unreliable, and the functional area is therefore judged to be at risk of failure.
In another implementation manner, an area on the chip to be tested other than the functional area that satisfies a similarity condition may be used as the shadow area. The similarity condition may be similarity of the data blocks' erasure records and/or the degree of positional association. As shown in fig. 3, when there are two functional areas (functional area A and functional area B), two shadow areas (shadow area A' and shadow area B') are created correspondingly.
S202, determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period.
The target time period may be the period between the time the chip to be tested was put into use and the current prediction time. The number of data erasures in the functional area refers to the average number of times each data block in the functional area is erased in the target time period.
Optionally, in the process of writing data into and reading data from the functional area in the chip to be tested, the erasing cycle of each data block in the functional area can be detected to obtain the number of times of erasing the data in the functional area.
In this embodiment, the number of times F of erasing data corresponding to the functional area is increased according to a certain proportion to obtain the number of times F' of erasing test corresponding to the shadow area; the amplification ratio may be an empirical value. The purpose is to lead the service condition of the shadow area to be ahead of that of the functional area, and if the shadow area is damaged when the test erasing times are executed, the data erasing times of the functional area can be judged to be possibly damaged when the test erasing times are reached.
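The scaling from F to F' can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the default amplification ratio of 1.5, and the rounding rule are all assumptions chosen for the example; the text only states that F is the per-block average and that F' must exceed F.

```python
import math
from typing import Sequence


def data_erase_count(per_block_counts: Sequence[int]) -> float:
    """F: average number of erasures per data block of the functional area."""
    if not per_block_counts:
        raise ValueError("the functional area must contain at least one block")
    return sum(per_block_counts) / len(per_block_counts)


def shadow_test_count(f: float, amplification: float = 1.5) -> int:
    """F': F scaled up by an empirical ratio (assumed value), strictly above F."""
    if f < 0:
        raise ValueError("erase count must be non-negative")
    if amplification <= 1.0:
        raise ValueError("amplification must exceed 1 so that F' > F")
    # Round up; the second term guards the F == 0 corner case.
    return max(math.ceil(f * amplification), math.floor(f) + 1)


f = data_erase_count([900, 1000, 1100])
f_prime = shadow_test_count(f)
```

Forcing F' to be at least F + 1 keeps the shadow area ahead of the functional area even for a chip that has barely been used.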
S203, based on the test erasing times, performing data erasing test on the shadow area to obtain the erasing performance parameters corresponding to the shadow area.
Optionally, the erasing performance parameters in this embodiment may include whether a bad block is present (the number of bad blocks), the erasing time T, and the erasing time increasing speed V. The erasing time T is the time the shadow area takes to complete the corresponding test erasing times; the erasing time increasing speed V is the average increment of each erasing time over the previous erasing time.
Specifically, test data are obtained, based on the test erasing times and the corresponding test data, the data erasing test is carried out on the shadow area, and erasing performance parameters corresponding to the shadow area are collected in the test process.
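As a sketch of how the three parameters might be collected, the snippet below derives T and V from a list of per-cycle erasing times. The class and function names are hypothetical, and the assumption that the test harness records one duration per erase cycle is not stated in the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ErasePerformance:
    bad_blocks: int      # number of bad blocks found in the shadow area
    erase_time: float    # T: total time to complete the test erasing times
    growth_speed: float  # V: mean increment of each erase time over the previous one


def summarize_erase_test(cycle_times: Sequence[float], bad_blocks: int) -> ErasePerformance:
    """Reduce per-cycle erase durations (assumed to be logged) to T and V."""
    if not cycle_times:
        raise ValueError("at least one erase cycle is required")
    total = sum(cycle_times)
    # Increment of each cycle's erase time over the previous cycle's.
    increments = [cur - prev for prev, cur in zip(cycle_times, cycle_times[1:])]
    speed = sum(increments) / len(increments) if increments else 0.0
    return ErasePerformance(bad_blocks, total, speed)


result = summarize_erase_test([1.0, 2.0, 4.0], bad_blocks=0)
```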
S204, determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
Optionally, when determining the failure prediction result of the functional area in this embodiment, different weights may be respectively allocated to three erasure performance parameters, that is, the number of bad blocks, the erasure time T, and the erasure time increasing speed V; and then, calculating the weighted results of the three parameters and the corresponding weights, and comparing the weighted results with a performance level table corresponding to the chip to be tested, wherein the performance level table stores erasure performance parameters of different levels and corresponding fault levels so as to determine the fault prediction result of the functional area.
For example, when the number of bad blocks reaches a preset number, it is determined that the first failure level is satisfied, and when the erasing time T is greater than the time threshold Th, it is determined that the second failure level is satisfied. In addition, each model of chip may correspond to a different prediction rule, i.e., each chip corresponds to a different performance level table.
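The weighting and table lookup could look like the sketch below. The weights, the thresholds, and the level names are placeholders invented for illustration; in practice each chip model would supply its own performance level table, and the three parameters would likely be normalized before weighting.

```python
from typing import List, Tuple

# Hypothetical performance level table: (minimum score, fault level),
# sorted from the most severe level to the least severe.
LEVEL_TABLE: List[Tuple[float, str]] = [
    (10.0, "second fault level"),
    (5.0, "first fault level"),
]


def weighted_score(bad_blocks: int, erase_time: float, growth_speed: float,
                   weights: Tuple[float, float, float] = (0.5, 0.25, 0.25)) -> float:
    """Weighted sum of the three erasing performance parameters."""
    w_bad, w_time, w_speed = weights
    return w_bad * bad_blocks + w_time * erase_time + w_speed * growth_speed


def fault_level(score: float, table: List[Tuple[float, str]] = LEVEL_TABLE) -> str:
    """Return the first (most severe) level whose threshold the score reaches."""
    for threshold, level in table:
        if score >= threshold:
            return level
    return "normal"


score = weighted_score(bad_blocks=4, erase_time=12.0, growth_speed=8.0)
level = fault_level(score)
```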
In the chip fault prediction method above, a shadow area corresponding to the functional area is determined in the chip to be tested, and the erasing test is carried out on the shadow area. On the one hand, the probability of a fault occurring in the functional area can be predicted indirectly, and the test process does not consume the service life of the functional area. On the other hand, the test erasing times of the shadow area are made larger than the data erasing times of the functional area (i.e., its actual usage), so that the wear of the shadow area runs ahead of that of the functional area: if the shadow area is damaged while the test erasing times are being executed, it can be judged that the functional area is likely to be damaged when its data erasing times reach the test erasing times. In this way, fault prediction can be carried out for any area (functional area) on the chip.
As shown in fig. 4, this embodiment provides an alternative way to determine the shadow area corresponding to the functional area in the chip to be tested, that is, a way to refine S201. The specific implementation process can comprise the following steps:
S401, acquiring size information and position information of the functional area.
Each data block in the chip to be tested has a corresponding physical address. In the chip to be tested, data are written into a matrix in units of bits, and each unit is called a CELL (i.e., a data block in this embodiment). A CELL can be located exactly by specifying a Row and then a Column, which is the basic principle of memory chip addressing.
In this embodiment, for each functional area in the chip to be tested, the location information of the functional area may be determined according to the physical address of each data block corresponding to the functional area; further, the size information of the functional area may be determined according to the number of the corresponding data blocks of the functional area.
S402, adjusting the size information of the functional area according to a preset scaling ratio to obtain the size information of the shadow area corresponding to the functional area.
It will be appreciated that since the shadow region acts as a performance predictor, the size of the shadow region is typically smaller than the size of its corresponding functional region; in this embodiment, the size information of the functional area is reduced based on a preset scaling ratio, so as to obtain the size information of the shadow area corresponding to the functional area; the preset scaling ratio can be determined according to the number of idle data blocks in the chip to be tested.
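A minimal sketch of the size reduction, assuming sizes are counted in data blocks. The function name, the rounding toward zero, and the check against the number of idle blocks are illustrative assumptions.

```python
def shadow_block_count(functional_blocks: int, scaling: float, free_blocks: int) -> int:
    """Shrink the functional area's block count by a preset scaling ratio."""
    if functional_blocks <= 0:
        raise ValueError("the functional area must contain at least one block")
    if not 0.0 < scaling <= 1.0:
        raise ValueError("scaling ratio must be in (0, 1]")
    # Keep at least one block so the shadow area can still be tested.
    wanted = max(1, int(functional_blocks * scaling))
    if wanted > free_blocks:
        raise ValueError("not enough idle data blocks for the shadow area")
    return wanted


size = shadow_block_count(functional_blocks=64, scaling=0.25, free_blocks=100)
```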
S403, determining the associated area of the functional area in the chip to be tested according to the position information of the functional area.
Wherein, the data erasing performance of the related area is similar to the data erasing performance of the functional area; the associated region of any functional region refers to a region adjacent to the functional region.
Optionally, statistical analysis can be performed on the influence range of historical bad blocks to determine the area S' that can be affected by one bad block. Further, after the number N of data blocks in the functional area is determined, the area affected when these N data blocks are damaged, $S_{\text{bad block region}}$, can be determined from the characteristics of the historical statistical data. Correspondingly, the size of the associated area is $S_{\text{bad block region}} - S_{\text{functional area}}$. The distribution position of the associated area relative to the functional area can also be determined from the characteristics of the historical statistical data; for example, the associated area is distributed around, or on one side of, the functional area.
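To make the subtraction concrete, the sketch below uses a deliberately simple model that is not taken from the patent: blocks sit on a one-dimensional address line, and a bad block is assumed to influence `radius` blocks on each side. Under that assumption the bad-block region is the functional range widened by `radius`, and the associated area is what remains after removing the functional range itself.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open block range [start, end)


def associated_ranges(start: int, n_blocks: int, radius: int, total_blocks: int) -> List[Range]:
    """Blocks inside the assumed bad-block region but outside the functional area."""
    end = start + n_blocks
    left = (max(0, start - radius), start)           # region before the functional area
    right = (end, min(total_blocks, end + radius))   # region after the functional area
    return [r for r in (left, right) if r[0] < r[1]]


ranges = associated_ranges(start=10, n_blocks=4, radius=3, total_blocks=100)
associated_size = sum(end - begin for begin, end in ranges)
```

Here the bad-block region spans 10 blocks and the functional area 4, so the associated area holds the remaining 6 blocks, split across both sides of the functional area.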
S404, determining the shadow area corresponding to the functional area in the relevant area according to the size information of the shadow area corresponding to the functional area.
Optionally, after determining the association area, an area with a size equal to the size information of the shadow area may be randomly found in the association area to be used as the shadow area.
However, the shadow area is subjected to frequent erasing tests; if it is placed too close to the functional area, a bad block that appears in the shadow area may affect the functional area. Therefore, to reduce the influence of frequent reading and writing of the shadow area on the functional area, the shadow area and the functional area are physically isolated in space in this embodiment. Specifically, in one embodiment, a physical isolation area is added between the shadow area and the functional area; as shown in fig. 5, this may include the following steps:
S501, determining the size information of the physical isolation corresponding to the chip to be tested according to the type of the chip to be tested.
The size information of the physical isolation refers to the interval distance when two areas (or two data blocks) which are not adjacent to each other are effectively isolated in the chip to be tested, and the calculation unit of the interval distance can be the number of the data blocks.
Optionally, the data blocks in the chip are arranged in an array of rows and columns, and such an array is called a Bank of the memory chip. For example, in this embodiment each row of the chip to be tested is referred to as a Bank. Chips are usually manufactured in units of n Banks, so the size information of the physical isolation corresponding to the chip to be tested may be measured in Banks; because manufacturing processes differ between chips, the size information of the physical isolation may also differ from chip to chip.
For example, the size information of the physical isolation corresponding to a first chip to be tested is 3 Banks, that corresponding to a second chip to be tested is 4 Banks, that corresponding to a third chip to be tested is 5 Banks, and so on. Optionally, the size information of the physical isolation of the chip to be tested in this embodiment may be provided by the producer of the chip to be tested, or may be an empirical value determined in the testing process.
S502, determining a shadow area and a physical isolation area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area and the size information of the physical isolation corresponding to the chip to be tested.
Wherein, as shown in fig. 6, the physical isolation area is located between the functional area and the shadow area.
In this embodiment, by setting a physical isolation area between the functional area and the shadow area corresponding to the functional area, the functional area and the shadow area are isolated in the physical space, and then when the shadow area is frequently erased, the data block performance of the functional area is not easily affected when the shadow area is damaged, so that the influence of the test process is reduced.
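The placement in S502 can be illustrated with a small sketch. It assumes a one-dimensional layout counted in Banks (or data blocks) and an associated area that lies directly after the functional area; the function name and the choice to return `None` when the areas do not fit are assumptions for the example.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # half-open range [start, end), in Banks or data blocks


def place_shadow(func_end: int, assoc_end: int, isolation: int,
                 shadow_size: int) -> Optional[Tuple[Range, Range]]:
    """Return (isolation range, shadow range), or None if they do not fit.

    The associated area is assumed to be [func_end, assoc_end).
    """
    shadow_start = func_end + isolation       # skip the physical isolation gap
    shadow_end = shadow_start + shadow_size
    if shadow_end > assoc_end:
        return None                           # associated area too small
    return (func_end, shadow_start), (shadow_start, shadow_end)


layout = place_shadow(func_end=14, assoc_end=24, isolation=3, shadow_size=4)
```

With a 3-Bank isolation, the shadow area starts three units after the functional area ends, so the isolation range always sits between the two.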
The above process illustrates the process of determining the size information, shadow region location and size of the physical isolation using a single functional region as an example; further, if the number of functional areas is at least two, in one embodiment, an alternative way of determining the associated area of the functional area in the chip to be tested according to the location information of the functional area is provided, as shown in fig. 7, including:
S701, grouping the functional areas to obtain at least one functional group.
Each function group comprises two function areas which are in a main-standby relation with each other. In this embodiment, each function group includes a function area a and a function area B that are backup to each other, for example, the function area a may be a backup area of the function area B, and the function area B may be a backup area of the function area a.
S702, determining the associated area of each functional area in each functional group in the chip to be tested according to the position information of each functional area in each functional group.
In one implementation manner, the associated area of each functional area in each functional group is determined in the chip to be tested directly according to the position information of each functional area in each functional group, in the manner described in S403.
In another implementation manner, in order to further reduce the difference between the data blocks (or Banks) where the shadow area is located and those where the functional area is located, and thereby improve prediction accuracy, the shadow area may be disposed between the functional areas in this embodiment, that is, in the area between functional area A and functional area B in the same functional group.
Specifically, the method can comprise the following steps: for each functional group, determining the associated area of each functional area in the functional group in the chip to be tested according to the position information of each functional area in the functional group; determining the coincidence area between the associated areas of the functional areas in the functional group; and updating the associated area of each functional area in the functional group by adopting the overlapped area.
It is understood that, with the overlapping area as the associated area, both the shadow area of the functional area a and the shadow area of the functional area B within the same functional group may be disposed within the overlapping area, i.e., between the functional area a and the functional area B.
In this embodiment, the overlapping area corresponding to the functional group is determined as the associated area of any functional area in the functional group, in this case, two shadow areas corresponding to one functional group fall into the overlapping area, so that the data erasing performance of the shadow areas is similar to the data erasing performance of the functional areas, and the accuracy of fault prediction is improved.
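The overlap step amounts to an interval intersection. The sketch below assumes each associated area is a single half-open block range on a one-dimensional address line, which is a simplification; the example ranges are made up.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # half-open block range [start, end)


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Intersection of two associated areas, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


# Functional area A occupies [0, 10) and B occupies [20, 30) (assumed values).
assoc_a = (10, 25)   # associated area on the side of A that faces B
assoc_b = (5, 20)    # associated area on the side of B that faces A
shared = overlap(assoc_a, assoc_b)
```

Both shadow areas A' and B' would then be allocated inside `shared`, which lies between the two functional areas.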
Further, in one embodiment, taking any one of the above functional groups as an example, the process of S201 to S204 is described as follows:
(1) Acquire functional area A and functional area B in the chip to be tested;
(2) Determine the size information of shadow areas A' and B' according to the sizes of functional areas A and B and the preset scaling ratio;
(3) Count the data erasing times of functional areas A and B in the target time period, denoted FWA and FWB respectively;
(4) Determine the test erasing times FWA' and FWB' corresponding to shadow areas A' and B' according to FWA, FWB and the preset amplification ratio;
(5) Measure the erasing time of the chip to be tested while the data erasing times FWA and FWB and the test erasing times FWA' and FWB' are carried out, denoted TWA, TWB, TWA' and TWB' respectively;
(6) As the number of erasures of the chip to be tested increases, its remaining service life decreases and the erasing time lengthens; that is, TWA, TWB, TWA' and TWB' gradually increase over time;
(7) Denote the rates of increase of TWA, TWB, TWA' and TWB' as VWA, VWB, VWA' and VWB' respectively;
(8) Set the early-warning values of TWA, TWB, TWA' and TWB' to THA, THB, THA' and THB' respectively;
(9) Set THA <= THA' and THB <= THB';
(10) Normally VWA' > VWA and VWB' > VWB, so TWA' and TWB' reach the early-warning values THA' and THB' before TWA and TWB reach theirs. When TWA' and TWB' corresponding to the shadow areas reach the early-warning values, the erasing life of the functional areas is at potential risk of expiring, and operations such as raising an alarm, logging, and replacing the storage area are performed.
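Steps (4), (9) and (10) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the millisecond unit, and the rule that forces the test erasing times to be at least one more than the data erasing times are hypothetical choices, not details given in the application.

```python
import math


def shadow_erase_count(data_erase_count: int, amplification: float) -> int:
    """Step (4): derive FWA' from FWA using a preset amplification ratio.

    The application only requires FWA' > FWA; rounding up and adding a
    floor of FWA + 1 is an assumption made to guarantee that property.
    """
    if amplification <= 1.0:
        raise ValueError("amplification ratio must be greater than 1")
    return max(math.ceil(data_erase_count * amplification), data_erase_count + 1)


def shadow_reached_warning(shadow_erase_time_ms: float,
                           shadow_warn_ms: float,
                           functional_warn_ms: float) -> bool:
    """Steps (9) and (10): check TWA' against THA', requiring THA <= THA'."""
    if functional_warn_ms > shadow_warn_ms:
        raise ValueError("step (9) requires THA <= THA'")
    return shadow_erase_time_ms >= shadow_warn_ms
```

When `shadow_reached_warning` returns true for a shadow area, the caller would trigger the alarm, logging, or storage-area replacement described in step (10) for the corresponding functional area.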
In one embodiment, an optional way of determining the failure prediction result of the functional area based on the erasure performance parameter corresponding to the shadow area is provided, that is, a refinement of S204. As shown in Fig. 8, it may specifically include:
S801, a target fault prediction model corresponding to the chip to be tested is obtained.
The target fault prediction model is a prediction model corresponding to the chip to be detected; the predictive model may be a neural network model.
S802, the erasure performance parameters corresponding to the shadow area are input into the target fault prediction model, and a fault prediction result of the functional area is obtained.
Specifically, the erasure performance parameters corresponding to the shadow area are input into a target fault prediction model after training is completed, and a fault prediction result of the functional area is obtained; the fault prediction result may be a fault probability value.
Optionally, when training the prediction model corresponding to the chip to be tested, the method includes: acquiring erasure performance parameters and fault prediction results corresponding to the sample chip; training an initial fault prediction model corresponding to the chip to be tested based on the erasure performance parameters and the fault prediction results corresponding to the sample chip to obtain a target fault prediction model corresponding to the chip to be tested.
The sample chip and the chip to be tested are chips of the same model, and the initial fault prediction model is an untrained target fault prediction model.
In this embodiment, by generating different fault prediction models, differential analysis on different types of chips can be realized, and analysis accuracy is improved.
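As a rough illustration of the training and prediction flow, the sketch below trains a tiny logistic-regression classifier in plain Python and returns a fault probability. This is a stand-in chosen for brevity, not the neural network the embodiment mentions; the feature layout (three erasure performance parameters normalized to [0, 1]), the learning rate, the epoch count and all sample values are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Return a fault probability in (0, 1) for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples: List[Sequence[float]], labels: List[int],
          lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights by batch gradient descent on the logistic loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up sample-chip data: label 1 means the sample chip later failed.
SAMPLES = [[0.1, 0.1, 0.1], [0.2, 0.1, 0.2], [0.8, 0.9, 0.9], [0.9, 0.8, 0.7]]
LABELS = [0, 0, 1, 1]
```

Calling `train(SAMPLES, LABELS)` yields the trained parameters, and `predict` then maps the shadow area's parameters to a probability. A separate parameter set would be trained for each chip model, in line with the differentiated analysis described above.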
In one embodiment, as shown in Fig. 9, a specific application scenario of the chip fault prediction method is provided. In the unmanned driving field, the architecture of an unmanned-driving central computing platform may include several functional modules: a central processing unit (CPU) central computing layer, a data exchange layer, an embedded neural network processor (NPU) AI computing layer, and a microcontroller unit (MCU) vehicle control layer. The types of FLASH chip used in these functional modules may differ according to their computing power, for example eMMC, NAND FLASH, SPI FLASH, etc. The CPU central computing layer serves as the execution subject of the embodiment of the application.
The CPU central computing layer consists of high-performance CPUs, GPUs and the like, and is used for running and managing the equipment and for making central decisions on the processing results of all functional modules;
The data exchange layer consists of an Ethernet switch chip and a PCIe (Peripheral Component Interconnect Express) switch chip. It forwards Ethernet messages, realizes master-slave data intercommunication through the RC and EP modes of the PCIe switch chip, realizes cross-domain data intercommunication and high-speed data transmission among the functional modules, and handles part of the lidar preprocessing, data operations and the like;
The NPU AI computing layer consists of NPUs (neural network processors) with high computing power and processes the AI operations for vision and lidar; the operation results are forwarded through the data exchange layer to the CPU central computing layer for fusion decision-making;
The MCU vehicle control layer consists of an MCU with a higher functional safety level. It provides state monitoring, abnormal fault handling and the like for the whole processor, hosts part of the regulation and control algorithms, and is responsible for issuing vehicle control instructions to the vehicle execution unit.
In one implementation, the Flash failure prediction tasks in the respective functional modules may be performed by internal management of the respective functional modules, i.e., by a failure prediction model local to the respective functional modules.
In another implementation, the CPU central computing layer performs centralized monitoring and management, and centrally issues and controls the chip health management policy, which includes the fault prediction model corresponding to each functional module.
Specifically, the CPU central computing layer is also used for centrally training the fault prediction model corresponding to each chip, and performs differentiated training according to the FLASH manufacturer and model selected for the corresponding functional module; the training results can be transmitted uniformly through the data exchange layer to the CPU central computing layer for unified analysis and processing. Furthermore, by linking with a cloud server, the CPU central computing layer can train the fault prediction models more accurately, improve the accuracy and timeliness of FLASH health management across large numbers of devices, adjust fault detection strategies in time based on big data analysis, and monitor in time the common risks arising from FLASH of different manufacturers and batches.
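One way to picture the centrally managed policy is a registry that maps each FLASH manufacturer and model to its own fault prediction model. The sketch below is an assumption-laden illustration: the class `ModelRegistry`, its key layout and the callable predictor interface are hypothetical and are not described in the application.

```python
from typing import Callable, Dict, Sequence, Tuple

# A predictor maps erasure performance parameters to a fault probability.
Predictor = Callable[[Sequence[float]], float]
ModelKey = Tuple[str, str]  # (manufacturer, model); hypothetical key layout


class ModelRegistry:
    """Holds one fault prediction model per FLASH manufacturer and model."""

    def __init__(self) -> None:
        self._models: Dict[ModelKey, Predictor] = {}

    def register(self, manufacturer: str, model: str, predictor: Predictor) -> None:
        """Store (or replace) the model issued for this kind of FLASH chip."""
        self._models[(manufacturer, model)] = predictor

    def predict(self, manufacturer: str, model: str,
                features: Sequence[float]) -> float:
        """Run the model that matches the chip, failing loudly if none exists."""
        key = (manufacturer, model)
        if key not in self._models:
            raise KeyError(f"no fault prediction model registered for {key}")
        return self._models[key](features)
```

In such a design the central layer would call `register` whenever a retrained model is issued, while each functional module only needs to look up the model matching its own FLASH chip.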
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a chip fault prediction device for implementing the chip fault prediction method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for specific limitations in the one or more embodiments of the chip fault prediction device provided below, reference may be made to the limitations of the chip fault prediction method above, which are not repeated here.
In one embodiment, as shown in fig. 10, there is provided a chip failure prediction apparatus 1 including: an acquisition module 11, an analysis module 12, a test module 13 and a prediction module 14, wherein:
the acquiring module 11 is configured to acquire a functional area for storing data in a chip to be tested, and determine a shadow area corresponding to the functional area in the chip to be tested;
the analysis module 12 is configured to determine a test erasing number corresponding to the shadow area according to the data erasing number of the functional area in the target time period; the test erasing times are larger than the data erasing times;
the test module 13 is configured to perform a data erasing test on the shadow area based on the test erasing times, so as to obtain an erasing performance parameter corresponding to the shadow area;
and the prediction module 14 is configured to determine a fault prediction result of the functional area based on the erasure performance parameter corresponding to the shadow area.
In one embodiment, the acquisition module 11 comprises:
the acquisition sub-module is used for acquiring the size information and the position information of the functional area;
the adjusting submodule is used for adjusting the size information of the functional area according to the preset scaling to obtain the size information of the shadow area corresponding to the functional area;
the association sub-module is used for determining the associated area of the functional area in the chip to be tested according to the position information of the functional area; wherein the data erasing performance of the associated area is similar to the data erasing performance of the functional area;
and the determining submodule is used for determining the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area.
In one embodiment, if the number of functional areas is at least two, the determining sub-module includes:
the dividing slave module is used for grouping the functional areas to obtain at least one functional group; each function group comprises two function areas which are in a main-standby relation with each other;
and the association slave module is used for determining the association area of each functional area in each functional group in the chip to be tested according to the position information of each functional area in each functional group.
In one embodiment, the association slave module is further configured to determine, for each functional group, an association area of each functional area in the functional group in the chip to be tested according to the location information of each functional area in the functional group;
determining the coincidence area between the associated areas of the functional areas in the functional group;
and updating the associated area of each functional area in the functional group by adopting the overlapped area.
In one embodiment, the determining sub-module is further to: determining the size information of the physical isolation corresponding to the chip to be tested according to the type of the chip to be tested;
determining a shadow area and a physical isolation area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area and the size information of the physical isolation corresponding to the chip to be tested; wherein the physical isolation region is located between the functional region and the shadow region.
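The placement of the physical isolation area between the functional area and the shadow area can be sketched as below. The sketch assumes a one-dimensional block address space, an associated area that begins where the functional area ends, and a shadow area placed immediately after the isolation area; the names `Region` and `place_shadow` and the rounding rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    start: int  # first block of the region
    size: int   # number of blocks

    @property
    def end(self) -> int:
        return self.start + self.size


def place_shadow(functional: Region, associated: Region,
                 scaling: float, isolation_size: int) -> Tuple[Region, Region]:
    """Return (isolation, shadow) so that the isolation area separates the two."""
    # Shadow size is the functional size adjusted by the preset scaling ratio.
    shadow_size = max(1, round(functional.size * scaling))
    isolation = Region(functional.end, isolation_size)
    shadow = Region(isolation.end, shadow_size)
    if isolation.start < associated.start or shadow.end > associated.end:
        raise ValueError("shadow and isolation areas do not fit in the associated area")
    return isolation, shadow
```

With a 100-block functional area starting at block 0, a scaling ratio of 0.25 and an 8-block isolation size, the isolation area occupies blocks 100 to 107 and the shadow area blocks 108 to 132, provided the associated area is large enough to hold both.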
In one embodiment, the prediction module 14 is further configured to: obtaining a target fault prediction model corresponding to a chip to be tested;
and inputting the erasure performance parameters corresponding to the shadow area into the target fault prediction model to obtain a fault prediction result of the functional area.
The modules in the above chip fault prediction apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data of the chip fault prediction method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method of chip failure prediction.
It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring a functional area for storing data in a chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested;
determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period;
the test erasing times are larger than the data erasing times;
based on the test erasing times, carrying out data erasing test on the shadow area to obtain erasing performance parameters corresponding to the shadow area;
and determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
In one embodiment, when the processor executes the logic of the computer program for determining the shadow area corresponding to the functional area in the chip to be tested, the following steps are specifically implemented: acquiring size information and position information of the functional area; adjusting the size information of the functional area according to the preset scaling ratio to obtain the size information of the shadow area corresponding to the functional area; determining an associated area of the functional area in the chip to be tested according to the position information of the functional area, wherein the data erasing performance of the associated area is similar to the data erasing performance of the functional area; and determining the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area.
In one embodiment, if the number of the functional areas is at least two, the processor executes the computer program to determine the logic of the associated area of the functional area in the chip to be tested according to the position information of the functional areas, and specifically implements the following steps: grouping the functional areas to obtain at least one functional group; each function group comprises two function areas which are in a main-standby relation with each other; and determining the associated area of each functional area in each functional group in the chip to be tested according to the position information of each functional area in each functional group.
In one embodiment, when the processor executes the computer program to determine the logic of the associated area of each functional area in each functional group in the chip to be tested according to the position information of each functional area in each functional group, the following steps are specifically implemented: for each functional group, determining the associated area of each functional area in the functional group in the chip to be tested according to the position information of each functional area in the functional group; determining the coincidence area between the associated areas of the functional areas in the functional group; and updating the associated area of each functional area in the functional group by adopting the overlapped area.
In one embodiment, when the processor executes the computer program to determine the logic of the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area, the following steps are specifically implemented: determining the size information of the physical isolation corresponding to the chip to be tested according to the type of the chip to be tested; determining a shadow area and a physical isolation area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area and the size information of the physical isolation corresponding to the chip to be tested; wherein the physical isolation region is located between the functional region and the shadow region.
In one embodiment, when the processor executes the computer program to perform the data erasing test on the shadow area based on the test erasing times to obtain the logic of the erasing performance parameter corresponding to the shadow area, the following steps are specifically implemented: and based on the test erasing times and the acquired test data, performing data erasing test on the shadow area to obtain the effective erasing times, the erasing time and the change speed of the erasing time corresponding to the shadow area.
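The three erasing performance parameters named here can be derived from a log of test erasures, as in the sketch below. It assumes that each test erasure yields a success flag and a duration in milliseconds, that "effective erasing times" means the number of successful erasures, and that the change speed is the average increase in erase time per successful erasure; these interpretations and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ErasePerformance:
    effective_erases: int                  # number of successful test erasures
    mean_erase_time_ms: float              # average duration of successful erasures
    erase_time_change_ms_per_erase: float  # average growth in erase time


def summarize_erase_test(results: List[Tuple[bool, float]]) -> ErasePerformance:
    """Summarize (succeeded, duration_ms) records from a shadow-area erase test."""
    times = [duration for ok, duration in results if ok]
    if not times:
        return ErasePerformance(0, 0.0, 0.0)
    mean_time = sum(times) / len(times)
    # Slope between the first and last successful erasure; 0 with one sample.
    change = (times[-1] - times[0]) / (len(times) - 1) if len(times) > 1 else 0.0
    return ErasePerformance(len(times), mean_time, change)
```

The resulting record is the kind of input that would then be compared against early-warning values or passed to a fault prediction model.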
In one embodiment, the processor when executing the computer program further performs the steps of: acquiring erasure performance parameters and fault prediction results corresponding to the sample chip; the sample chip and the chip to be tested are chips of the same model; training an initial fault prediction model corresponding to the chip to be tested based on the erasure performance parameters and the fault prediction results corresponding to the sample chip to obtain a target fault prediction model corresponding to the chip to be tested.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a functional area for storing data in a chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested;
determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period;
the test erasing times are larger than the data erasing times;
based on the test erasing times, carrying out data erasing test on the shadow area to obtain erasing performance parameters corresponding to the shadow area;
and determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
In one embodiment, when the logic of the computer program for determining the shadow area corresponding to the functional area in the chip to be tested is executed by the processor, the following steps are specifically implemented: acquiring size information and position information of the functional area; adjusting the size information of the functional area according to the preset scaling ratio to obtain the size information of the shadow area corresponding to the functional area; determining an associated area of the functional area in the chip to be tested according to the position information of the functional area, wherein the data erasing performance of the associated area is similar to the data erasing performance of the functional area; and determining the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area.
In one embodiment, if the number of the functional areas is at least two, the computer program specifically implements the following steps when the logic for determining the associated area of the functional area in the chip to be tested is executed by the processor according to the location information of the functional areas: grouping the functional areas to obtain at least one functional group; each function group comprises two function areas which are in a main-standby relation with each other; and determining the associated area of each functional area in each functional group in the chip to be tested according to the position information of each functional area in each functional group.
In one embodiment, the computer program specifically implements the following steps when logic for determining the associated area of each functional area in each functional group in the chip to be tested is executed by the processor according to the position information of each functional area in each functional group: for each functional group, determining the associated area of each functional area in the functional group in the chip to be tested according to the position information of each functional area in the functional group; determining the coincidence area between the associated areas of the functional areas in the functional group; and updating the associated area of each functional area in the functional group by adopting the overlapped area.
In one embodiment, the computer program specifically implements the following steps when the logic for determining the shadow area corresponding to the functional area in the associated area is executed by the processor according to the size information of the shadow area corresponding to the functional area: determining the size information of the physical isolation corresponding to the chip to be tested according to the type of the chip to be tested; determining a shadow area and a physical isolation area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area and the size information of the physical isolation corresponding to the chip to be tested; wherein the physical isolation region is located between the functional region and the shadow region.
In one embodiment, the logic for determining the failure prediction result for the functional area based on the erasure performance parameters corresponding to the shadow area is executed by the processor, and specifically implements the steps of: obtaining a target fault prediction model corresponding to a chip to be tested; and inputting the erasure performance parameters corresponding to the shadow area into the target fault prediction model to obtain a fault prediction result of the functional area.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring erasure performance parameters and fault prediction results corresponding to the sample chip; the sample chip and the chip to be tested are chips of the same model; training an initial fault prediction model corresponding to the chip to be tested based on the erasure performance parameters and the fault prediction results corresponding to the sample chip to obtain a target fault prediction model corresponding to the chip to be tested.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A method for predicting a chip failure, the method comprising:
acquiring a functional area for storing data in a chip to be tested, and determining a shadow area corresponding to the functional area in the chip to be tested;
determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period; wherein the test erasing times are greater than the data erasing times;
based on the test erasing times, carrying out data erasing test on the shadow area to obtain erasing performance parameters corresponding to the shadow area;
and determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
2. The method of claim 1, wherein determining, in the chip to be tested, a shadow area corresponding to the functional area comprises:
acquiring size information and position information of the functional area;
adjusting the size information of the functional area according to a preset scaling ratio to obtain the size information of a shadow area corresponding to the functional area;
determining an associated area of the functional area in the chip to be tested according to the position information of the functional area; wherein the data erasing performance of the associated area is similar to the data erasing performance of the functional area;
and determining the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area.
3. The method according to claim 2, wherein if the number of the functional areas is at least two, the determining, in the chip to be tested, the associated area of the functional area according to the location information of the functional areas includes:
grouping the functional areas to obtain at least one functional group; each function group comprises two function areas which are in a main-standby relation with each other;
and determining the associated area of each functional area in each functional group in the chip to be tested according to the position information of each functional area in each functional group.
4. The method of claim 3, wherein determining the associated area of each functional area in each functional group in the chip to be tested according to the location information of each functional area in each functional group comprises:
for each functional group, determining the associated area of each functional area in the functional group in the chip to be tested according to the position information of each functional area in the functional group;
determining the coincidence area between the associated areas of the functional areas in the functional group;
and updating the associated area of each function area in the function group by adopting the overlapped area.
5. The method according to claim 2, wherein determining the shadow area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area comprises:
determining the size information of the physical isolation corresponding to the chip to be tested according to the type of the chip to be tested;
determining a shadow area and a physical isolation area corresponding to the functional area in the associated area according to the size information of the shadow area corresponding to the functional area and the size information of the physical isolation corresponding to the chip to be tested; wherein the physical isolation region is located between the functional region and the shadow region.
6. The method of claim 1, wherein determining the failure prediction result of the functional area based on the erasure performance parameter corresponding to the shadow area comprises:
obtaining a target fault prediction model corresponding to the chip to be tested;
and inputting the erasure performance parameters corresponding to the shadow area into the target fault prediction model to obtain a fault prediction result of the functional area.
7. The method of claim 6, wherein the method further comprises:
acquiring erasure performance parameters and fault prediction results corresponding to the sample chip; the sample chip and the chip to be tested are chips of the same model;
and training an initial fault prediction model corresponding to the chip to be tested based on the erasure performance parameter and the fault prediction result corresponding to the sample chip to obtain the target fault prediction model corresponding to the chip to be tested.
8. The method of claim 1, wherein the chip under test is a flash memory chip.
9. A chip failure prediction apparatus, the apparatus comprising:
an acquisition module, configured to acquire a functional area for storing data in a chip to be tested and to determine a shadow area corresponding to the functional area in the chip to be tested;
the analysis module is used for determining the test erasing times corresponding to the shadow area according to the data erasing times of the functional area in the target time period; wherein the test erasing times are greater than the data erasing times;
the test module is used for carrying out data erasing test on the shadow area based on the test erasing times to obtain erasing performance parameters corresponding to the shadow area;
and the prediction module is used for determining a fault prediction result of the functional area based on the erasure performance parameters corresponding to the shadow area.
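The way the analysis, test, and prediction modules hand data to one another can be sketched as a small pipeline. The scaling factor of 1.5 and the callables `run_erase_test` and `predict` are assumptions for the example; the claim only requires that the test erasing times exceed the data erasing times.

```python
import math
from typing import Callable, Dict


def shadow_test_erase_count(data_erase_count: int, factor: float = 1.5) -> int:
    """Analysis step: derive a test erase count strictly greater than the data erase count.

    The multiplicative factor is a hypothetical policy, not taken from the claims.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    return max(data_erase_count + 1, math.ceil(data_erase_count * factor))


def predict_functional_area_fault(
    data_erase_count: int,
    run_erase_test: Callable[[int], Dict[str, float]],
    predict: Callable[[Dict[str, float]], str],
    factor: float = 1.5,
) -> str:
    """Chain the analysis, test, and prediction steps for one functional area."""
    test_count = shadow_test_erase_count(data_erase_count, factor)
    # Test step: erase the shadow area test_count times and collect its parameters.
    shadow_params = run_erase_test(test_count)
    # Prediction step: infer the functional area's fault status from the shadow area.
    return predict(shadow_params)
```

Because the shadow area is erased more often than the functional area over the same period, degradation should appear in the shadow area first, which is what lets its measured parameters serve as an early indicator for the functional area.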
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
CN202310789808.1A 2023-06-29 2023-06-29 Chip fault prediction method, device, computer equipment and storage medium Pending CN116775436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310789808.1A CN116775436A (en) 2023-06-29 2023-06-29 Chip fault prediction method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116775436A true CN116775436A (en) 2023-09-19

Family

ID=87985774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310789808.1A Pending CN116775436A (en) 2023-06-29 2023-06-29 Chip fault prediction method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116775436A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828450A (en) * 2024-03-06 2024-04-05 深圳市铨天科技有限公司 Big data-based package test method, system and medium
CN117828450B (en) * 2024-03-06 2024-05-17 深圳市铨天科技有限公司 Big data-based package test method, system and medium

Similar Documents

Publication Publication Date Title
CN101529526B (en) Method for estimating and reporting the life expectancy of flash-disk memory
US9263136B1 (en) Data retention flags in solid-state drives
CN109817267B (en) Deep learning-based flash memory life prediction method and system and computer-readable access medium
US7966531B2 (en) Memory diagnosis apparatus
KR100406333B1 (en) Apparatus and method for detecting and assessing a spatially discrete dot pattern
CN112908399B (en) Flash memory abnormality detection method and device, computer equipment and storage medium
CN113257332B (en) Effectiveness prediction method and device for flash memory and storage medium
US20220058488A1 (en) Partitionable Neural Network for Solid State Drives
JP2007220284A (en) Memory device fail summary data reduction for improved redundancy analysis
Du et al. Predicting uncorrectable memory errors for proactive replacement: An empirical study on large-scale field data
CN116775436A (en) Chip fault prediction method, device, computer equipment and storage medium
CN114968652A (en) Fault processing method and computing device
CN111143146A (en) Health state prediction method and system of storage device
CN115640174A (en) Memory fault prediction method and system, central processing unit and computing equipment
CN112133360B (en) Memory management apparatus, system, and method
Liu et al. Online fault detection in ReRAM-based computing systems for inferencing
Liu et al. Online fault detection in ReRAM-based computing systems by monitoring dynamic power consumption
KR20240065183A (en) Methods for predicting memory errors, electronic devices and computer-readable storage media
CN114764596A (en) Method and device for prolonging hard disk service life, computer equipment and storage medium
CN115543702A (en) Multi-source solid state disk collaborative fault diagnosis method, system, equipment and medium
WO2022027170A1 (en) Flash memory data management method, storage device controller, and storage device
KR20210031220A (en) Storage Device and Operating Method of the same
US7139944B2 (en) Method and system for determining minimum post production test time required on an integrated circuit device to achieve optimum reliability
CN111863109A (en) Three-dimensional flash memory interlayer error rate model and evaluation method
CN112906727A (en) Method and system for real-time online detection of virtual machine state

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination