CN115171768B - Method, device, storage medium and equipment for improving SSD (solid state disk) defective product analysis efficiency

Info

Publication number
CN115171768B
Authority
CN
China
Prior art keywords
abnormal
event
physical
failure
ssd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211080666.3A
Other languages
Chinese (zh)
Other versions
CN115171768A (en
Inventor
徐高翔
薛红军
孙丽华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dera Technology Co Ltd
Original Assignee
Beijing Dera Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dera Technology Co Ltd
Priority to CN202211080666.3A
Publication of CN115171768A
Application granted
Publication of CN115171768B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C29/00: Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04: Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08: Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12: Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/38: Response verification devices
    • G11C2029/4402: Internal storage of test result, quality data, chip identification, repair information
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/20: Administration of product repair or maintenance

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of data storage and provides a method, a device, a storage medium and equipment for improving SSD defective product analysis efficiency. The method comprises the following steps: obtaining, from a log file, the abnormal events that occurred during testing of a defective SSD under test; analyzing each abnormal event to obtain the physical device corresponding to it; dividing the abnormal events into event types according to their corresponding physical devices; calculating an abnormal score for the physical device corresponding to each event type from the number of abnormal events of that type and the scoring weight of that type; and judging each physical device whose abnormal score is larger than the corresponding preset score threshold to be an abnormal component, and looking up the position of the abnormal component on the PCB. The invention can automatically locate the abnormal components of a defective SSD, so that maintenance personnel can repair the corresponding components according to the analysis result. This simplifies the repair flow for defective products, reduces its personnel cost, and improves production efficiency.

Description

Method, device, storage medium and equipment for improving SSD (solid state disk) defective product analysis efficiency
Technical Field
The invention relates to the technical field of data storage, and in particular to a method, a device, a storage medium and equipment for improving SSD (solid state disk) defective product analysis efficiency.
Background
An SSD (Solid State Disk or Solid State Drive) must pass multiple stages of test and verification during factory production. Each stage tests different modules and functions in different environments, and products that fail are rejected. For a defective product, test and production personnel cannot tell what caused the defect or where the faulty device is located, so they cannot directly propose a repair plan. Developers are usually required to export the log file that records exception information from the SSD and analyze it to find the root cause of the failure and provide a repair solution.
At present, most prior-art schemes collect SSD information from the perspective of the SSD user during failure analysis. Because that information is limited, it is difficult to determine the detailed cause of a failure, and the determination of whether a failure is at component level is not accurate enough.
In addition, existing schemes handle defective products at the production end mainly through manual work by research and development (R&D) personnel. Because a factory produces SSDs in huge quantities, the number of defective products is also large; if R&D personnel must analyze each SSD in detail and propose a solution, the R&D workload is enormous. Maintenance personnel then repair the defective SSD according to that recommendation, a process that requires testers, firmware developers, hardware developers and maintenance personnel to cooperate. Moreover, defective SSDs of different models may coexist in production. The main device of an SSD is NAND Flash, whose characteristics differ greatly between manufacturers, models and types, so fault analysis must be targeted, and a defective product can be analyzed accurately only with the corresponding parameter information. Fault analysis in SSD production testing therefore involves many complicated problems and relies on calculation, statistics and analysis by R&D personnel, which consumes a large amount of manpower, material resources and time, is inefficient, and is error-prone.
Disclosure of Invention
In view of the above problems, the present invention provides a method, a device, a storage medium and equipment for improving SSD failure analysis efficiency that overcome or at least partially solve those problems.
In one aspect of the present invention, a method for improving SSD failure analysis efficiency is provided, the method comprising:
reading a log file to obtain from it the abnormal events that occurred during testing of the defective SSD under test;
analyzing each abnormal event to obtain the physical device corresponding to that abnormal event;
dividing the abnormal events into different event types according to the physical devices corresponding to them;
calculating the abnormal score of the physical device corresponding to each event type according to the number of abnormal events of that event type and the preset scoring weight of that event type;
and judging whether the abnormal score of each physical device is larger than the corresponding preset score threshold, and judging each physical device whose abnormal score is larger than that threshold to be an abnormal component.
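The last two steps can be written compactly. The notation is an assumption introduced here for illustration, since the source defines no symbols: $n_t$ is the number of abnormal events of event type $t$, $w_t$ is the preset scoring weight of that type, $T(d)$ is the set of event types attributed to physical device $d$, and $\theta_d$ is the preset score threshold for $d$. Summing over several event types per device is one reading of the cumulative scoring described in the embodiments.

```latex
S_d = \sum_{t \in T(d)} w_t \, n_t ,
\qquad
d \ \text{is judged an abnormal component} \iff S_d > \theta_d
```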
Further, while reading the log file, the method further comprises:
acquiring the firmware configuration information of the defective SSD under test from the log file;
correspondingly, analyzing each abnormal event to obtain the physical device corresponding to it comprises: resolving the position information of each abnormal event in the log file, based on the firmware configuration information of the defective SSD, to obtain the physical device corresponding to that abnormal event.
Further, while reading the log file, the method further comprises:
obtaining the PCB information of the defective SSD under test from the log file;
after the physical devices whose abnormal scores are larger than the corresponding preset score thresholds have been judged to be abnormal components, acquiring, according to the PCB information, the device position mapping relation corresponding to that PCB information, wherein the device position mapping relation comprises the position of each physical device of the defective SSD on the PCB;
and determining the position of each abnormal component on the PCB according to the device position mapping relation.
Further, the abnormal events include: main controller exception, DDR exception, peripheral device exception and NAND Flash exception, wherein the NAND Flash exceptions include: NAND Flash initialization failure, erase failure, program failure and reread failure;
dividing the abnormal events into different event types according to the physical devices corresponding to them comprises:
dividing main controller exceptions, DDR exceptions and peripheral device exceptions into their respective event types;
dividing NAND Flash initialization failures, erase failures, program failures and reread failures into different event types according to the chip-select target, particle, physical block or physical page on which the event occurred.
Further, the method further comprises:
counting NAND Flash initialization failures by the chip-select target on which the event occurred;
counting erase failures by the physical block in which the event occurred;
counting program failures by the physical block in which the event occurred;
and counting reread failures by the physical page on which the event occurred.
Further, the method further comprises:
if the physical pages with reread failures are regularly distributed within a physical block, counting the reread failures both by physical page and by physical block;
and if the physical pages with reread failures are regularly distributed within a physical block and the physical blocks with reread failures are regularly distributed within a particle, counting the reread failures by physical page, by physical block and by particle.
Further, the scoring weights of the event types satisfy the following relationship:
main controller exception = DDR exception = peripheral device exception = NAND Flash initialization failure > erase failure = program failure = reread failure at particle level > reread failure at physical-block level > reread failure at physical-page level.
In another aspect of the present invention, a device for improving SSD failure analysis efficiency is provided, the device comprising:
an acquisition module, configured to read the log file to obtain from it the abnormal events that occurred during testing of the defective SSD under test;
an analysis module, configured to analyze each abnormal event to obtain the physical device corresponding to that abnormal event;
a statistical module, configured to divide the abnormal events into different event types according to the physical devices corresponding to them;
a calculation module, configured to calculate the abnormal score of the physical device corresponding to each event type according to the number of abnormal events of that event type and the preset scoring weight of that event type;
and a judging module, configured to judge whether the abnormal score of each physical device is larger than the corresponding preset score threshold, and to judge each physical device whose abnormal score is larger than that threshold to be an abnormal component.
In a third aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of the method for improving SSD failure analysis efficiency described above.
In a fourth aspect of the present invention, computer equipment is also provided, including a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method for improving SSD failure analysis efficiency.
With the method, device, storage medium and equipment for improving SSD defective product analysis efficiency provided by the invention, the log of a defective SSD under test is analyzed automatically to judge the root cause of its abnormal events and to locate its abnormal components, without intervention by firmware or hardware R&D personnel. Maintenance personnel can repair the corresponding components according to the analysis result, which simplifies the repair flow for defective products, reduces its personnel cost, and effectively improves production efficiency.
The foregoing is only an overview of the technical solutions of the present invention. To make the technical means of the invention clearer, and to make the above and other objects, features and advantages of the invention easier to understand, embodiments of the invention are described below.
Drawings
Various additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a flowchart of a method for improving SSD failure analysis efficiency according to an embodiment of the invention;
Fig. 2 is a flowchart of a method for improving SSD failure analysis efficiency according to another embodiment of the invention;
Fig. 3 is a structural block diagram of a device for improving SSD failure analysis efficiency according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Fig. 1 schematically shows a flowchart of a method for improving SSD failure analysis efficiency according to an embodiment of the invention. Referring to Fig. 1, the method specifically includes the following steps:
S11: reading the log file to obtain from it the abnormal events that occurred during testing of the defective SSD under test.
A Solid State Disk (SSD) is a drive built from an array of solid-state electronic memory chips. The test in this embodiment covers the basic functions, performance, reliability and so on of the product during SSD mass-production testing.
The log file records, in chronological order, the event messages generated by the SSD firmware while it runs. The main event messages include events, warnings, exceptions, errors and the like. The log file read in this embodiment is the log file produced during SSD mass-production testing.
Specifically, SSD production testing verifies and stress-tests the basic functions of the product's peripheral devices, main controller, DDR, NAND Flash and so on, and screens out defective SSDs. Here, DDR refers to DDR SDRAM (Double Data Rate SDRAM).
The information obtained from the log for each kind of abnormal event is listed below:
a) Main controller exception: the cause of the exception is recorded.
b) DDR exception: the cause of the exception and the error condition are recorded.
c) Peripheral device exception: the cause of the exception and the error condition are recorded, for example a capacitor anomaly.
d) NAND Flash initialization failure: the channel, particle, etc. are recorded.
e) Erase failure: whether TLC or SLC, the particle, the physical block, etc. are recorded.
f) Program failure: whether TLC or SLC, the logical particle, the physical block, the physical page, etc. are recorded.
g) Reread (read retry) failure: whether TLC or SLC, the particle, the physical block, the physical page, the unit, etc. are recorded.
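The extraction of these records can be sketched as follows. The source does not specify the log syntax, so the line format, the event-kind names (`ERASE_FAIL`, `REREAD_FAIL`, ...) and the field names (`mode`, `particle`, `block`, ...) are all hypothetical; only the set of recorded items follows the list above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical line format (NOT taken from the source), for example:
#   "[12.5] ERR ERASE_FAIL mode=TLC particle=3 block=1021"
LINE_RE = re.compile(r"^\[(?P<ts>[\d.]+)\]\s+ERR\s+(?P<kind>[A-Z_]+)(?P<rest>.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")

# Hypothetical names for the seven kinds of abnormal event listed above.
KNOWN_KINDS = {
    "CTRL_ERR", "DDR_ERR", "PERIPH_ERR", "NAND_INIT_FAIL",
    "ERASE_FAIL", "PROGRAM_FAIL", "REREAD_FAIL",
}


@dataclass
class AbnormalEvent:
    timestamp: float
    kind: str
    fields: dict = field(default_factory=dict)


def parse_log(lines):
    """Return the abnormal events found in an iterable of log lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("kind") not in KNOWN_KINDS:
            continue  # ordinary messages and unknown kinds are skipped
        fields = {}
        for key, value in FIELD_RE.findall(match.group("rest")):
            # Numeric positions become integers; everything else stays text.
            fields[key] = int(value) if value.isdigit() else value
        events.append(AbnormalEvent(float(match.group("ts")), match.group("kind"), fields))
    return events
```

Keeping the key=value fields in a plain dictionary lets one record type carry the different item sets that the seven event kinds report.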
NAND Flash is a type of flash memory. It internally adopts a nonlinear macro-cell mode and provides a cheap, effective solution for solid-state mass storage. NAND Flash offers large capacity and fast rewriting, and is suited to storing large amounts of data. The terms used here are as follows:
Target: the chip-select area, which typically contains 1, 2 or 4 particles that physically share one chip-select signal. NAND initialization, starting from its first command, is performed per target.
Particle: the logical unit (logical unit number, LUN), the smallest unit in NAND Flash that can independently execute commands and report status.
Physical block: the smallest unit of an erase operation.
Physical page: the smallest unit of a program operation.
Unit: 4 KB.
TLC (Triple-Level Cell): each cell stores 3 bits of data (3 bits/cell); performance and service life are comparatively poor.
SLC (Single-Level Cell): each cell stores 1 bit of data (1 bit/cell); performance is good and service life is long.
S12: analyzing each abnormal event to obtain the physical device corresponding to it.
Once the log has been read, the abnormal events captured in step S11 must be resolved. The captured events are output from the perspective of the firmware logic and must be converted into physical information so that the specific position of the device on the PCB can be found from the PCB schematic. For example, for a reread (read retry) failure, physical information such as the particle, physical block, physical page and unit is obtained from the log.
In this embodiment, the firmware configuration information of the defective SSD under test is also obtained from the log file while the log file is read.
In step S12, analyzing each abnormal event to obtain its corresponding physical device specifically includes: resolving the position information of each abnormal event in the log file, based on the firmware configuration information of the defective SSD, to obtain the physical device corresponding to that abnormal event. Because the firmware configuration information of the defective SSD is obtained during production testing, the method is compatible with SSDs of different models and can automatically analyze different NAND Flash parts.
Specifically, this embodiment reads the log file line by line to obtain the firmware configuration information of the SSD. SSDs of different models use different hardware, such as PCBs, NAND Flash and peripheral devices, and run different firmware. The messages printed in the log are events output from the firmware's point of view, so the events can be resolved accurately only by obtaining the firmware configuration information and using it to resolve the position information of each abnormal event. The firmware configuration information includes: physical information of the NAND Flash, such as the NAND type, the particle count, the physical blocks and the physical page size; the PCB model, since many SSD models are produced, each model corresponds to a different PCB model, and the number and positions of components differ between PCB models; and other configuration parameters, such as how many NAND particles and how many capacitors are used.
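A minimal sketch of this resolution step is given below. The source does not state how firmware positions map to physical ones, so the numbering rule used here (logical particle index = target index × particles per target + particle index within the target) is an assumption, as are the names `FirmwareConfig`, `PhysicalLocation` and `resolve` and all of their fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirmwareConfig:
    # Illustrative subset of the firmware configuration; field names are assumptions.
    nand_type: str
    targets: int
    particles_per_target: int
    blocks_per_particle: int
    pages_per_block: int


@dataclass(frozen=True)
class PhysicalLocation:
    target: int
    particle: int                 # index of the particle within its target
    block: Optional[int] = None
    page: Optional[int] = None


def resolve(cfg, logical_particle, block=None, page=None):
    """Convert a firmware-view position into a physical location.

    Assumes logical particles are numbered target by target, which the
    source does not confirm.
    """
    total = cfg.targets * cfg.particles_per_target
    if not 0 <= logical_particle < total:
        raise ValueError(f"particle {logical_particle} outside 0..{total - 1}")
    if block is not None and not 0 <= block < cfg.blocks_per_particle:
        raise ValueError(f"block {block} outside the configured range")
    if page is not None and not 0 <= page < cfg.pages_per_block:
        raise ValueError(f"page {page} outside the configured range")
    target, particle = divmod(logical_particle, cfg.particles_per_target)
    return PhysicalLocation(target, particle, block, page)
```

Passing the configuration as a parameter rather than hard-coding it mirrors the stated goal of handling different SSD models and NAND Flash parts with the same analysis.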
S13: dividing the abnormal events into different event types according to the physical devices corresponding to them.
S14: calculating the abnormal score of the physical device corresponding to each event type according to the number of abnormal events of that event type and its preset scoring weight.
S15: judging whether the abnormal score of each physical device is larger than the corresponding preset score threshold, and judging each physical device whose abnormal score is larger than that threshold to be an abnormal component.
In this embodiment, the distribution of the obtained abnormal events across physical devices is analyzed and recorded. For each physical device with abnormal event records, the records are counted and scored cumulatively, with each abnormal type weighted by its corresponding weight. When the cumulative score exceeds the threshold, the device is judged abnormal and in need of repair, and the position information of the devices needing repair is listed.
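Steps S13 to S15 amount to a weighted count followed by a strict comparison, which can be sketched as below. The function names, device labels and numeric values used with them are illustrative assumptions; the source fixes only the rule itself (count per event type, multiply by the type's weight, accumulate per device, compare with the device's threshold).

```python
from collections import Counter


def abnormal_scores(events, weights):
    """Accumulate a score per physical device.

    events  -- iterable of (device, event_type) pairs
    weights -- mapping event_type -> preset scoring weight
    """
    counts = Counter(events)  # number of events per (device, event_type)
    scores = Counter()
    for (device, event_type), n in counts.items():
        scores[device] += weights[event_type] * n
    return dict(scores)


def abnormal_components(scores, thresholds):
    """Return the devices whose score is strictly larger than their threshold."""
    return sorted(d for d, s in scores.items() if s > thresholds[d])
```

The comparison is strict because the text says the score must be larger than the threshold, so a device that scores exactly its threshold is not flagged.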
With the method for improving SSD defective product analysis efficiency provided by the embodiment of the invention, the log of a defective SSD under test is analyzed automatically, so the failure cause is analyzed from the perspective of the SSD device developer. The root cause of the abnormal events can be located directly and accurately and the abnormal components located automatically, which improves analysis efficiency without intervention by firmware or hardware R&D personnel. Maintenance personnel can repair the corresponding components according to the analysis result, which simplifies the repair flow for defective products, reduces its personnel cost, and effectively improves production efficiency.
The embodiment of the invention handles defective products arising in SSD production. According to the log event records, the abnormal events include: main controller exception, DDR exception, peripheral device exception and NAND Flash exception, where the NAND Flash exceptions include NAND Flash initialization failure, erase failure, program failure, reread (read retry) failure and the like. Accordingly, the abnormal devices mainly include the main controller, NAND Flash, DDR and peripheral devices.
For NAND Flash exceptions, the invention counts the distribution of events at particle, physical-block and physical-page level and combines the counts with weights to judge the root cause of the NAND Flash abnormal events. Specifically, dividing the abnormal events into different event types according to their corresponding physical devices in step S13 includes:
dividing main controller exceptions, DDR exceptions and peripheral device exceptions into their respective event types;
dividing NAND Flash initialization failures, erase failures, program failures and reread failures into different event types according to the chip-select target, particle, physical block or physical page on which the event occurred. Further, NAND Flash initialization failures can be counted by the chip-select target on which the event occurred, erase failures by the physical block in which the event occurred, program failures by the physical block in which the event occurred, and reread failures by the physical page on which the event occurred. Specifically, if the physical pages with reread failures are regularly distributed within a physical block, the reread failures are counted both by physical page and by physical block; if, in addition, the physical blocks with reread failures are regularly distributed within a particle, the reread failures are counted by physical page, by physical block and by particle.
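The three-level counting of reread failures can be sketched as follows. The source does not define what "regularly distributed" means, so this sketch substitutes a simple stand-in test: a block counts once at least `min_pages_per_block` distinct pages in it have failed, and a particle counts once at least `min_blocks_per_particle` such blocks exist in it. Both parameters, their default values and the function name are assumptions.

```python
from collections import Counter, defaultdict


def reread_statistics(failures, min_pages_per_block=3, min_blocks_per_particle=2):
    """Count reread failures at page, block and particle level.

    failures -- iterable of (particle, block, page) tuples, one per failure.
    The two thresholds are a stand-in for the undefined "regular
    distribution" test and are not taken from the source.
    """
    page_counts = Counter(failures)

    # Distinct failing pages seen in each block.
    pages_in_block = defaultdict(set)
    for particle, block, page in page_counts:
        pages_in_block[(particle, block)].add(page)

    # Escalate to block level only where enough distinct pages failed.
    block_counts = Counter()
    for (particle, block, page), n in page_counts.items():
        if len(pages_in_block[(particle, block)]) >= min_pages_per_block:
            block_counts[(particle, block)] += n

    # Escalate to particle level only where enough blocks were escalated.
    blocks_in_particle = defaultdict(set)
    for particle, block in block_counts:
        blocks_in_particle[particle].add(block)
    particle_counts = Counter()
    for (particle, block), n in block_counts.items():
        if len(blocks_in_particle[particle]) >= min_blocks_per_particle:
            particle_counts[particle] += n

    return page_counts, block_counts, particle_counts
```

An isolated failing page therefore contributes only at page level, while a cluster of failures can additionally contribute at block level and, if several blocks cluster, at particle level.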
In an alternative embodiment, the scoring weights of the event types satisfy the following relationship:
main controller exception = DDR exception = peripheral device exception = NAND Flash initialization failure > erase failure = program failure = reread failure at particle level > reread failure at physical-block level > reread failure at physical-page level.
In a specific example, after the abnormal events are extracted from the log file, the physical devices corresponding to the abnormal events are analyzed, so that after the physical devices corresponding to the abnormal events are obtained, the abnormal events are divided into different event types according to the difference of the physical devices corresponding to the abnormal events. Specifically, the distribution and the number of captured abnormal events are respectively counted. Often the records of the same type of anomaly occur many times, the records are collected, sorted by a specific rule, and the total number of anomalies is counted. The various types of anomalies affect the SSD to different degrees, requiring different weights to be configured. For the exception of the NAND FLASH, special processing is required. And for the abnormity of the NAND FLASH, statistics is respectively carried out according to the particles, the physical blocks and the physical pages. This is determined by the characteristics of NAND FLASH, where there are at least 1 target on each NAND grain, 1 to many grains on each target, several hundred to several thousand physical blocks per grain, several hundred to several thousand physical pages per physical block, and several units per physical page. For the abnormality of the NAND FLASH, the influence degree of various types on the NAND is different, different weights need to be configured, all the abnormalities are counted, all scores are accumulated, and when the abnormality exceeds a threshold (for example, 100 scores), the abnormality of the component is judged. The NAND FLASH has the following exception types, and the rule of weight configuration is as follows:
a) NAND FLASH initialization fails. The entire target cannot be used. The weight is set to maximum (e.g., 100 points).
b) Erase (erase) fails. This physical block is bad and can no longer be used. A larger weight (e.g., 10 points) is set.
c) Programming (program) fails. This physical block is bad and can no longer be used. Set a greater weight (e.g. 10 points)
d) Reread (read retry) fails. This needs to be judged according to the actual conditions. In general, a read-retry failure does not by itself rule out the physical page: the page may be good, with the retry failing only under specific conditions such as data retention (data has been stored for a long time) or temperature spread (the ambient temperatures at write and read time differ greatly). If the read-retry failures show a certain regularity, further judgment is needed. If the failures are distributed on a specific physical block in a regular pattern, the physical block should be considered bad. If the failing physical blocks are in turn distributed on a particular grain in a regular pattern, the grain is considered bad. Read-retry failures are counted separately at the grain, physical-block, and physical-page levels, and a different weight is set for each level.
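The weighted accumulation described in rules a) to d) can be sketched as follows. This is a minimal illustration, not the patented implementation: the 100-point and 10-point weights and the 100-point threshold are the examples given in the text, while the per-page reread weight, the event-type names, and the component identifiers are assumptions made for the sketch. The comparison is strict ("greater than"), following the wording of the claims, so a score exactly equal to the threshold is not flagged.

```python
from collections import Counter

# Example weights in points. 100 and 10 come from the text;
# the page-level reread weight of 1 is an assumed value.
WEIGHTS = {
    "init_fail": 100,       # whole target unusable
    "erase_fail": 10,       # physical block is bad
    "program_fail": 10,     # physical block is bad
    "reread_fail_page": 1,  # assumption: lowest weight
}
THRESHOLD = 100  # example threshold from the text


def score_components(events):
    """Accumulate a score per component.

    events: iterable of (component_id, event_type) tuples,
    where component_id is a hypothetical label such as "nand0/blk7".
    """
    scores = Counter()
    for (component, event_type), count in Counter(events).items():
        scores[component] += count * WEIGHTS[event_type]
    return dict(scores)


def abnormal_components(scores, threshold=THRESHOLD):
    """Return components whose score is strictly greater than the threshold."""
    return sorted(c for c, s in scores.items() if s > threshold)
```

For instance, eleven erase failures on one block accumulate 110 points and flag that block, while three program failures on another block accumulate only 30 points and do not.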
In an embodiment of the present invention, while reading the log file, the method further includes acquiring the PCB information of the tested bad SSD from the log file. A Printed Circuit Board (PCB) is an important electronic part that supports the electronic components and carries the electrical interconnections between them. The PCB information in this embodiment mainly includes the version number of the PCB. The number and positions of the components differ between PCB versions, so a PCB library is maintained that records the position information of each component on each PCB version. The PCB library is queried by PCB version number to obtain the position information of each component of the SSD.
Further, as shown in fig. 2, the method for improving SSD defective analysis efficiency according to the embodiment of the present invention further includes, after determining, as an abnormal component, a physical device whose abnormal score is greater than a corresponding preset score threshold in step S15, the following steps:
S16, acquiring, according to the PCB information, a device position mapping relation corresponding to the PCB information, wherein the device position mapping relation includes the position distribution information of each physical device of the bad SSD on the PCB. Specifically, a mapping between each device and its position on the PCB, namely the PCB library, is generated from the schematic diagram corresponding to the PCB, and the PCB library is queried by the PCB version number in the PCB information to obtain the position information of each component of the bad SSD.
S17, determining the position distribution of the abnormal component in the PCB according to the device position mapping relation.
In this embodiment, the log file is read line by line, and the type of the PCB is obtained. Different PCBs place the devices of the SSD at different positions, so the device position mapping relation must be imported according to the PCB in use. Specifically, after devices are judged to be abnormal components, the mapping of the devices on the PCB can be imported from the schematic diagram corresponding to the PCB, and the device numbers on the PCB are output. Maintenance personnel can then directly find the device that needs repair on the SSD.
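Steps S16 and S17 amount to a keyed lookup: PCB version number first, then abnormal device. The sketch below assumes a PCB library held as a nested dictionary; the version strings, device names, and reference designators (such as "U21") are hypothetical placeholders, since the text does not specify how the library is stored.

```python
# Hypothetical PCB library: PCB version -> {physical device -> device number on the PCB}.
PCB_LIBRARY = {
    "V1.0": {"controller": "U1", "ddr0": "U2", "nand0": "U10", "nand1": "U11"},
    "V2.0": {"controller": "U1", "ddr0": "U3", "nand0": "U20", "nand1": "U21"},
}


def locate_abnormal_components(pcb_version, abnormal_devices):
    """Map each abnormal device to its device number on the given PCB version.

    Devices missing from the library map to None so that the gap is visible
    to the maintenance personnel instead of being silently dropped.
    """
    try:
        position_map = PCB_LIBRARY[pcb_version]
    except KeyError:
        raise ValueError(f"unknown PCB version: {pcb_version!r}") from None
    return {device: position_map.get(device) for device in abnormal_devices}
```

Keeping one mapping per PCB version mirrors the statement that the number and positions of components differ between versions, so the same logical device can resolve to different device numbers.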
With the method for improving SSD defective product analysis efficiency provided by the embodiment of the invention, the log of a tested bad SSD is analyzed automatically to determine the root cause of its abnormal events, locate its abnormal components, and indicate the specific position of each abnormal component on the PCB. This improves the analysis efficiency for bad SSDs and requires no intervention from firmware or hardware research and development personnel. Based on the analysis result, maintenance personnel can directly find the component that needs repair on the SSD, which simplifies the maintenance flow for bad SSDs, reduces the personnel cost of the maintenance flow, and effectively improves production efficiency.
The method for improving the analysis efficiency of the SSD defective products, provided by the embodiment of the invention, has the following beneficial effects:
1. The defective product analysis efficiency is improved, the defective product maintenance process is simplified, the defective product processing time is shortened, and the production efficiency is improved.
2. No intervention from firmware or hardware research personnel is needed, which reduces the personnel investment.
3. Defective SSDs of various models can be analyzed automatically.
4. The SSD production process becomes more automated and intelligent.
5. The root cause of abnormalities in defective SSDs is analyzed more accurately.
For simplicity of explanation, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will appreciate that the embodiments of the invention are not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily all required by the embodiments of the invention.
Fig. 3 schematically shows a structural diagram of an apparatus for improving SSD failure analysis efficiency according to an embodiment of the present invention. Referring to fig. 3, the apparatus for improving SSD defective product analysis efficiency according to the embodiment of the present invention specifically includes an obtaining module 201, an analyzing module 202, a counting module 203, a calculating module 204, and a determining module 205, where:
an obtaining module 201, configured to read a log file, so as to obtain an abnormal event of a tested bad SSD in a test from the log file;
the analysis module 202 is configured to analyze each abnormal event to obtain a physical device corresponding to each abnormal event;
the statistical module 203 is used for dividing the abnormal events into different event types according to different physical devices corresponding to the abnormal events;
the calculating module 204 is configured to calculate an abnormal score of the physical device corresponding to each event type according to the number of the abnormal events in each event type and a preset score weight of each event type;
the determining module 205 is configured to determine whether the abnormal score of each physical device is greater than the corresponding preset score threshold, and determine the physical device having the abnormal score that is greater than the corresponding preset score threshold as the abnormal component.
In an embodiment of the present invention, the obtaining module 201 is further configured to obtain, from the log file, firmware configuration information of the tested bad SSD in the process of reading the log file;
correspondingly, the parsing module 202 is configured to parse, based on the firmware configuration information of the bad SSD, the position information of the occurrence of the abnormal event in the log file, to obtain the physical device corresponding to each abnormal event.
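One way to picture how firmware configuration information lets the parsing module resolve a logged position into a physical device is a bit-field decode. The sketch below is purely illustrative: the text does not describe the log's address format, so the flat integer address, the field order (page in the lowest bits, then block, grain, and target), and the bit widths carried in `FirmwareConfig` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmwareConfig:
    """Assumed firmware configuration: bit width of each address field."""
    page_bits: int
    block_bits: int
    grain_bits: int
    target_bits: int


@dataclass(frozen=True)
class NandLocation:
    target: int
    grain: int
    block: int
    page: int


def decode_address(addr, cfg):
    """Split a flat physical address into target, grain, block, and page."""
    page = addr & ((1 << cfg.page_bits) - 1)
    addr >>= cfg.page_bits
    block = addr & ((1 << cfg.block_bits) - 1)
    addr >>= cfg.block_bits
    grain = addr & ((1 << cfg.grain_bits) - 1)
    addr >>= cfg.grain_bits
    target = addr & ((1 << cfg.target_bits) - 1)
    return NandLocation(target=target, grain=grain, block=block, page=page)
```

Because the widths come from the configuration rather than being hard-coded, the same decoder could serve SSD models with different NAND geometries, which is the reason the firmware configuration is read from the log in the first place.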
In an embodiment of the present invention, the obtaining module 201 is further configured to obtain, from the log file, the PCB information of the tested bad SSD in the process of reading the log file;
correspondingly, the determining module 205 is further configured to, after determining that a physical device with an abnormal score greater than the corresponding preset score threshold is an abnormal component, obtain a device position mapping relationship corresponding to the PCB information according to the PCB information, where the device position mapping relationship includes position distribution information of each physical device of the bad SSD on the PCB; and to determine the position distribution of the abnormal component on the PCB according to the device position mapping relationship.
The abnormal events in the embodiment of the invention include main controller exceptions, DDR exceptions, peripheral device exceptions, and NAND flash memory exceptions, where the NAND flash memory exceptions include NAND FLASH initialization failure, erase failure, program failure, and reread failure.
The statistical module 203 is used for dividing the main controller exception, the DDR exception and the peripheral device exception into corresponding event types respectively, and for dividing the NAND FLASH initialization failure, the erase failure, the program failure and the reread failure into different event types according to the chip target, the grain, the physical block or the physical page where the event occurs.
Further, the statistical module 203 is specifically configured to perform abnormal event statistics on the NAND FLASH initialization failure according to the chip target where the event occurs;
carrying out abnormal event statistics on the erasure failures according to the physical blocks in which the events occur;
carrying out abnormal event statistics on programming failure according to a physical block where an event occurs;
performing abnormal event statistics on rereading failures according to physical pages where events occur, wherein if the physical pages where rereading failures occur are regularly distributed on physical blocks, performing abnormal event statistics on the rereading failures according to the physical pages and the physical blocks where the events occur respectively; and if the physical page with the rereading failure is regularly distributed on the physical block and the physical block with the rereading failure is regularly distributed on the particle, respectively carrying out abnormal event statistics on the rereading failure according to the physical page, the physical block and the particle of the event.
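The level-by-level reread statistics can be sketched as below. The text does not define what "regularly distributed" means, so this sketch substitutes a simple assumed rule: a physical block is counted once it has at least `pages_per_block_limit` distinct failing pages, and a grain is counted once it has at least `blocks_per_grain_limit` such blocks. Both limits, and the `(grain, block, page)` tuple format, are hypothetical.

```python
from collections import defaultdict


def count_reread_failures(failures, pages_per_block_limit=3, blocks_per_grain_limit=2):
    """Count reread failures at page, block, and grain level.

    failures: iterable of (grain, block, page) tuples.
    Returns (page_counts, block_counts, grain_counts).
    """
    page_counts = defaultdict(int)
    pages_in_block = defaultdict(set)
    for grain, block, page in failures:
        page_counts[(grain, block, page)] += 1
        pages_in_block[(grain, block)].add(page)

    # Block level: only blocks with enough distinct failing pages (assumed rule).
    block_counts = {
        key: len(pages)
        for key, pages in pages_in_block.items()
        if len(pages) >= pages_per_block_limit
    }

    # Grain level: only grains with enough such blocks (assumed rule).
    blocks_in_grain = defaultdict(set)
    for grain, block in block_counts:
        blocks_in_grain[grain].add(block)
    grain_counts = {
        grain: len(blocks)
        for grain, blocks in blocks_in_grain.items()
        if len(blocks) >= blocks_per_grain_limit
    }
    return dict(page_counts), block_counts, grain_counts
```

An isolated failing page therefore contributes only a page-level event, whereas failures clustered across several pages of a block, and across several blocks of a grain, also produce block-level and grain-level events that carry higher weights.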
In an alternative embodiment of the present invention, the scoring weights for each event type have the following relationship:
the main controller exception = DDR exception = peripheral device exception = NAND FLASH initialization failure > erase failure = program failure = reread failure occurred at grain > reread failure occurred at physical block > reread failure occurred at physical page.
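A weight configuration can be checked against this ordering before it is used. The following sketch assumes the weights are held in a dictionary with the key names shown; the names and the sample values are illustrative and not taken from the text, apart from the 100-point and 10-point examples given earlier.

```python
def weights_follow_rule(w):
    """Return True if the weights satisfy the ordering stated above."""
    top = [w["controller"], w["ddr"], w["peripheral"], w["nand_init_fail"]]
    mid = [w["erase_fail"], w["program_fail"], w["reread_fail_grain"]]
    return (
        len(set(top)) == 1      # the four highest weights are equal
        and len(set(mid)) == 1  # erase = program = reread at grain level
        and top[0] > mid[0] > w["reread_fail_block"] > w["reread_fail_page"]
    )


# Sample configuration; the block-level and page-level values are assumed.
EXAMPLE_WEIGHTS = {
    "controller": 100, "ddr": 100, "peripheral": 100, "nand_init_fail": 100,
    "erase_fail": 10, "program_fail": 10, "reread_fail_grain": 10,
    "reread_fail_block": 5, "reread_fail_page": 1,
}
```

Running such a check when the weights are loaded would catch a misconfiguration, for example a page-level reread weight set higher than the block-level one.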
Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for the relevant details, refer to the corresponding description of the method embodiment.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method as described above.
In this embodiment, if the device for improving SSD failure analysis efficiency is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
In addition, the embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor. When the processor executes the computer program, the steps of the method for improving the SSD defective product analysis efficiency are implemented, for example steps S11 to S15 shown in fig. 1.
With the method, device, storage medium and equipment for improving SSD defective product analysis efficiency provided by the invention, the log of a tested defective SSD is analyzed automatically to determine the root cause of its abnormal events and to locate its abnormal components, without intervention from firmware or hardware research personnel. Maintenance personnel can repair the corresponding components according to the analysis result, which simplifies the maintenance flow for defective products, reduces the personnel cost of the maintenance flow, and effectively improves production efficiency.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method according to the various embodiments or some parts of the embodiments.
Moreover, those of skill in the art will appreciate that while some embodiments herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, any of the claimed embodiments may be used in any combination.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method of improving SSD defective product analysis efficiency, the method comprising:
reading the log file to obtain abnormal events of the tested bad SSD in the test from the log file;
analyzing each abnormal event to obtain a physical device corresponding to each abnormal event;
dividing the abnormal events into different event types according to different physical devices corresponding to the abnormal events;
calculating the abnormal score of the physical device corresponding to each event type according to the number of the abnormal events in each event type and the preset scoring weight of each event type;
judging whether the abnormal score of each physical device is larger than the corresponding preset score threshold value, and judging the physical device with the abnormal score larger than the corresponding preset score threshold value as an abnormal component;
in reading the log file, the method further comprises:
obtaining the PCB information of the tested bad SSD from the log file;
after the physical devices with abnormal scores larger than the corresponding preset score threshold are judged as abnormal components, acquiring device position mapping relations corresponding to the PCB information according to the PCB information, wherein the device position mapping relations comprise position distribution information of each physical device in the bad SSD in the PCB;
and determining the position distribution of the abnormal component in the PCB according to the device position mapping relation.
2. The method of claim 1, wherein during reading of a log file, the method further comprises:
acquiring the firmware configuration information of the tested bad SSD from the log file;
correspondingly, the analyzing each abnormal event to obtain the physical device corresponding to each abnormal event includes: analyzing the position information of the abnormal events in the log file based on the firmware configuration information of the bad SSD to obtain the physical devices corresponding to the abnormal events.
3. The method of any of claims 1-2, wherein the abnormal event comprises: a main controller exception, a DDR exception, a peripheral device exception, and a NAND flash memory exception, wherein the NAND flash memory exception comprises: NAND FLASH initialization failure, erase failure, program failure, and reread failure;
the dividing the abnormal events into different event types according to the difference of the physical devices corresponding to the abnormal events comprises:
dividing the abnormality of the main controller, the abnormality of the DDR and the abnormality of the peripheral device into corresponding event types respectively;
dividing the NAND FLASH initialization failure, the erase failure, the program failure and the reread failure into different event types according to the chip target, grain, physical block or physical page where the event occurs.
4. The method of claim 3, further comprising:
performing abnormal event statistics on the NAND FLASH initialization failure according to the chip target of the event;
carrying out abnormal event statistics on the erasure failures according to the physical blocks in which the events occur;
carrying out abnormal event statistics on programming failure according to a physical block where an event occurs;
and carrying out abnormal event statistics on the rereading failure according to the physical page where the event occurs.
5. The method of claim 4, further comprising:
if the physical page with the rereading failure is regularly distributed on the physical block, respectively carrying out abnormal event statistics on the rereading failure according to the physical page and the physical block with the event occurrence;
and if the physical page with the rereading failure is regularly distributed on the physical block and the physical block with the rereading failure is regularly distributed on the grain, respectively carrying out abnormal event statistics on the rereading failure according to the physical page, the physical block and the grain where the event occurs.
6. The method of claim 5, wherein the scoring weights for each event type have the following relationship:
the main controller exception = DDR exception = peripheral device exception = NAND FLASH initialization failure > erase failure = program failure = reread failure occurred at grain > reread failure occurred at physical block > reread failure occurred at physical page.
7. An apparatus for improving SSD failure analysis efficiency, the apparatus comprising:
the acquisition module is used for reading the log file so as to acquire the abnormal events of the tested bad SSD in the test from the log file;
the analysis module is used for analyzing each abnormal event to obtain a physical device corresponding to each abnormal event;
the statistical module is used for dividing the abnormal events into different event types according to different physical devices corresponding to the abnormal events;
the calculation module is used for calculating the abnormal score of the physical device corresponding to each event type according to the number of the abnormal events in each event type and the preset scoring weight of each event type;
the judging module is used for judging whether the abnormal score of each physical device is larger than the corresponding preset score threshold value, and judging the physical device of which the abnormal score is larger than the corresponding preset score threshold value as an abnormal component;
the acquisition module is used for acquiring the PCB information of the tested bad SSD from the log file in the log file reading process;
the judging module is further configured to, after the physical device with the abnormal score larger than the corresponding preset score threshold is judged as an abnormal component, obtain a device position mapping relationship corresponding to the PCB information according to the PCB information, where the device position mapping relationship includes position distribution information of each physical device in the bad SSD in the PCB; and determining the position distribution of the abnormal component in the PCB according to the device position mapping relationship.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1-6 when executing the computer program.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211080666.3A CN115171768B (en) 2022-09-05 2022-09-05 Method, device, storage medium and equipment for improving SSD (solid State disk) defective product analysis efficiency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211080666.3A CN115171768B (en) 2022-09-05 2022-09-05 Method, device, storage medium and equipment for improving SSD (solid State disk) defective product analysis efficiency

Publications (2)

Publication Number Publication Date
CN115171768A CN115171768A (en) 2022-10-11
CN115171768B true CN115171768B (en) 2022-12-02

Family

ID=83481211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211080666.3A Active CN115171768B (en) 2022-09-05 2022-09-05 Method, device, storage medium and equipment for improving SSD (solid State disk) defective product analysis efficiency

Country Status (1)

Country Link
CN (1) CN115171768B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426785A (en) * 2012-05-18 2013-12-04 三星泰科威株式会社 Method and apparatus for tracking mounting error of chip mounter
CN110556155A (en) * 2018-06-04 2019-12-10 记忆科技(深圳)有限公司 Method and device for testing diskless started SSD product and computer equipment
CN111192623A (en) * 2018-11-14 2020-05-22 慧荣科技股份有限公司 Method, computer device and user interface for automated testing
CN114943321A (en) * 2021-02-08 2022-08-26 超聚变数字技术有限公司 Fault prediction method, device and equipment for hard disk

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11210166B1 (en) * 2017-12-22 2021-12-28 Pliops Ltd. Efficient redundancy management in key-value NAND flash storage

Also Published As

Publication number Publication date
CN115171768A (en) 2022-10-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant