Common Application Issues of NAND Flash
Program loss (here meaning loss of the program stored in NAND Flash on a motherboard) makes engineers uneasy. It is particularly troublesome when the program disappears during use, and many engineers find the issue difficult to tackle. You are often not present when the failure occurs, so you cannot observe the exact symptoms, and signal measurements frequently fail to pinpoint the cause. Many engineers then resort to replacing motherboard components and locating the problem by process of elimination.
In most cases, replacing the NAND Flash resolves the issue, which leads many engineers to conclude that the NAND Flash itself is faulty and to suggest switching to another vendor or brand. However, it is worth looking deeper into the underlying causes. Below, we summarize some common scenarios and causes of NAND Flash program loss:

1. NAND Flash Program Errors Due to Unstable Power Supply Voltage
Often, when a product fails in a customer's hands, engineers take it back, re-flash the program, and find that it powers up and starts normally, with no program loss after repeated testing. If the product contains a battery, consider whether the failure is related to how the battery was used in the customer's environment. When the battery charge is low, and especially when the program's battery voltage detection threshold is set too low, the main controller may start up only for the battery to run out of power shortly afterwards. If the supply collapses while the system is running, for example during a NAND Flash write or erase, the program stored in the NAND Flash may become corrupted, and the product then fails to start normally.
Solution: Add a battery voltage detection threshold to the program, or raise the existing one, so that the system only operates when the voltage is high enough for all chips to function properly.
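The voltage check described above can be sketched as a small gate that firmware consults before any NAND Flash write or erase. This is only a minimal illustration in Python, not production firmware: the class name `BatteryGate`, the millivolt values, and the use of two thresholds (hysteresis, so the gate does not toggle rapidly near the limit) are assumptions added here and should be replaced by values taken from the datasheets of the chips on the board.

```python
from dataclasses import dataclass


@dataclass
class BatteryGate:
    """Allows NAND writes only while the battery voltage is safely high.

    All thresholds are hypothetical placeholders for illustration.
    """
    enable_mv: int = 3600    # assumed voltage needed to (re)enable writes
    disable_mv: int = 3400   # assumed voltage below which writes are blocked
    writes_allowed: bool = False

    def update(self, battery_mv: int) -> bool:
        """Feed in a new battery reading and return whether writes are allowed."""
        if battery_mv >= self.enable_mv:
            self.writes_allowed = True
        elif battery_mv < self.disable_mv:
            self.writes_allowed = False
        # Between the two thresholds the previous state is kept (hysteresis).
        return self.writes_allowed


gate = BatteryGate()
for reading in (3500, 3650, 3500, 3300):
    print(reading, gate.update(reading))
```

A single threshold would also satisfy the solution above; the second threshold is a design choice that keeps a battery hovering near the limit from repeatedly enabling and disabling writes.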
2. NAND Flash Program Errors Due to Abnormal DRAM Operation
The main controller, DRAM, and NAND Flash together form the core system of a product, and a failure in any one of them can cause problems for the whole system. A DRAM failure, however, may show up as NAND Flash errors, program loss, or an excessive number of NAND Flash bad blocks, which makes this situation more complex to handle.
If re-flashing the program allows the system to work normally again, the program stored in NAND Flash was indeed corrupted, but the NAND Flash device itself is functioning correctly. This is especially likely for SLC NAND Flash with 1-bit ECC, whose failure probability is very low. In NAND Flash, data corruption typically occurs during write or erase operations, which is also when bad blocks can arise. Simply reading the program does not change the stored charge, so it generally causes no problems.
If many bad blocks are reported when the program is read through the debugging port, it is worth using more advanced debugging tools, such as JTAG tools, to debug the NAND Flash in more depth. Sometimes erroneous program operations cause a good block to be mistakenly marked as a bad block. With JTAG debugging tools, it is possible to modify the flag and re-mark the block. After the program is re-flashed, the motherboard may boot again.
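As a rough illustration of the check described above, the following Python sketch scans per-block marker bytes and flags a device whose bad block count looks implausibly high, which would suggest mis-marked blocks rather than real defects. It rests on assumptions not stated in this article: that the marker is the first spare-area byte of each block's first page, that 0xFF means "good", and that a 2% limit is a sensible alarm level. The actual marker location and the guaranteed number of valid blocks should be taken from the device datasheet.

```python
GOOD_MARKER = 0xFF  # assumption: an erased marker byte means "good block"


def scan_bad_blocks(first_page_spare):
    """Return indices of blocks whose marker byte differs from 0xFF.

    first_page_spare: one bytes object per block, holding the spare area
    of that block's first page (a hypothetical dump, e.g. read via JTAG).
    """
    return [i for i, spare in enumerate(first_page_spare)
            if spare[0] != GOOD_MARKER]


def looks_suspicious(bad_blocks, total_blocks, max_bad_fraction=0.02):
    """True when more blocks are marked bad than the assumed limit allows."""
    return len(bad_blocks) > total_blocks * max_bad_fraction


# Simulated dump: 100 blocks, one of them marked bad.
spares = [bytes([0xFF] * 16) for _ in range(100)]
spares[3] = bytes([0x00]) + bytes([0xFF] * 15)
bad = scan_bad_blocks(spares)
print(bad, looks_suspicious(bad, len(spares)))
```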
3. Inadequate Bad Block Management
Since NAND Flash is prone to bad blocks, effective bad block management is necessary. The device specification identifies three situations in which bad blocks may occur: during programming, during erasure, and during reading.
For each of these scenarios, the specification also provides a flowchart describing the action the program should take.
During Programming and Erasure: If the operation fails, the program should move the data intended for the target block to a new, good block and mark the failed block as bad in the bad block table. Maintaining this table prevents data from being written to bad blocks later.
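The program-failure handling described above can be sketched as follows. This is a simplified simulation in Python under stated assumptions, not the flowchart from any specific datasheet: `SimNand` and `BlockManager` are hypothetical names, the device is modeled in memory, and a whole block is written in one call, so retrying with the same data stands in for copying already-written pages to the replacement block.

```python
class SimNand:
    """Tiny in-memory stand-in for a NAND device (hypothetical)."""

    def __init__(self, num_blocks, failing=()):
        self.blocks = [None] * num_blocks
        self.failing = set(failing)  # blocks whose program operation fails

    def program(self, block, data):
        if block in self.failing:
            return False             # models a failed program status
        self.blocks[block] = data
        return True


class BlockManager:
    """Keeps a bad block table and never reuses a block that failed."""

    def __init__(self, nand):
        self.nand = nand
        self.bad_blocks = set()      # the bad block table
        self.used = set()

    def _next_good_block(self):
        for b in range(len(self.nand.blocks)):
            if b not in self.bad_blocks and b not in self.used:
                return b
        raise RuntimeError("no good blocks left")

    def write(self, data):
        """Program data into a good block, retiring any block that fails."""
        while True:
            block = self._next_good_block()
            if self.nand.program(block, data):
                self.used.add(block)
                return block
            self.bad_blocks.add(block)  # mark bad, then retry elsewhere


nand = SimNand(4, failing={0, 1})
mgr = BlockManager(nand)
print(mgr.write(b"boot"), sorted(mgr.bad_blocks))
```

In real firmware the table would also be stored persistently so that it survives power cycles, and erase failures would be handled in the same way.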
During Reading: ECC (Error Correction Code) should be used to verify the integrity of the program, ensuring that the data read is accurate and reliable. In practice, many engineers do not handle bad block management carefully enough, which leads to program loss in real-world use. When investigating such failures, focus on the software side and check whether the bad block management is complete.
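To make the read-side ECC check concrete, here is a minimal sketch in Python of a code that corrects one flipped bit and detects two. It is a simplified teaching scheme, not the ECC layout of any particular NAND controller or driver, and it assumes the stored ECC value itself is read back intact. The function names `compute_ecc` and `check_and_correct` are hypothetical.

```python
def compute_ecc(data):
    """Return (position_xor, parity) for a bytes object.

    position_xor: XOR of the 1-based positions of all set bits.
    parity: number of set bits modulo 2.
    """
    pos_xor = 0
    parity = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if (byte >> bit) & 1:
                pos_xor ^= byte_index * 8 + bit + 1
                parity ^= 1
    return pos_xor, parity


def check_and_correct(data, stored_ecc):
    """Compare data against the ECC stored at write time.

    Returns (data, status), where status is "ok", "corrected",
    or "uncorrectable".
    """
    stored_pos, stored_parity = stored_ecc
    pos_xor, parity = compute_ecc(data)
    diff = stored_pos ^ pos_xor
    if diff == 0 and parity == stored_parity:
        return data, "ok"
    if parity != stored_parity and 1 <= diff <= len(data) * 8:
        # A single flipped bit: diff is its 1-based position.
        fixed = bytearray(data)
        fixed[(diff - 1) // 8] ^= 1 << ((diff - 1) % 8)
        return bytes(fixed), "corrected"
    return data, "uncorrectable"


page = bytes(range(16))
ecc = compute_ecc(page)          # would be stored in the spare area on write
damaged = bytearray(page)
damaged[5] ^= 0x10               # simulate one flipped bit
print(check_and_correct(bytes(damaged), ecc)[1])
```

A read that returns "uncorrectable" is the case where the program should stop trusting the block and fall back to its bad block handling rather than boot from the data.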
Conclusion
The three scenarios outlined above are common causes of program loss. They apply in particular to SLC NAND Flash with 1-bit ECC and can serve as a reference for engineers working with NAND Flash. Analyzing and addressing these causes carefully helps ensure stable system operation.