wiki.ipfire.org

The community-maintained documentation platform of IPFire


Bad Flash

A common comment on the IPFire Forum is that IPFire “killed the flash” or just “doesn't boot any more after installing a Core Update”. Often this is related to “bad flash”. This article tries to identify what that is and how to avoid using it.

Definition

Bad flash is flash that simply fails. Data can no longer be written to, or read from, one or more cells. The data in such a cell must be considered destroyed and will most likely cause file system corruption, which results in corrupted files or may render the entire filesystem unreadable.

Bad flash has no controller that prevents the same cells from being reused over and over again. Although wear leveling has been an industry standard for many years, many cheap devices implement neither it nor any similar technique. This is what bad flash is.

Bad flash is commonly found in almost all Compact Flash cards, some SD cards and a few SSDs.

Causes

A block on a flash chip has a limited lifetime. It can only be rewritten a limited number of times. After that, data can no longer be stored reliably. This is not limited to flash storage; it applies to storage in general. Even hard disks age over time.

Some data in a filesystem changes constantly, for example log files and databases. The filesystem itself also keeps metadata about all files and directories in an index. Every time a file is created, changed or deleted, this index has to be updated. Some filesystems additionally use a journal.

The filesystem index and the journal (if one exists) are usually allocated to a number of blocks at the beginning or end of the storage device. This allocation does not change much over the lifetime of a filesystem, so these blocks are rewritten far more often than most of the blocks that hold the actual content of the files.

Other special types of files behave the same way. Round Robin Database files (RRDs) are used in IPFire to store analytics about the system. The information is collected at intervals of one minute, so the RRD files are updated once a minute. Luckily, this is done very efficiently, in the manner of a ring buffer: only the oldest information is overwritten, and then a pointer in the header of the file is changed. On the filesystem layer this breaks down into only two blocks being changed and the index being updated. The header update, however, rewrites the same block over and over again.
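
The ring-buffer behaviour described above can be illustrated with a small model. This is only a sketch, not the actual RRD file format: the class name RingFile, the slot count and the one-block-per-sample layout are assumptions made for illustration. The model simply counts how often each block would be rewritten over one day of one-minute updates.

```python
from collections import Counter


class RingFile:
    """Toy model of a ring-buffer file: one header block plus N data blocks.

    Hypothetical layout for illustration only; real RRD files differ.
    """

    HEADER_BLOCK = 0

    def __init__(self, slots):
        self.slots = slots
        self.pointer = 0                 # index of the oldest sample
        self.data = [None] * slots
        self.block_writes = Counter()    # block number -> rewrite count

    def update(self, sample):
        # Overwrite the oldest sample: one data block is rewritten.
        self.data[self.pointer] = sample
        self.block_writes[1 + self.pointer] += 1
        # Advance the pointer kept in the header: the header block is rewritten.
        self.pointer = (self.pointer + 1) % self.slots
        self.block_writes[self.HEADER_BLOCK] += 1


ring = RingFile(slots=60)
for minute in range(24 * 60):            # one day of one-minute updates
    ring.update(minute)

header_writes = ring.block_writes[RingFile.HEADER_BLOCK]
busiest_data_block = max(
    count for block, count in ring.block_writes.items()
    if block != RingFile.HEADER_BLOCK
)
print(header_writes, busiest_data_block)  # prints: 1440 24
```

In this model the header block is rewritten 1440 times in a day, while each data block is rewritten only 24 times.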

Do the maths and you will find that this results in a very short lifetime for the entire flash device if the controller naively overwrites the same block over and over again.
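
The arithmetic can be sketched as follows. The endurance figures are assumptions chosen purely for illustration (this article does not state any), and real ratings vary by flash type and vendor. The calculation also assumes exactly one rewrite of the same physical block per minute and no wear leveling at all.

```python
MINUTES_PER_DAY = 24 * 60


def days_until_worn_out(endurance_cycles, writes_per_minute=1):
    """Days until one block reaches its rewrite limit with no wear leveling."""
    writes_per_day = writes_per_minute * MINUTES_PER_DAY
    return endurance_cycles / writes_per_day


# Assumed endurance ratings, for illustration only.
for cycles in (3_000, 10_000, 100_000):
    print(f"{cycles:>7} cycles: {days_until_worn_out(cycles):.1f} days")
```

Even with the most generous of these assumed ratings, a single block that is rewritten once a minute would reach its limit in roughly ten weeks.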

Solutions

Intelligent devices use algorithms to distribute write operations across the entire device so that all cells are used evenly. This is called wear leveling. However, cheap SSDs, almost all Compact Flash cards and many SD cards do not support it. A controller costs money and is not needed for some applications, such as storing photos in a camera.
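
The effect of distributing writes can be shown with a deliberately simplified simulation. The round-robin remapping used here is a stand-in chosen for illustration and not how any particular controller works; the block count of 1024 is likewise an assumption.

```python
def max_wear(total_blocks, writes, wear_leveling):
    """Return the highest per-block write count after `writes` rewrites
    of the same logical block."""
    wear = [0] * total_blocks
    for i in range(writes):
        # Naive: the logical block always maps to physical block 0.
        # Leveling (simplified): each rewrite goes to the next physical block.
        physical = i % total_blocks if wear_leveling else 0
        wear[physical] += 1
    return max(wear)


writes_per_year = 60 * 24 * 365           # one rewrite per minute
naive = max_wear(1024, writes_per_year, wear_leveling=False)
leveled = max_wear(1024, writes_per_year, wear_leveling=True)
print(naive, leveled)                     # prints: 525600 514
```

Under these assumptions, the most heavily used block sees 525600 rewrites per year without leveling, but only 514 when the same writes are spread across 1024 blocks.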

Running an operating system on such a device that has been designed for something completely different is a recipe for hardware failure.

So the only solution is to buy “good flash”. There are some ways to mitigate problems with bad flash and to extend the lifetime, but for a reliable firewall system reliable storage is needed.

How to find "good flash"?

This is actually not that difficult. Any device that costs a reasonable amount of money should be okay to use. The recommendation is to always go for a Solid State Disk. These are the most intelligent devices and are reliable enough to run a firewall. They are now actually more reliable than a hard drive.

Compact Flash and SD cards

These are very often not designed to be used 24/7 or anywhere close to that. They are cheap and exchangeable. If one breaks in your camera, you will simply replace it because it is inexpensive.

If you really must use them (and you shouldn't), check that they carry the “industrial” label. Such cards usually have flash cells with a longer life expectancy and a good controller inside. They will cost a great deal more, but there is no way around it if you care about your data.

Mitigation

In IPFire we try to avoid all unnecessary write operations for many reasons, such as higher performance and lower power consumption. No manual tuning is required.

However, certain things need to be logged and stored. A modern system collects a huge amount of statistical data that allows troubleshooting and monitoring the health of the system and the entire network. The IPFire developers can't and won't cut down on that.

TRIM

IPFire supports the TRIM command on SSDs which helps the controller to find unused blocks.
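
On Linux, whether the kernel believes a block device supports discard (TRIM) can be read from sysfs. The following is a minimal sketch under stated assumptions: it reads the queue/discard_max_bytes attribute and treats a value of 0 as "no discard support"; the helper name advertises_discard and the device name sda are hypothetical. It only shows what the device advertises, not whether the device handles TRIM correctly.

```python
from pathlib import Path


def advertises_discard(device, sysfs_root="/sys"):
    """Return True if the kernel reports discard (TRIM) support for `device`.

    Reads the Linux sysfs attribute queue/discard_max_bytes; a value of 0
    is taken to mean that discard is not available. Returns False if the
    attribute is missing or unreadable.
    """
    attribute = Path(sysfs_root) / "block" / device / "queue" / "discard_max_bytes"
    try:
        return int(attribute.read_text().strip()) > 0
    except (OSError, ValueError):
        return False


# "sda" is an assumed device name; adjust it for the actual system.
print(advertises_discard("sda"))
```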

List of known devices

This is an incomplete list of devices that are known to cause issues:

SuperSSpeed S238G

This device is often shipped with the PC Engines APU devices and is very faulty. PC Engines has issued a warning on their website that the device randomly drops blocks when TRIM is used. The controller reports to the operating system that it supports TRIM, but using it actually causes data loss.

IPFire 2.17 - Core Update 93 ships a fix that always disables TRIM for this device. There have been further reports that this device fails despite that.

en/hardware/mythbusters/bad-flash.txt · Last modified: 2016/03/10 03:49 by MichaelTremer