
ZFS slog (ZIL) considerations

Running List of Alternatives

2015-08-11

Judging from this Anandtech article, the m550 does not have power-loss protection as strong as I originally thought. It protects the “lower pages” (least-significant bits) from corruption between page writes, but it does not protect the DRAM buffer from losing data.

2014-07-07

The Transcend MTS600 line is capable of 310 MB/s sequential writes. It is ideal as an SLOG, as it is also available in a 32 GB size. The downside is that it comes in the M.2 (NGFF) form factor. There is also an MTS400 line, which is only capable of 160 MB/s, and an MTS800 line, which has the same 310 MB/s performance as the MTS600.

2014-05-10

The Seagate PRO models have power-loss protection, and they seem to do very well on sequential reads and writes, better than the Crucial m500.

The new ADATA SP920 is essentially the same as the Crucial m550 (except that its NAND configuration is like the Crucial m500's), and all of these drives have power-loss protection. The ADATA performs considerably better than the (older-generation) Crucial m500.

And of course, the Crucial m550 itself is faster than the ADATA SP920 due to its more parallel NAND configuration.

Update on 2014-05-09

This rundown of SSDs indicates that power-loss protection must last for seconds, which rules out using a cable for the job. Also, it's not just the data in transit that needs protecting; the mapping tables do, too. (In fact, it seems you can brick an SSD by cutting power to it, which might explain why one of my SSDs doesn't work.)

Original Post

My latest obsession is what drive is best for a SLOG (dedicated ZIL device).

I plan on testing tonight whether sequential writes or random writes are more important. This DDRDrive marketing propaganda states that (at least for multiple producers) random IOPS are more important. But, I’m very unlikely to see multiple synchronous writes to my ZFS pool. (Maybe a ports build or something, but that’s unlikely).

I have numerous options. But, let's start with the power-loss protection. I found this cable online that has a 2.2 mF capacitor on each of the 5 V and 12 V lines. For those of you who don't know, that's really big as far as capacitors go. This should let the SLOG device maintain a charge for quite a bit longer during a power failure.

For example, the SanDisk Extreme II 120GB lists 3.4 W as its power consumption during write. If I assume that all of that comes from the 5.0 V rail (which is the conservative calculation), I get 0.68 A (aka 680 mA) as the current draw during write.

If I then assume that a +/- 10% voltage fluctuation is all that can be tolerated, the capacitor will maintain a voltage for:

ΔT = C × ΔV / I = 2.2 mF × 0.5 V / 680 mA ≈ 1.6 ms

So, the cable maintains a voltage for about 1.6 ms. This does not seem like much, but if I assume that my producer (Samba) can generate at most 1 Gbps (or 125 MB/s), then 125 MB/s × 1.6 ms ≈ 200 kB of in-flight data can still make it to disk. (Of course, I'd be lucky to get 100 MB/s with the stock NICs in any of the machines on my LAN, but that's a different story.)
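For anyone who wants to plug in their own drive's numbers, here is the same arithmetic as a quick Python sketch (the 3.4 W figure and the 10% tolerance are the assumptions above, not measurements):

    # Back-of-the-envelope math for the capacitor cable: current draw from the
    # drive's rated write power, hold-up time from dT = C * dV / I, and the data
    # a 1 Gbps producer can leave in flight during that window.

    P = 3.4        # W, SanDisk Extreme II rated write power (assumption from the spec sheet)
    V = 5.0        # V, assume everything is drawn from the 5 V rail (worst case)
    C = 2.2e-3     # F, capacitance on the 5 V line of the cable
    dV = 0.10 * V  # V, allow a 10% sag before the drive browns out

    I = P / V              # current draw during a write: 0.68 A
    dT = C * dV / I        # hold-up time: about 1.6 ms

    link = 125e6           # B/s, a 1 Gbps Samba producer
    in_flight = link * dT  # roughly 200 kB still in transit

    print(f"current draw  : {I:.2f} A")
    print(f"hold-up time  : {dT * 1e3:.1f} ms")
    print(f"data in flight: {in_flight / 1e3:.0f} kB")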

But that's not really the point. The point is that the SSD should remain active after the producer (the Samba server in this case) dies, so that writes already in progress can be completed. In that case, what really matters is how quickly the SSD can commit data to flash, which of course depends on its write speed.

I’ve looked at 4 main candidates for an SSD:

  • OCZ Vertex 2 (60 GB)
  • Crucial m500 (120 GB)
  • SanDisk Ultra II (120 GB)
  • Kingston V300 (120 GB)

The benefits of each are as follows. Note that I’ve picked the smallest capacity available for each architecture.

OCZ Vertex 2

The main benefit is price. This thing can be had on eBay for $40. It's also speedy for its generation: its random (4k) write is pretty high at 65 MB/s, and its sequential write is 225 MB/s (probably for compressible data; see below on the 120 GB model doing 146 MB/s). This means it can keep up with 125 MB/s Ethernet with a fair margin to spare. For this guy, I probably need the capacitor cable.

Crucial m500

There are two main benefits here. First is speed. Both its sequential write and random write are really high, at 141 MB/s and 121 MB/s, respectively. The second is that this drive has power-loss protection. So, I don’t need a clunky cable with built-in capacitors. Whatever data has made it to the drive will get committed.

SanDisk Ultra II

Once again, speed is a huge benefit. This blows away anything else in both sequential (339 MB/s) and random (133 MB/s) writes. In addition, SanDisk also has a unique nCache non-volatile cache. This allows data to hit SLC (or really MLC operating in SLC mode) before it hits the MLC. This approximates the best of both worlds: the speed and (temporal) reliability of SLC, with the cheaper/denser MLC (for permanent storage).

It does, however, have a DRAM cache. I hope that if the controller issues a flush, this cache gets flushed (at least to the nCache). But, to be safe, I'll probably buy the capacitor cable to pair with this drive if I go this route.

Kingston V300 (120 GB)

This doesn't have the fancy flash footwork of the SanDisk nCache, nor does it have the power-loss protection of the m500. However, it tends to be cheaper, and the speed is quite sufficient at 133 MB/s random and 172 MB/s sequential. It's basically an older-generation SSD.
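To put those write speeds in context, here is a rough estimate of how long each candidate would need to stay powered to drain the ~200 kB that a 1 Gbps producer can leave in flight. These are the spec-sheet figures quoted above, not measurements:

    # Rough estimate: time for each candidate to commit ~200 kB of in-flight
    # data at its quoted sequential write speed.

    backlog = 200e3  # bytes left in flight by a 1 Gbps producer (see the cable math)

    seq_write_mb_s = {             # sequential write speed in MB/s (spec figures)
        "OCZ Vertex 2 (60 GB)":      225,
        "Crucial m500 (120 GB)":     141,
        "SanDisk Ultra II (120 GB)": 339,
        "Kingston V300 (120 GB)":    172,
    }

    for drive, mb_s in seq_write_mb_s.items():
        ms = backlog / (mb_s * 1e6) * 1e3
        print(f"{drive:26s}: {ms:.1f} ms to drain the backlog")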

(Lack of) Conclusion

So, which will I go for? Well, before I make a decision, I want to test whether sequential or random write is the key metric. I plan on doing this tonight. I have a 32 GB Kingston SV100 as the SLOG on the system. I'll use dd or something to issue synchronous writes of various sizes. The 32 GB Kingston has a sequential write speed of 68 MB/s, but a dismal random write speed. If sequential write is the key metric, I should see a throughput of around 68 MB/s; if not, I should see something much lower.

That’s the hypothesis, anyway. I’ll test it tonight.
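The test doesn't have to be dd, for what it's worth; a minimal Python sketch along these lines would do the same job (the /tank/slog-test.bin path, record sizes, and write count below are placeholders, not what I'll actually run):

    # Issue synchronous writes of various record sizes and report throughput.
    # Each write is followed by fsync(), which forces ZFS to push the record
    # through the ZIL (and therefore the SLOG) before returning.

    import os, time

    PATH = "/tank/slog-test.bin"                  # hypothetical test file on the pool
    SIZES = [4 * 1024, 128 * 1024, 1024 * 1024]   # 4 kB, 128 kB, 1 MB records
    COUNT = 256                                   # writes per record size

    for size in SIZES:
        buf = os.urandom(size)
        fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        start = time.time()
        for _ in range(COUNT):
            os.write(fd, buf)
            os.fsync(fd)       # force a synchronous commit via the ZIL
        elapsed = time.time() - start
        os.close(fd)
        print(f"{size // 1024:5d} kB records: {size * COUNT / elapsed / 1e6:.1f} MB/s")

    os.unlink(PATH)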

Thoughts on DDRDrive

Warning: this section is mere conjecture.

The DDRDrive propaganda is interesting. They didn't get a good result with just one producer, so they pushed it to multiple producers. To be fair, this is probably a good model of an enterprise situation, a database server, for example. But their scenario uses multiple file systems on the same server. Is it likely that someone would split a database across multiple file systems? I'd posit that in an enterprise environment, one would just split the databases across multiple servers (or virtual servers) and be done.

In addition, they show the write speeds of their measured disks dropping off after some time. This is probably because the flash starts out empty, but later writes must overwrite existing data (erase + write) rather than perform a simple write. However, this situation is unlikely on newer versions of FreeBSD, as ZFS now supports TRIM. TRIM should leave freed sectors pre-erased, so the erase + write cycle does not occur.

Rejected Options

The Vertex 2 is also available in 120 GB, but its speed isn't much higher (73 MB/s random and 146 MB/s sequential). I'd take it if it were as cheap as or cheaper than the 60 GB model.

The OCZ Agility 3 was also an option, but it isn’t any faster with non-compressible data.

 
