[ale] ZFS on Linux

Brian MacLeod nym.bnm at gmail.com
Tue Apr 2 11:26:47 EDT 2013


On 4/2/13 10:47 AM, Derek Atkins wrote:
> 
> 
> I wonder if this means you should spread your disks across multiple
> controllers?  For example let's say you have three controllers in
> your system, would it be better to put two drives from each array
> on each controller?  That way if a single controller (or cable)
> goes bad you don't lose your array.


You absolutely can do this.

Just be mindful that performance on each controller should be near
identical, or you risk making one controller the bottleneck on
rebuild speed.

You are, in one sense, describing the design of our backup
infrastructure behind the example file server I gave, as well as our
test Coraid storage. :-)
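
If you do split a vdev across controllers, it's just a matter of how
you list the devices at creation time.  A rough sketch, with "tank"
and the ctrlN-diskN names standing in for your own pool name and the
/dev/disk/by-id (or by-path) names of the drives on each controller:

  # one raidz2 vdev built from two drives on each of three controllers
  zpool create tank raidz2 \
      ctrl0-disk0 ctrl0-disk1 \
      ctrl1-disk0 ctrl1-disk1 \
      ctrl2-disk0 ctrl2-disk1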


> I wouldn't consider this a punishment, per se.  Any error 
> correction by definition requires space.  In this configuration
> you have 6 drives in a raidz2 so you're only "losing" 33% due to 
> overhead.  IMHO that's not too bad, and is better than taking
> those 6 drives and forming a raid-10 out of them, because then you
> lose 50% to overhead.  So you get 8TB per vdev instead of only 6.


That's why "punishment" was in quotes :-)  Because of the monies and
computation time involved with the data I am managing at work,
reliability is on this part is more key than capacity and speed.


> Are you sure about that?  I did some research and according to 
> http://forums.overclockers.com.au/showthread.php?t=961125 I should 
> be able to expand the space in the vdev once all the disks have 
> been upgraded.  Apparently there is a zpool feature called 
> "autoexpand" that lets you do that, once you've scrubbed.  (I'm
> not 100% sure what a scrub does).


A scrub walks the entire pool and verifies every block against its
checksum, catching silent bit flips and confirming the media is still
reliable.  It is usually good practice to schedule scrubs regularly;
we are still working that out here because there is a performance
impact.
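
Running one by hand and checking on it looks like this ("tank" is a
placeholder pool name):

  zpool scrub tank      # verify checksums across the pool; repairs from redundancy where it can
  zpool status tank     # shows scrub progress and any errors found

  # e.g., a monthly scrub out of root's crontab, 2am on the 1st:
  # 0 2 1 * * /sbin/zpool scrub tank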

Autoexpand was a later addition to ZFS, and we've avoided needing it
by buying larger chassis, anticipating that per-TB drive costs will
keep dropping, and letting the participants in the HPC program buy in
as they need to.  Thus I'm not as well versed in it, but thank you
for bringing it to my attention, as it may actually solve an issue we
are coming up on.
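
From what I've read so far, the moving parts look roughly like this;
placeholder names again, and check the zpool man page before trusting
my recollection:

  zpool set autoexpand=on tank                # let the pool grow once all devices in a vdev are bigger
  zpool replace tank old-disk bigger-disk     # swap drives one at a time, letting each resilver finish
  zpool online -e tank bigger-disk            # or expand a device manually if autoexpand was left off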


> Define "resize" here?  By "cannot resize" do you mean that if you 
> have a 6-disk raidz2 you cannot restructure it into an 8-disk 
> raidz2, or a 9-disk raidz3?


Yes, my mistake for overloading the term here.  You are correct in
your reading: a vdev originally created with 6 devices cannot later
be made into 7, 8, etc.  It has to be destroyed and rebuilt.
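
What you can do is grow the pool by adding another whole vdev next to
the first, which is how we expand.  Roughly (placeholder names):

  # the width of an existing raidz2 vdev is fixed, but the pool will take more vdevs
  zpool add tank raidz2 disk7 disk8 disk9 disk10 disk11 disk12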


> This is probably due to the number of drives you need to hit to 
> recover a block of data, or something like that.  On the system
> I'm currently designing (based on a NORCO 4224 case) it looks like 
> 6-drive raidz2 vdevs would fit nicely.


Yes, 4 vdevs with no hot spare.  Our design for similarly capable
hardware would tend to put us at the more paranoid 7-drive vdev (x3)
with 3 hot spares.
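
Spares, if you go that way, are just another addition to the pool,
shared by all its vdevs; something like:

  zpool add tank spare spare-disk0 spare-disk1 spare-disk2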


> What about rebalancing usage?  Let's say, for example, that I
> start with one raidz2 vdev in the zpool.  Now a bit later I'm using
> 80% of that space and want to expand my pool, so I get more drives
> and build a second raidz2 vdev and add it to the zpool.  Can I get
> zfs to rebalance its usage such that the first and second vdevs
> are each using 40%?  I'm thinking about this for spindle and
> controller load balancing on data reads.


That's actually what happens with our buy-in model.  As new data gets
written, ZFS favors the emptier vdevs for allocation, so usage evens
out over time (existing data stays put until it is rewritten); I can
attest this works as you might expect.
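
You can watch how allocations spread across the vdevs as that
happens; on the versions I've used:

  zpool iostat -v tank     # per-vdev capacity, allocation, and I/O breakdown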


> Thanks!


You're welcome.  Now I have some additional experimenting to do...


Brian

