So, I’ll spare the gory details (you can read about some of them at the boss’s blog, here and here), but at work we needed to shift a pretty sum of data (80TB or so) as fast as possible to recover from a hardware failure that impacted one particular group. Oh, and the 80TB is millions upon millions of tiny files.
Now, we were using rsnapshot, so there was no “restore” to be done to the files; we could access them straight away, and we even exported them read-only while we came up with a plan, so users at least had access.
Once the plan was formed, and destination storage quickly provisioned (hello smaller ext4 LVM volumes instead of monolithic xfs), came the task of copying the data over, and fast.
Rsync is always a friend and a bit of a good standby for moving data around, especially in situations where you want to preserve everything about the files and have the ability to stop and restart transfers. Obviously we were not going to do this over ssh (ouch), so the target storage areas were mounted on the source systems, and rsyncs started running locally, a la:
We had three source systems (16 cores, 48GB each) and five destination systems (8 cores, 24GB each), so we spun up a few rsyncs per source system and let them roll overnight.
This morning, not nearly as much data had moved as we had hoped.
Why? Because rsync needs to do fstats on each file, on both ends, as it rolls along. Big files flew over the wire, but most of what we were moving were tiny, tiny files, and as I said, millions and millions of them.
So, we did:
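The command itself is missing from the post as archived, but the classic small-file trick this paragraph describes is a tar pipe, which streams the tree in one pass instead of stat()ing every file on both sides. A sketch, with hypothetical paths:

```shell
# Sketch, not the verbatim command: stream one directory tree through tar.
# The reading tar walks the source once; the extracting tar just writes,
# never comparing against existing destination files the way rsync does.
(cd /snapshots/groupA && tar cf - .) | (cd /mnt/dest/groupA && tar xpf -)
```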
Background that, and crank up a few others on the other sections, and things started going MUCH faster. We wanted a little faster, so:
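The exact tweak is also lost here; one variant consistent with the “fewer fstats, access times not dealt with” result below would be dropping the metadata-preserving flag on the extract side and using a larger blocking factor, so tar makes fewer, bigger I/O calls and skips the per-file permission/time restoration syscalls. Again a hypothetical sketch:

```shell
# Hypothetical variant: no -p on extract (skip per-file chmod/utime work),
# and a 1MB blocking factor (-b 2048 records of 512 bytes each).
(cd /snapshots/groupA && tar -b 2048 -cf - .) | (cd /mnt/dest/groupA && tar -b 2048 -xf -)
```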
Boom. Far, far fewer fstats. Access times weren’t being dealt with (we don’t care about them at this point), so fewer fstats again.