Postmortem of APFS Data Loss

Has anyone seen anything like this sort of data loss before? It’s the most baffling thing I’ve ever seen.

Here’s the TL;DR version:

- System hangs and I reboot
- 4 Sass/CSS files I had open and was working on are now totally blank
- Time Machine also has blanks

And here’s the long version:

Last night while working on Sass/CSS for Stacks 4 my system fans kicked on for no apparent reason. Then my network connection dropped – even though my phone (on the same network) still showed a solid, fast connection. I quit my running applications (Chrome, Coda, Sublime, and Atom) and checked Activity Monitor. Nothing was eating much CPU or memory. But the system had been running for over a week – so ~8GB was MIA, seemingly leaked from the macOS kernel. Network showed zero traffic.

I decided to try logging out and back in again. I logged out. I typed in my password to log in, pressed return, and the system hung. It stayed there in the “logging in” state. I went to get a drink from the fridge, checked my email on my phone, and 15 minutes later the system was still stuck. Fans still running a bit too high.

I rebooted (holding the power button for a few seconds) and the system booted up normally. It presented me with the usual “Are you sure you want to re-open these apps?” dialog that you see after a hard reboot. I clicked the Cancel button. I’m not 100% certain whether that means “yes, open them again” or “no, don’t open them again”. But I’m pretty sure I clicked Cancel.

I put on my headphones and prepared to get back to work, but when I opened up the files I had been working on, Atom just showed a blank screen. I thought Atom was having a problem at first. But then I jumped to the CLI and ran git status to find scary thing number one: the four CSS files I had been working on for the past few hours all showed changes. All four showed zero bytes. Empty files. 😱

I tweeted about it with the knowledge in the back of my mind that I had made a commit a few hours prior and that Time Machine runs very reliably on my system – saving data to a server in a different room in the house. So the damage was at least limited.

After tweeting about my annoyance, because that’s basically what Twitter is for now, I opened up Time Machine to find that there had been nearly hourly backups all day. The last backup was right around the time of the crash. 🤔 But after looking through the backups I was hit with scary thing number two: the backups had blank files too!!! For the past several hours Time Machine had dutifully made a backup of these 4 files – and it showed they were all blank. What!?!!? How!?!?!

At this point I seriously began to question my sanity. I haven’t been sleeping too well recently and actually had a walk around the room and a look in the mirror to make sure I wasn’t really losing it.

Next I carefully looked through all the other files in the directory. Git showed only these four files changed – but there are lots of interim files in the directory not managed by git. In this case these files are actually written in Sass and “transpiled” to CSS. The process is two steps of “compiling” then “compressing” (in quotes because it is neither really compilation nor really compression. LOL – web dev “engineering” 😏). These in-between files DO exist and show that the things I was doing were not part of some sleep-deprived delirium. And looking through Time Machine shows that they changed at each backup too.

I was able to mostly reconstruct my work from a combination of the last git commit and the interim files. It took about an hour to carefully piece it back together – but many hours of solid work were saved. Phew.

I don’t know what happened, but I think there are some interesting bits we can glean about what didn’t happen.

- It wasn’t a specific app. The command line tool processed these files (many many many times), and I also edited the files in multiple text editors (don’t @ me – it works for me).
- It wasn’t a matter of “not saving”. For the command line programs to run, the files have to be saved.
- It wasn’t something that happened at the last second at the crash. Time Machine backed up these files several times – each time backing up a zero-byte empty file. And at each of those points the same files’ offspring, the compiled versions, were backed up with full content and showed the expected changes.
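For anyone who wants to check their own working tree for the same symptom (source files that are suddenly zero bytes), here’s a minimal Python sketch. It is not what I ran at the time, and it only detects the symptom, not the cause. The function name find_empty_files and the default suffix list are hypothetical choices for illustration; adjust them to whatever your project uses.

```python
from pathlib import Path


def find_empty_files(root, suffixes=(".scss", ".css")):
    """Return a sorted list of zero-byte files under root with a matching suffix.

    The function name and default suffixes are illustrative assumptions.
    """
    root = Path(root)
    empty = []
    for path in root.rglob("*"):
        # Skip git's internal bookkeeping; only the working files matter here.
        if ".git" in path.relative_to(root).parts:
            continue
        # A file that exists but has size 0 matches the symptom described above.
        if path.is_file() and path.suffix in suffixes and path.stat().st_size == 0:
            empty.append(path)
    return sorted(empty)
```

Calling find_empty_files(".") from the project root returns the suspicious files. Pointing it at a mounted backup copy of the same directory would show whether the backup holds blanks as well.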

Here are some of the details of my system, since a couple of people have asked:

It’s a Mac Pro 2012 running macOS 10.13.4. The drive is a 512GB Samsung SSD on my PCIe bus (IOW: it’s a card, not a disk drive looking thing) formatted with APFS with about 100GB of free space. I’ve had the drive for a couple of years. I haven’t experienced any other problems with it. The APFS upgrade went smoothly last September. I don’t have any abnormal partitions. No beta software. No other stuff. I don’t run many utilities or extras. I have iStat Menus, a Logitech Mouse driver, and Karabiner (keyboard remapper for using non-standard mechanical keyboards).

Prior to the hang I had been doing a lot of development. And very varied development. I started the day doing Stacks 3 release builds so was working mostly in Xcode. I switched to working on front-end web-design stuff for Stacks 4. So Xcode was open, but no longer my primary tool. I was working mostly in Sublime, Atom, and Coda (don’t ask – this setup works for me) and running a makefile build script to do the transpile.

Here are the tweets; you can watch my horror unfold as I discovered it last night:

first discovery: twitter.com/isaiah/st…

time machine corrupted too: twitter.com/isaiah/st…

intermediate transpiled files OK: twitter.com/isaiah/st…

more details: twitter.com/isaiah/st…

isaiah @isaiah