Interlude – System Repair

Back in November 2022, I wrote about the trouble my system had when the wind took out the power at my house, including one problem I hit on reinstall and how I resolved it.

Cut to a year later, and a similar occurrence, but with a happier ending.

Preventative Maintenance

Back in December 2023, the power went out at my house one morning. I was working on my machine at the time, so I had to stop. We were also baking holiday cookies, and I thought the power outage would pose a much bigger problem for that endeavour (spoiler alert: it didn’t).

I took a look outside at my neighbor’s home — their lights were still on. Curious — we are both on the same grid, so we should both be out. I reasoned therefore that a limb had taken out just my power, so I went outside to inspect my property for damage.

Imagine my surprise when I saw a crew from my power company working on the transformer which supplies my house (and only my house). I asked what had happened, and was informed that they were doing "scheduled maintenance", replacing older equipment to avoid catastrophes later.

In other words, they knew the power was going out, and didn’t tell me. One person’s "scheduled" is another person’s "surprise".

I told them what I did for a living, and that their "scheduled maintenance" interrupted my work (and our baking), and asked for an ETA when the power would return. They gave me a one hour estimate, which they beat, and life went on.

However, the fun for me and my machine was just beginning. What started as a minor irritation turned into a shit show.

Broken

Remembering the last time this happened, I expected some issues with my machine, so I wasn’t surprised when I did have some when I rebooted. These problems, however, were different than last time.

I run EndeavourOS, an Arch variant, with the KDE desktop environment. On reboot, KDE was having issues showing different windows. Some programs would hang, and when they did, the entire DE hung. I couldn’t even reboot the machine — I had to power it off completely at the switch.

I booted into my BIOS settings to perform diagnostics, which turned up nothing major. So I booted to a live CD I keep around, after updating it on my laptop (which also runs EndeavourOS), and did a file system check on the unmounted disks on my system. And that’s where the fun started.

To run the checks, I loaded a partition manager so I didn’t miss any partitions, and the partition holding my system showed multiple errors. I tried fixing them, but the damage was done: even after fixing everything, the system wouldn’t boot and run properly.

The unscheduled power drop damaged the file system, and a reinstall was needed.

Back That Thing Up

Luckily, a few months earlier I made three investments — two physical, one virtual, and all of a digital nature.

I purchased two different USB-C connected drives, one at 2TB and the other at 5TB. The 5TB model (a Western Digital My Passport) is dedicated to backing up my two machines as well as my wife’s Windows machine. I settled on a tool called Restic to manage my backups because it worked when others gave me problems.
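For readers who have not used Restic, a backup run boils down to one command against a repository. The sketch below, in Python, only builds the command lines for a backup and a restore. It assumes a repository that was already created with `restic init`. The repository path, home directory, and exclude pattern are hypothetical placeholders, not the setup described here.

```python
import shlex
from pathlib import Path

# Hypothetical paths: adjust to wherever the backup drive is mounted.
REPO = Path("/run/media/me/MyPassport/restic-repo")
HOME = Path("/home/me")


def backup_command(repo, sources, excludes=()):
    """Build a `restic backup` command as an argument list."""
    cmd = ["restic", "-r", str(repo), "backup"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    cmd += [str(source) for source in sources]
    return cmd


def restore_command(repo, target, snapshot="latest"):
    """Build a `restic restore` command that unpacks a snapshot into target."""
    return ["restic", "-r", str(repo), "restore", snapshot, "--target", str(target)]


def show(cmd):
    """Render a command list as a copy-and-paste shell line."""
    return shlex.join(cmd)


# Example (prints the commands instead of running them):
#   print(show(backup_command(REPO, [HOME], excludes=[".cache"])))
#   print(show(restore_command(REPO, "/tmp/restore")))
```

Restic asks for the repository password on each run. For unattended backups it can read the password from a file named in the RESTIC_PASSWORD_FILE environment variable.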

Since I couldn’t run my main OS, doing a full backup was out of the question. That’s where the 2TB Seagate Portable drive came in handy: I copied the contents of my Documents, git, and .config folders to it so I wouldn’t lose anything I had changed since the last full backup.
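A rescue copy like this can be done with any file manager, but it is also easy to script. Here is a minimal Python sketch, assuming the damaged system’s home directory and the portable drive are both mounted from the live environment. The function name and the example mount points are hypothetical; only the three folder names come from the story above.

```python
import shutil
from pathlib import Path

# The folders named in the post; everything else is left alone.
FOLDERS = ["Documents", "git", ".config"]


def rescue_folders(home, dest, folders=FOLDERS):
    """Copy each existing folder from home into dest and return the names copied."""
    home, dest = Path(home), Path(dest)
    copied = []
    for name in folders:
        src = home / name
        if not src.is_dir():
            continue  # skip folders that do not exist on this machine
        shutil.copytree(
            src,
            dest / name,
            dirs_exist_ok=True,            # allow re-running into the same target
            ignore_dangling_symlinks=True,  # do not fail on broken links
        )
        copied.append(name)
    return copied


# Example with hypothetical mount points:
#   rescue_folders("/mnt/system/home/me", "/run/media/liveuser/Seagate/rescue")
```

Running it a second time overwrites files that already exist in the target, which is fine for a one-off rescue but is no substitute for a real backup tool.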

Once I was sure I was OK, I started a full clean install of EndeavourOS, which took less time than it would have taken my power company to walk to my door and tell me they were going to kill the power.

Of course, getting all the software reinstalled and configured again took a while, but this is where my third investment paid off. When I switched to EndeavourOS, I had to reinstall and reconfigure everything as well. As I did so, I noted all the software I had installed, as well as all the steps to configure things like Plex, MySQL running in a Docker container, RClone for GDrive access, and other things. I use Obsidian for notes, research, and my daily journal, and it’s backed up to a private GitHub repo and duplicated on at least two machines, so I always have it.

So while it took a while, I didn’t need to reinvent anything.
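The software list in those notes does not have to be typed by hand. On an Arch-based system, `pacman -Qqe` prints the names of explicitly installed packages, and a few lines of Python can turn that into a Markdown checklist for a notes app such as Obsidian. This is only a sketch of one way to do it, not how the notes above were written; the function names and the checklist layout are assumptions.

```python
import subprocess


def explicit_packages():
    """Return the explicitly installed package names reported by pacman."""
    result = subprocess.run(
        ["pacman", "-Qqe"],  # -Q query, -q names only, -e explicitly installed
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def as_checklist(packages, title="Installed packages"):
    """Format package names as a Markdown checklist, sorted alphabetically."""
    lines = [f"# {title}", ""]
    lines += [f"- [ ] {name}" for name in sorted(packages)]
    return "\n".join(lines) + "\n"


# Example, writing to a hypothetical note file:
#   with open("installed-packages.md", "w") as note:
#       note.write(as_checklist(explicit_packages()))
```

The checkboxes make it easy to tick packages off one by one during a reinstall.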

SSDD (Same Shit, Different Day)

Cut to a month later — I hear a loud bang outside just as the power cut off to the neighborhood. This time, it wasn’t the power company turning me off unexpectedly, it was a real transformer ‘splosion. I reported it to the power company and settled in for the wait…

…which wasn’t as long as I thought. An hour or so, and we were back in business. However, this time I wasn’t taking chances with my machine.

I booted first to the BIOS to do a complete system check, then to the live CD image without first booting to the main OS. I did a full fsck -f on every file system that was available to me, including the two USB-connected drives. After verifying everything there was good and there were no reported issues, I rebooted to my OS.
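To avoid missing a partition when checking by hand, the list of candidates can be generated from `lsblk`. The Python sketch below parses the output of `lsblk --json --paths --output NAME,FSTYPE,MOUNTPOINT` and builds one `fsck -f` command per unmounted file system. It only builds the commands and never runs them. The function names and the list of skipped types are assumptions. Note that `fsck` hands `-f` to the checker for each file system type, so its meaning varies: on ext4 it forces a check even when the file system looks clean.

```python
import json
import subprocess

# Not file systems fsck can check directly (an assumption; extend as needed).
SKIP_TYPES = {"swap", "crypto_LUKS", "LVM2_member"}

LSBLK = ["lsblk", "--json", "--paths", "--output", "NAME,FSTYPE,MOUNTPOINT"]


def unmounted_filesystems(lsblk_json):
    """Return device paths that carry a file system and are not mounted."""
    found = []

    def walk(nodes):
        for node in nodes:
            fstype = node.get("fstype")
            if fstype and fstype not in SKIP_TYPES and not node.get("mountpoint"):
                found.append(node["name"])
            walk(node.get("children", []))  # partitions nest under their disk

    walk(json.loads(lsblk_json)["blockdevices"])
    return found


def fsck_commands(devices):
    """Build one `fsck -f` command per device."""
    return [["fsck", "-f", device] for device in devices]


def plan_from_system():
    """Query lsblk on the running system and return the fsck commands to review."""
    output = subprocess.run(LSBLK, capture_output=True, text=True, check=True).stdout
    return fsck_commands(unmounted_filesystems(output))
```

Reviewing the printed plan before running anything is deliberate: checking a mounted file system can cause damage, which is why the sketch filters on the mount point.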

I’m happy to report everything worked this time — no issues with updates, no problems booting, no hangs. It’s like it never even happened.

Conclusion

I don’t know if booting to the live CD did any good — I probably just got lucky.

However, I do have one good conclusion — get a good backup solution and use it on all your hardware. Without it, you may be up the creek. It’s cheap compared with the cost of losing all the data on your machine.

And when you finally ditch your old OS for the new hotness, you can get the data back.