Mystery Writes

We recently observed constant write activity on our development server while watching dstat. With the help of iotop, we identified the Apache web server as the culprit. But why would Apache be doing so many writes? That's not normal behavior (excluding the logs). We then used auditd to log writes by Apache (running as uid 33):
$ sudo auditctl -a exit,always -S write -F uid=33
We also logged opens that were not O_RDONLY:
$ sudo auditctl -a exit,always -S open -F uid=33 -F 'a1!=0'
This resulted in the following rules:
$ sudo auditctl -l
LIST_RULES: exit,always uid=33 (0x21) syscall=write
LIST_RULES: exit,always uid=33 (0x21) a1!=0 syscall=open
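A note on the a1!=0 filter: a1 is the flags argument of open(2), and O_RDONLY is 0, so the rule catches every open whose flags are non-zero. That is a superset of "not read-only", because a read-only open can still carry other flag bits. The sketch below decodes the access mode from an a1 value as it appears in the audit log (hex, no 0x prefix). The function name open_access_mode is made up for illustration, and the numeric values assume Linux on x86. The auditctl man page also documents a bit-mask comparison (n&v), which might express this more precisely, though we did not try it.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): decode the access mode
# from the flags argument (a1) of an audited open() call.
# auditd logs syscall arguments in hex without a 0x prefix, e.g. a1=241.
# Assumed Linux values: O_RDONLY=0, O_WRONLY=1, O_RDWR=2, O_ACCMODE=3.
open_access_mode() {
    flags=$((0x$1))
    case $((flags & 3)) in   # mask with O_ACCMODE
        0) echo O_RDONLY ;;
        1) echo O_WRONLY ;;
        2) echo O_RDWR ;;
        *) echo invalid ;;
    esac
}

open_access_mode 241     # O_WRONLY|O_CREAT|O_TRUNC
open_access_mode 80000   # O_RDONLY|O_CLOEXEC: read-only, yet a1 != 0
```

The second call shows why some logged opens are still read-only: they matched the rule only because an unrelated flag bit was set.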
Running aureport against the resulting audit log allowed us to isolate a PHP module that was completely broken.
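The exact aureport invocation is not recorded here. As a rough sketch, assuming the audit rules for uid 33 are loaded and auditd is writing to its default log, summaries along these lines could narrow down the source of the writes. The function names files_report and exe_report are invented for illustration; the ausearch and aureport options are the ones documented in their man pages.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original post).

# Summarize the files touched by today's audited events for a given uid.
files_report() {
    sudo ausearch --start today -ui "$1" --raw | aureport --file --summary -i
}

# Summarize which executables generated today's audited events for a given uid.
exe_report() {
    sudo ausearch --start today -ui "$1" --raw | aureport --executable --summary -i
}
```

For example, files_report 33 would list the most frequently touched files for the Apache user. Once the investigation is done, sudo auditctl -D deletes all loaded audit rules (including any unrelated ones), so that the audit logging does not itself keep adding writes.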

Man Pages

  • man 8 auditctl - a utility to assist controlling the kernel’s audit system
  • man 8 aureport - a tool that produces summary reports of audit daemon logs
  • man 1 dstat - versatile tool for generating system resource statistics
  • man 1 iotop - simple top-like I/O monitor
  • man 2 open - open and possibly create a file or device


The Automation of Networks of Networks

For me, at least, systems administration is a difficult field to define and talk about, especially succinctly. A couple of excellent blog posts suggest the following definition for systems administration: the automation of networks of networks.

Automation:
Therefore, SREs spend half their time writing code to eliminate what they do the other half of their day. When they "automate themselves out of a job" it is cause for celebration and they get to pick a new project. There are always more projects. (Everything Sysadmin: Has the job of a Google SRE changed over the years?)
Networks of Networks:
Findings so far suggest that networks of networks pose risks of catastrophic danger that can exceed the risks in isolated systems. A seemingly benign disruption can generate rippling negative effects. Those effects can cost millions of dollars, or even billions, when stock markets crash, half of India loses power or an Icelandic volcano spews ash into the sky, shutting down air travel and overwhelming hotels and rental car companies. In other cases, failure within a network of networks can mean the difference between a minor disease outbreak or a pandemic, a foiled terrorist attack or one that kills thousands of people. (When Networks Network - Science News via Schneier on Security: The Insecurity of Networks)

The concept of networks of networks also serves to differentiate system administrators from programmers (not 100% accurate, but helpful nonetheless).


Isolating SSD Firmware Issue

On Friday 2012-09-28, my MacBook Pro started freezing. There had been no significant software updates that morning (though 10.8.2 came out that week).

I didn't want to deal with it, so I took it to the Apple Store. Unfortunately, they simply wasted a few days (given the solution found, I'm confident they never let it run more than an hour).

Once I got it back from the Apple Store without resolution, I knew it would be up to me to isolate the issue. I was inspired by Everything Sysadmin: What makes a sysadmin a "senior sysadmin"? to proceed methodically. It paid off:
Correct a condition where an incorrect response to a SMART counter will cause the m4 drive to become unresponsive after 5184 hours [216 days] of Power-on time. The drive will recover after a power cycle, however, this failure will repeat once per hour after reaching this point. The condition will allow the end user to successfully update firmware, and poses no risk to user or system data stored on the drive. (M4 firmware 0309 is now available - Crucial Community; found via j03l: MAC Freezes After An Hour - Crucial SSD Problem)
Since the installation of firmware 010G, my MacBook Pro has operated without issue!

Hopefully, this post will be another signpost for those affected.

Also, I can't overemphasize this post: Everything Sysadmin: What makes a sysadmin a "senior sysadmin"?.

Linux TCP/IP Tuning

There is a lot of interesting information in the article and related comments: Linux TCP/IP Tuning | Hacker News. I look forward to eventually having systems that benefit from that granularity of tuning.