Linux Observability
When something breaks — a service won’t start, CPU is at 100%, a container is running out of memory — these tools are how you find out what the system is actually doing.
Logging
journalctl — The interface to systemd’s journal. Filtering by unit (-u nginx), boot ID (-b), priority (-p err), and time (--since "1 hour ago"). JSON output for scripting. Boot-time analysis with --list-boots. journald.conf settings: Storage=, Compress=, SystemMaxUse=, ForwardToSyslog=. Vacuuming old logs with --vacuum-time and --vacuum-size.
Log Management — The /var/log/ directory structure: syslog/messages (all syslog events), auth.log/secure (authentication), kern.log (kernel), daemon.log (background services), dmesg (boot-time kernel ring buffer). Binary logs: last, lastlog, faillog, btmp — and why they’re binary. logrotate configuration: daily/weekly/monthly rotation, compression, notifempty, missingok, the postrotate script for signaling services to reload their logs.
Monitoring
Monitoring Tools — Live and historical system metrics.
top/htop— per-process CPU and memory, sortable, signal-sendable, reniceablevmstat— overall system view: run queue length, swap in/out, block I/O, CPU breakdown (us/sy/id/wa)iostat— per-disk I/O: reads/writes per second, kilobytes per second, await time, utilization %sar— historical metrics from/var/log/sa/(sysstat).sar -q(queue length),sar -r(memory),sar -n DEV(network),sar -b(I/O). Useful for explaining to a manager why the server was slow three days agopidstat— per-process I/O and CPU statsdstat— combined vmstat/iostat/netstat with color outputioping— disk latency testing
Tracing and Debugging
strace — The syscall tracer. Attach to a running process (-p PID) or start a command under strace (strace -f -e trace=network,file cmd). Reading the output: every line is a syscall with its arguments and return value. -o to write to file, -c for a summary count of which syscalls were called. Using strace to debug why a command fails, what files it opens, what network connections it makes.