Systems Performance

Mainly contains data from Brendan Greggs

Blog
Books
- System Performance Enterprise and Cloud
Podcast
- Se-Radio-Episode-255

Anything in the data path, software, hardware, is included, as it can affect performance. For distributed systems you should have a diagram of your environment showing data paths.

Methodologies

A methodology is a systematic way of approaching a performance problem.

Workload characterization method

USE method

Anit methodologies

Random change

Blame someone else

Tipps

80/20 Rules

For all problems not related to latency (outlyers): 80% of the problems are in the app 20% are in everything else
For latency outlyers: 20% of the problems are in the app and 80% of the problems are in everything else

https://youtu.be/FJW8nGV4jxY?t=2436

CPUs: kernel can be gracefull, can take low prio work of CPU
Disks: kernel has not so much control, work which is queued will usually need to be handled in that order, irelevant what the prio is. usually 60% utilization is already problematic
strace - super slow! be very carefull
tcpdump - overhead significant https://youtu.be/FJW8nGV4jxY?t=2691, is not scaleable
pidstat -t 1
pidstat -d 1 -> block device usage
lsof -iTCP -sTCP:ESTABLISHED -> file descriptors
sar -n TCP,ETCP,DEV 1
ss -map -> socket statistics
iptraf -> histograms packet size
iotop -> readable io stats
slabtop -> kernel slab allocator memory usage
pcstat -> page cache
tiptop -> IPC by process, very low level, IPC .. instructions per cycle
Flame graphs: https://www.youtube.com/watch?v=BHA65BqlqSk
ip route get ipAddr -> give ip address and it tells where it goes

Tools

top
htop
atop
dstat
vmstat
iostat
netstat
ifstat

Benchmarking

Results are often misleading: you benchmark A, but actually stress B and conclude you measured C

Mistakes:

testing the wrong target: FS disk cache instead of actual disk
choosing wrong target: testing disk instead of FS cache (actuall apps should hit FS cache more often than disk, does not reassemble real app behaviour)

Active benchmarking - analyze them while running

lmbench -> cache, hardware analysis
fio --name=seqwrite --rw=write --bs=128k --size=5000m

static tools

check static state of a system witout any workload

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

system.md

system.md

Systems Performance

Methodologies

Workload characterization method

USE method

Anit methodologies

Random change

Blame someone else

Tipps

Tools

Benchmarking

Files

system.md

Latest commit

History

system.md

File metadata and controls

Systems Performance

Methodologies

Workload characterization method

USE method

Anit methodologies

Random change

Blame someone else

Tipps

Tools

Benchmarking