So I finished Freakonomics in just over a day. This is pretty impressive for me since, most of the time, I fall asleep before I've finished the second page of any book I try to read. Anyhow, I was surprised how much I enjoyed the book, and how applicable it was to what we do as engineers every day: sifting through lots of data looking for the clue that will give the answer to the problem at hand. Often we're looking at the same data that other engineers have seen and trying to interpret it in a different way, or indeed looking at data which has previously been overlooked. In the case of Freakonomics the data is not code, errors or performance statistics, nor is it 'monetary' data. It's Sumo wrestlers' win/loss ratios against the same opponents (looking for corruption in Sumo) and other stuff too, like 'why do drug dealers still live with their mums?'. Entertaining, and well worth a read.

Creating snapshots using passwordless ssh

These instructions show how to set up a user so that he can use ssh to create snapshots on a filer. The user can ONLY carry out snapshot operations; he cannot log in or run other commands (even a simple 'version' will fail). This sort of thing is useful if you are writing a script, which can then run under this username.

Roles and RBAC on NetApp filers
First set up ssh, and make sure root can get to the filer via ssh.

(1) On the filer, set up ssh, accepting the defaults.

filer2> secureadmin setup ssh

SSH Setup
Determining if SSH Setup has already been done

SSH server supports both ssh1.x and ssh2.0 protocols.

SSH server needs two RSA keys to support ssh1.x protocol. The host key is
generated and saved to file /etc/sshd/ssh_host_key during setup. The server
key is re-generated every hour when SSH server is running.

SSH server needs a RSA host key and a DSA host key to support ssh2.0 protocol.
The host keys are generated and saved to /etc/sshd/ssh_host_rsa_key and
/etc/sshd/ssh_host_dsa_key files respectively during setup.

SSH Setup will now ask you for the sizes of the host and server keys.
For ssh1.0 protocol, key sizes must be between 384 and 2048 bits.
For ssh2.0 protocol, key sizes must be between 768 and 2048 bits.
The size of the host and server keys must differ by at least 128 bits.

Please enter the size of host key for ssh1.x protocol [768] :
Please enter the size of server key for ssh1.x protocol [512] :
Please enter the size of host keys for ssh2.0 protocol [768] :

You have specified these parameters:
host key size = 768 bits
server key size = 512 bits
host key size for ssh2.0 protocol = 768 bits
Is this correct? [yes]

Setup will now generate the host keys in the background. It will take a
few minutes. After Setup is finished you can start SSH server with
command 'secureadmin enable ssh'. A syslog message will be generated
when Setup is complete.

Thu Jan 11 12:32:21 GMT [filer2: rc:info]: SSH Setup: SSH Setup is done. Host keys are stored in /etc/sshd/ssh_host_key, /etc/sshd/ssh_host_rsa_key and /etc/sshd/ssh_host_dsa_key.

(2) Start ssh on the filer.

filer2> secureadmin enable ssh

(3) Attempt a standard ssh login as root user.

gjl-powerbook:~ garylittle$ ssh -l root filer2
The authenticity of host 'filer2 (' can't be established.
RSA key fingerprint is 9b:99:37:9f:21:c1:09:1f:45:82:25:fd:5c:d8:99:a1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'filer2,' (RSA) to the list of known hosts.
root@filer2's password:

(4) Check remote execution (use version command)

gjl-powerbook:~ garylittle$ ssh -l root filer2 version
root@filer2's password:
NetApp Release 7.1: Fri Dec 23 00:48:41 PST 2005

Now we set up another user to use passwordless login.

(5) Enable passwordless login.

5.1 Create a user if required, using the non-privileged group 'Users'.

filer2> useradmin user add garylittle -g Users
New password:
Retype new password:
User added.

5.2 Create a role called 'snaps' defining what he can do. In this case, manipulate snapshots only.

filer2> useradmin role add snaps -c "CLI Snapshots" -a cli-snap*

5.3 Assign that role to a group

filer2> useradmin group add cli-snapshot-group -r snaps
Group added.

5.4 Assign the user to the group

filer2> useradmin user modify garylittle -f -g cli-snapshot-group
User modified.

filer2> useradmin user list garylittle
Name: garylittle
Rid: 131072
Groups: cli-snapshot-group
Full Name:
Allowed Capabilities: cli-snap*
Password min/max age in days: 0/4294967295
Status: enabled
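
If you need to repeat steps 5.1 to 5.4 for other users, the commands can be generated rather than retyped. This is only a sketch: print_rbac_commands is a hypothetical helper name, it just echoes the same useradmin commands shown above with the names filled in, and it never contacts the filer (user add will still prompt for a password when you run it).

```shell
#!/bin/sh
# Print the filer-side commands from steps 5.1 to 5.4 for a given user.
# print_rbac_commands is a hypothetical helper; it only echoes text.
print_rbac_commands() {
    user=$1
    role=$2
    group=$3
    echo "useradmin user add $user -g Users"
    echo "useradmin role add $role -c \"CLI Snapshots\" -a cli-snap*"
    echo "useradmin group add $group -r $role"
    echo "useradmin user modify $user -f -g $group"
}

print_rbac_commands garylittle snaps cli-snapshot-group
```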

5.5 Generate public/private key pairs for the user, on the system that they will log in from. In my case this is my laptop.

gjl-powerbook:~/.ssh garylittle$ ssh-keygen -t rsa -b 1024

Generating public/private rsa key pair.
Enter file in which to save the key (/Users/garylittle/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/garylittle/.ssh/id_rsa.
Your public key has been saved in /Users/garylittle/.ssh/
The key fingerprint is:
64:2b:29:40:53:34:83:a2:b0:b4:5e:10:3f:a2:b5:5b garylittle@gjl-powerbook.local

gjl-powerbook:~/.ssh garylittle$ ssh-keygen -t dsa -b 1024

Generating public/private dsa key pair.
Enter file in which to save the key (/Users/garylittle/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/garylittle/.ssh/id_dsa.
Your public key has been saved in /Users/garylittle/.ssh/
The key fingerprint is:
b7:0a:e1:f0:51:f4:b0:e5:c0:6a:06:da:6d:0b:f7:16 garylittle@gjl-powerbook.local

5.6 Add the public keys to the filer. In my case I had mounted /vol/vol0 onto /mnt on my local machine. I had to create the directory /etc/sshd/garylittle/.ssh on the filer manually.

gjl-powerbook:~/.ssh garylittle$ cat >> /mnt/etc/sshd/garylittle/.ssh/authorized_keys
gjl-powerbook:~/.ssh garylittle$ cat >> /mnt/etc/sshd/garylittle/.ssh/authorized_keys

(6) Test... executing a snap list works fine...

gjl-powerbook:~/.ssh garylittle$ ssh filer2 snap list
Volume ora9test_dbf

%/used %/total date name
---------- ---------- ------------ --------
0% ( 0%) 0% ( 0%) Jan 04 14:28 volsnap
0% ( 0%) 0% ( 0%) Dec 15 16:00 hourly.0
0% ( 0%) 0% ( 0%) Oct 20 18:10 DBsnap1
0% ( 0%) 0% ( 0%) Oct 20 18:06 adhoc
0% ( 0%) 0% ( 0%) Oct 20 16:00 hourly.1

Check the user only has the privs we defined.

gjl-powerbook:~ garylittle$ ssh filer2

Connection to filer2 closed.

gjl-powerbook:~ garylittle$ ssh filer2 version
Permission denied, user garylittle does not have access to version
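
The same check can be scripted. This is a rough sketch, not something from the original setup: FILER, is_denied and check_restricted_user are hypothetical names, and the test simply looks for the "Permission denied" text shown above because I have not checked what exit status the filer returns. Note that an ssh authentication failure also prints "Permission denied", in which case the function reports a failure on the snap list step.

```shell
#!/bin/sh
# Hypothetical check that the restricted user can list snapshots but
# cannot run other commands. FILER is a placeholder host name.
FILER=${FILER:-filer2}

# Returns 0 if the filer refuses the given command.
# Matches the "Permission denied" text shown in the output above.
is_denied() {
    ssh -o BatchMode=yes "$FILER" "$@" 2>&1 | grep -q "Permission denied"
}

check_restricted_user() {
    if is_denied version; then
        echo "version: denied (expected)"
    else
        echo "version: NOT denied"
        return 1
    fi
    if is_denied snap list; then
        echo "snap list: denied (unexpected)"
        return 1
    fi
    echo "snap list: allowed (expected)"
}
```

Run check_restricted_user from the client once the keys are in place; BatchMode=yes stops ssh from falling back to a password prompt.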

But we can create snapshots..

gjl-powerbook:~ garylittle$ ssh filer2 snap create vol0 ssh-snap
creating snapshot...
gjl-powerbook:~ garylittle$ ssh filer2 snap list vol0
Volume vol0

%/used %/total date name
---------- ---------- ------------ --------
0% ( 0%) 0% ( 0%) Jan 11 13:45 ssh-snap
1% ( 1%) 0% ( 0%) Dec 15 16:00 hourly.0
2% ( 1%) 1% ( 0%) Oct 20 16:00 hourly.1
2% ( 1%) 1% ( 0%) Oct 19 16:00 hourly.2
3% ( 1%) 1% ( 0%) Oct 17 16:00 hourly.3
4% ( 1%) 1% ( 0%) Sep 17 16:00 hourly.4
5% ( 1%) 2% ( 0%) Sep 15 16:00 hourly.5
gjl-powerbook:~ garylittle$
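
This is the sort of script the restricted user is meant for. A minimal sketch, assuming the setup above is in place: FILER, the "ssh-" name prefix and the function names are placeholders of mine, and only the snap create and snap list commands shown above are sent to the filer.

```shell
#!/bin/sh
# Sketch of a snapshot job run as the restricted user.
# FILER and the "ssh-" name prefix are placeholders.
FILER=${FILER:-filer2}

# Build a snapshot name from the current time, e.g. ssh-YYYYMMDD-HHMMSS.
snap_name() {
    echo "ssh-$(date +%Y%m%d-%H%M%S)"
}

# Create a snapshot of volume $1, then list that volume's snapshots.
create_snapshot() {
    vol=$1
    name=$(snap_name)
    ssh -o BatchMode=yes "$FILER" snap create "$vol" "$name" || return 1
    ssh -o BatchMode=yes "$FILER" snap list "$vol"
}
```

Calling create_snapshot vol0 from cron would give one timestamped snapshot per run; cleaning up old snapshots is left out here.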

Analysing NetApp sysstat PT1: The CP columns

The CP type column is a field of two characters: the first is the CP type and the second is the CP phase. So a value of Hf means that ONTAP was doing a consistency point of type 'H', caused by the high water mark (i.e. memory needs to be flushed, rather than NVRAM being full), and was in phase 'f', flushing modified data to disk.

CP Types
B - Back to back CPs (CP generated CP)
b - Deferred back to back CPs (CP generated CP)
F - CP caused by full NVLog
H - CP caused by high water mark
L - CP caused by low water mark
S - CP caused by snapshot operation
T - CP caused by timer
U - CP caused by flush
Z - CP caused by internal sync
: - continuation of CP from previous interval
# - continuation of CP from previous interval, and the NVLog for the next CP is now full, so that the next CP will be of type B

CP Phases
0 - Initializing
n - Processing normal files
s - Processing special files
f - Flushing modified data to disk
v - Flushing modified superblock to disk
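
When reading through a lot of sysstat output it can help to decode the field mechanically. Here is a small sketch that splits a value such as Hf into its two characters and looks each one up in the type and phase letters listed above. decode_cp is a hypothetical helper name, and the descriptions are shortened versions of the ones above.

```shell
#!/bin/sh
# Decode a two-character sysstat CP field such as "Hf".
# decode_cp is a hypothetical helper name.
decode_cp() {
    type_char=$(printf '%s' "$1" | cut -c1)
    phase_char=$(printf '%s' "$1" | cut -c2)
    case $type_char in
        B) type="back to back CPs" ;;
        b) type="deferred back to back CPs" ;;
        F) type="full NVLog" ;;
        H) type="high water mark" ;;
        L) type="low water mark" ;;
        S) type="snapshot operation" ;;
        T) type="timer" ;;
        U) type="flush" ;;
        Z) type="internal sync" ;;
        :) type="continuation of previous CP" ;;
        \#) type="continuation of previous CP, next NVLog full" ;;
        *) type="unknown type" ;;
    esac
    case $phase_char in
        0) phase="initializing" ;;
        n) phase="processing normal files" ;;
        s) phase="processing special files" ;;
        f) phase="flushing modified data to disk" ;;
        v) phase="flushing modified superblock to disk" ;;
        "") phase="no phase shown" ;;
        *) phase="unknown phase" ;;
    esac
    echo "$type / $phase"
}

decode_cp Hf    # prints: high water mark / flushing modified data to disk
```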

A quicker cp (Results)

I tested my cp script (which I called pcp, for 'parallel cp') against a NetApp NFS server from a Solaris client. The best results vs standard 'cp' were when the FS was mounted '-o forcedirectio'. Since forcedirectio is often used by DB hosts, and DB servers typically use very large files, the performance of cp can be quite important. During my tests I saw roughly 100% improvement in throughput, i.e. it took half the time to copy a 2GB file using 'pcp' vs standard 'cp'.

Here are some graphed results which I took from the raw iostat.

Standard Solaris cp

Vs pcp using 1MB IO's and 8 dd 'threads'.

A Quicker copy (cp)

Quite often poor write performance is due to a low degree of parallelism on the write side. This is particularly problematic for operations like copy (cp) where each write is essentially synchronous. The issue is that when a file is created or 'extended' (more blocks are allocated to a file) which is the case when doing a copy, the filesystem meta-data is altered. In the case of cp, a block is taken from the free pool and allocated to a file.

A customer of mine was copying a large file, around 300GB if I remember correctly, and it was taking several hours. The utilisation of the storage was very low, even though service times (as measured by iostat on Solaris) were also low. I used 'dd' to chop up the file into 4 pieces (using 'seek' to index into the file) and so created a parallel copy. This technique improved the copy performance by 50%.

I wanted to create a scripted version that could be run on any file regardless of size. The script is below.
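
To make the chopping-up concrete, here is a small sketch of the arithmetic only; it runs no dd at all. plan_pieces is a hypothetical helper name. It works out, for a given file size, number of pieces and block size, where each piece starts (in blocks) and how many blocks it covers, with the last piece running to end of file so that any remainder is picked up.

```shell
#!/bin/sh
# Print one line per piece: piece number, start offset and length in blocks.
# plan_pieces is a hypothetical helper; size and bs are in bytes.
plan_pieces() {
    size=$1
    pieces=$2
    bs=$3
    count=$(( size / bs / pieces ))     # whole blocks per piece
    i=0
    while [ "$i" -lt "$pieces" ]; do
        offset=$(( count * i ))
        if [ "$i" -eq $(( pieces - 1 )) ]; then
            echo "piece $i: offset=$offset count=to-EOF"
        else
            echo "piece $i: offset=$offset count=$count"
        fi
        i=$(( i + 1 ))
    done
}

# A 2GB file in 4 pieces with 1MB blocks.
plan_pieces 2147483648 4 1048576
```

For that example the pieces start at blocks 0, 512, 1024 and 1536, and each of the first three covers 512 blocks.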


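#!/bin/ksh
#Parallel copy. Usage: pcp <source> <dest>
#Lines marked "assumed" were missing from the listing and are a best guess.

SOURCE=$1
DEST=$2
DD=/usr/bin/dd   #assumed path; iseek/oseek are Solaris dd operands
THREADS=8        #assumed; 8 dd 'threads' were used in the results post
IO_IN_MB=1       #assumed; 1MB IOs were used in the results post

let "pcpBSIZE=$IO_IN_MB*1024*1024"
echo Source is $SOURCE
echo Dest is $DEST
#rm $DEST

SIZEINBLOCKS=`ls -s "$SOURCE" | awk '{print $1}'`
SIZEINBYTES=`ls -l "$SOURCE" | awk '{print $5}'`   #assumed
#assumed: each chunk is a whole number of dd blocks
let "COUNT=$SIZEINBYTES/$pcpBSIZE/$THREADS"
let "CHUNK=$COUNT*$pcpBSIZE"
echo Size of $SOURCE is $SIZEINBLOCKS blocks
echo Size of $SOURCE is $SIZEINBYTES
echo Size of chunk=$CHUNK

#conv=notrunc (assumed) stops one dd truncating data written by another
let loop=0
while ((loop<THREADS-1))
do
    let "OFFSET=$CHUNK*$loop/$pcpBSIZE"
    $DD if="$SOURCE" of="$DEST" bs=$pcpBSIZE count=$COUNT iseek=$OFFSET oseek=$OFFSET conv=notrunc &
    let loop=loop+1
done
#Special case, the last dd does until EOF
let "OFFSET=$CHUNK*$loop/$pcpBSIZE"
$DD if="$SOURCE" of="$DEST" bs=$pcpBSIZE iseek=$OFFSET oseek=$OFFSET conv=notrunc
wait   #assumed: wait for the background dd's to finish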