A customer of mine was copying a large file - around 300GB if I remember correctly - and the copy was taking several hours. The utilisation of the storage was very low, even though service times (as measured by iostat on Solaris) were also low, so the storage clearly had headroom and the single copy stream was the limiting factor. I used 'dd' to chop the file into 4 pieces (using 'seek' to index into the file) and so created a parallel copy. This technique improved the copy performance by 50%.
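To make the idea concrete, here is roughly what the hand-rolled version looks like - the file name, size and block counts below are invented for illustration. Each dd copies one quarter of the file, seeking to the same offset in the source and the destination, and the four streams run in parallel:

#A 4GB file copied as four 1GB pieces with a 1MB block size (1GB = 1024 blocks of 1MB).
dd if=/data/bigfile of=/backup/bigfile bs=1048576 count=1024 iseek=0 oseek=0 &
dd if=/data/bigfile of=/backup/bigfile bs=1048576 count=1024 iseek=1024 oseek=1024 &
dd if=/data/bigfile of=/backup/bigfile bs=1048576 count=1024 iseek=2048 oseek=2048 &
dd if=/data/bigfile of=/backup/bigfile bs=1048576 iseek=3072 oseek=3072 &
wait

(iseek and oseek are the Solaris dd operands; the last dd has no count, so it simply runs to end of file.)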
I wanted to create a scripted version that could be run on any file regardless of size. The script is below.
#!/bin/bash
#Parallel copy: split the source into equal chunks and copy them with concurrent dd streams.
#ls -s reports sizes in 512-byte blocks on Solaris.
BLOCKSIZE=512
#Size of each dd transfer, in megabytes.
IO_IN_MB=1
let pcpBSIZE=$IO_IN_MB*1024*1024
#Number of concurrent dd streams.
let THREADS=4
DD=/bin/dd
SOURCE=$1
DEST=$2
echo Source is $SOURCE
echo Dest is $DEST
#Uncomment to remove any existing destination file first.
#rm $DEST
#Size of the source file in 512-byte blocks (awk copes with any leading whitespace from ls).
SIZEINBLOCKS=`ls -s "$SOURCE" | awk '{print $1}'`
echo Size of $SOURCE is $SIZEINBLOCKS blocks
let SIZEINBYTES=$SIZEINBLOCKS*$BLOCKSIZE
#Each dd stream handles one chunk of roughly SIZEINBYTES/THREADS bytes.
let CHUNK=$SIZEINBYTES/$THREADS
echo Size of $SOURCE is $SIZEINBYTES bytes
echo Size of chunk=$CHUNK
#Number of pcpBSIZE-sized transfers per chunk (integer division rounds down;
#the final stream copies to EOF, so nothing is missed).
let COUNT=$CHUNK/$pcpBSIZE
let loop=0
while ((loop<THREADS-1))
do
#Offset of this chunk in bs-sized blocks; COUNT*loop keeps the chunks
#contiguous even when CHUNK is not an exact multiple of pcpBSIZE.
let OFFSET=$COUNT*$loop
$DD if="$SOURCE" of="$DEST" bs=$pcpBSIZE count=$COUNT iseek=$OFFSET oseek=$OFFSET &
let loop=loop+1
done
#Special case: the last dd has no count, so it copies from its offset to EOF.
let OFFSET=$COUNT*$loop
$DD if="$SOURCE" of="$DEST" bs=$pcpBSIZE iseek=$OFFSET oseek=$OFFSET
wait
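To run it, save the script (as pcp.sh, say - the name is just for illustration) and pass the source and destination paths:

./pcp.sh /data/bigfile /backup/bigfile

One caveat: iseek and oseek are Solaris dd operands. GNU dd on Linux spells them skip and seek, and there you would also want conv=notrunc on each dd so that one parallel stream does not truncate the destination file out from under the others.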