Detecting defective tarballs (from a list of files)

There’s this build system, where tar files are downloaded into a specific folder, and then extracted, compiled and whatever. There are several thread, each one downloads, extracts, compiles and installs, and suddenly, there is no more space left on the hard drive. After a little cleanup for extra space, trying to rebuild fails, because some threads were in the middle of downloading when stopped violently, so a few tar files were corrupted – We need to delete them before going on.

Now, there are tens of files, checking them one by one will take a long time, and will be very tedious, better clean and rebuild, even if it takes a day ­čÖé

However, we can use a bash script for detecting defected tarballs!

First, how can we tell if a tar file is defected?

Tar’s “-t” option gives us a list of the stored files. It’s pretty quick, and when the file is corrupted, it fails, like so:

$ tar -tf not_a_tar_file.tar.gz 
tar: This does not look like a tar archive

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

$ echo $?
2

The bash variable “$?” gives us the result of the last execution. “Pass” is (obviously) zero, and any other value is “Fail”. So we can use this information.

Improving this for a single file, we’ll send the output, including stderr, to /dev/null, using a┬áredirection: “&> /dev/null”

Then, all we have to do is to put it into a loop, add some command line input, and we get this short script:

#!/bin/bash

if [ -z "$1" ] ; then
    DIR=`pwd`
else
    ls $1 &> /dev/null
    if [ "$?" -ne "0" ] ; then
        echo "No such path: "$1
        exit 1
    fi

    DIR=$1
fi

FILES_LIST=`ls $DIR`
for FILE in $FILES_LIST ; do
    FULLNAME=$DIR/$FILE 
    tar -tf $FULLNAME &> /dev/null
    if [ "$?" -eq "0" ] ; then
        echo $FULLNAME" is OK" 
    else
        echo "[!]" $FULLNAME "is defiective"
    fi
done

And its execution looks like this:

$ ./verify_my_tarballs.sh ./list

./list/backup.tar.gz is OK
[!] ./list/not_a_tar_file.tar.gz is defiective
./list/one_tar.tar.gz is OK
./list/onther_one.tar.gz is OK
./list/project1.tar.gz is OK

Notice that this script is good only for tar files. We can probably expand it to handle zip files and other archive file types, by using the same concept

* No tar files were harmed during the making of this post.

Amnon.