My task was to create a script that would recursively tar and gzip all of the files under /export/samba/fileshare/ every night. I knew it would be wasteful and time-consuming to back up all 150+MB every night, so I decided to back up the entire fileshare weekly, and then daily back up only those files that had been modified in the last day. The fileshare system isn't exactly what you'd call 'mission-critical': its sole purpose is to make those 150+MB of data available to 10 users, so I wasn't planning on backing the system up to tape; I just wanted to store the tar.gz files in another directory. [The purpose of backing up, of course, being more to protect against accidental file/folder deletion by humans than against machine failure.] Since multiple backup archives would be saved in the same directory, I wanted the filename to indicate the date. And lastly, I only wanted to store up to a month of backups, so I knew I'd also need a purge script that would delete files in the backup directory that were more than a month old. The steps appear to be pretty clear-cut, but getting everything to work together was a challenge. Below I chart my manual investigation of each step (utility) and then slowly approach the unified and automated backup process.
I started with the date command itself:
$ date
Tue Sep 26 22:14:34 EDT 2000
After skimming O'Reilly's Linux in a Nutshell, I knew that date's output could be modified and embellished using format options (%A %b %d), much like the dynamic web-scripting work I had done previously (SSI, PHP). My challenge was to get date's customized output into a filename. I decided the following date output was explicit enough for my backup files:
$ date '+%d%b'
26Sep
In this example, the '+' (plus sign) introduces the formatting options: %d is the day of the month and %b is the abbreviated month name. Next, I brushed up on the BASH shell to research redirections and pipes. I thought I'd have to use "" (quotes) or > (redirection) to get date into a filename, but after some guessing and searching, I discovered that I needed to use backticks (`) for command substitution. In other words, I could surround the date command in backticks as an argument in the filename. I tested it with vi:
$ vi test.`date +'%d%b'`.attempt1
"test.26Sep.attempt1" [New File]
And it worked!
tar cf - /export/samba/fileshare/ > /export/samba/fb/fbweekly.`date '+%d%b'`.tar
The last part of the line above (after the '>') incorporates the date command substitution that I learned above. The redirection symbol '>' takes the output of tar and puts it in that fancily named file. /export/samba/fileshare/ is the root of the fileshare directory that's being archived using "tar cf -". The 'c' tells tar to create an archive, the 'f' tells tar which file to write the archive to, and the '-' (hyphen) names standard output as that file, so the archive is then redirected into "/export/samba/fb/fbweekly.`date '+%d%b'`.tar".
$ tar cf - /export/samba/fileshare/ > /export/samba/fb/fbweekly.`date '+%d%b'`.tar
$ gzip -9 /export/samba/fb/fbweekly.`date '+%d%b'`.tar
[Note: To be more error-proof, I could also set the value of that day's filename "fbweekly.`date '+%d%b'`.tar" to a variable that I would then use in the tar and gzip lines. Then, even if the date changed between the tar line and the gzip line, gzip would still find the tar file.]
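A minimal sketch of that variable idea (using mktemp stand-ins for the real /export/samba paths, which are only placeholders here):

```shell
#!/bin/bash
# Sketch of the filename-variable note above: compute the dated name
# once, so tar and gzip always agree even if midnight passes between
# them. mktemp directories stand in for /export/samba/fileshare/ and
# /export/samba/fb/.
SRC=`mktemp -d`                               # stands in for the fileshare root
DEST=`mktemp -d`                              # stands in for the backup directory
echo "sample data" > "$SRC/sample.txt"
FILE="$DEST/fbweekly.`date '+%d%b'`.tar"      # the name is fixed here, once
tar cf - "$SRC" > "$FILE"
gzip -9 "$FILE"
ls "$FILE.gz"                                 # gzip renamed the archive to .tar.gz
```

Because $FILE is evaluated a single time, both commands refer to the same archive no matter how long tar takes.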
$ find /export/samba/fileshare/ -mtime -1
/export/samba/fileshare/MBrinson
/export/samba/fileshare/MBrinson/Agenda.doc
/export/samba/fileshare/CMayo
/export/samba/fileshare/CMayo/Bollenbacher Ltr.WHISEProj.doc
/export/samba/fileshare/CMayo/Battle Itinerary.doc
/export/samba/fileshare/JWatt
/export/samba/fileshare/JWatt/ScanJet 5p
/export/samba/fileshare/JWatt/ScanJet 5p/sj215en.exe
...
As you can see, this listing shows both the files that were modified and the directories that contain them. I figured out that in order to limit the list to files only, I could use:
$ find /export/samba/fileshare/ -mtime -1 \! -type d
where the "\! -type d" means "not of type directory" (the backslash keeps the shell from interpreting the !). Then I combined the find command with tar, using the same command substitution syntax that I had used with date---except this time I was going to add every modified file, found by find, to the tar-ball:
$ tar cf - `find /export/samba/fileshare/ -mtime -1 \! -type d` > backup.tar
But the line above ended up vomiting out screenfuls worth of this gorp:
tar: /export/samba/fileshare/CMayo/Bollenbacher: Cannot stat: No such file or directory
tar: Ltr.WHISEProj.doc: Cannot stat: No such file or directory
tar: /export/samba/fileshare/CMayo/Battle: Cannot stat: No such file or directory
tar: Itinerary.doc: Cannot stat: No such file or directory
tar: /export/samba/fileshare/JWatt/ScanJet: Cannot stat: No such file or directory
tar: 5p/sj215en.exe: Cannot stat: No such file or directory
I quickly realized that the shell was splitting the substituted file list at every space, so tar treated each space-separated chunk as a separate filename, which rendered almost all of the files and paths unintelligible. [Realize that the files and directories were named from Windows, and thus were riddled with spaces.] I assumed that I needed find to surround each "path/file" with quotes, but I couldn't figure out how to do that, so I tried something I saw in another backup script, which separated the find and tar steps using a text file as an intermediary:
$ find /export/samba/fileshare/ -mtime -1 \! -type d > /tmp/modified.files
$ tar cT /tmp/modified.files > /export/samba/fb/fbdaily.`date '+%d%b'`.tar
This time, find redirected its output to a file in /tmp/, and tar read that file, using the T option, and successfully created the daily archive of modified files. The lines in the text file still had spaces in them, but tar treats each line as a separate filename rather than each space-separated chunk of text. [Note: This separation of find and tar could also prove advantageous with an if statement that checks whether modified.files contains anything. If it did, then tar would be invoked; if not, then the script would end, preventing tar from trying to create an empty archive.]
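Here is a quick sketch of that if-statement idea, tested against a throwaway temporary directory rather than the real fileshare (the f option is passed explicitly here instead of redirecting stdout, just to keep the sketch portable):

```shell
#!/bin/bash
# Sketch of the note above: only invoke tar when find actually found
# something. Temporary paths stand in for the real ones.
SRC=`mktemp -d`
LIST=`mktemp`
echo "recent work" > "$SRC/new.txt"          # a file modified just now
find "$SRC" -mtime -1 \! -type d > "$LIST"
if [ -s "$LIST" ]; then                      # -s: file exists and is non-empty
    tar cfT "$SRC.tar" "$LIST"               # f takes the archive, T the list
    echo "archived modified files"
else
    echo "nothing modified; skipping tar"
fi
```

The -s test is what prevents an empty archive: if find wrote nothing, the script simply reports that and exits.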
I then finished the process with a gzip line:
$ gzip -9 /export/samba/fb/fbdaily.`date '+%d%b'`.tar
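As an aside, GNU find and GNU tar (an assumption; both were standard on Linux) can also solve the spaces problem without an intermediate file, by NUL-terminating each filename:

```shell
#!/bin/bash
# Sketch: find emits NUL-terminated names (-print0) and tar reads a
# NUL-delimited list from stdin (--null -T -), so spaces in filenames
# are harmless. A temporary directory stands in for the fileshare.
SRC=`mktemp -d`
echo "data" > "$SRC/file with spaces.doc"
find "$SRC" -mtime -1 \! -type d -print0 \
    | tar --null -cf "$SRC.tar" -T -
tar tf "$SRC.tar"                            # list the archive contents
```

Since NUL can never appear inside a filename, this delimiter is unambiguous where spaces (and even newlines) are not.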
Each line of a crontab file has the format:
<minute: 0-59> <hour: 0-23> <dayofmonth: 1-31> <month: 1-12> <dayofweek: 0-6> command
The cron daemon, which is always running, continually checks the crontab file for entries. If the current date and time match the settings in a crontab line, then cron executes the given command.
Since it would be easiest to have cron run a single script as the command, I encapsulated the previous steps in two different files [this would also let me easily add the filename variable that I mentioned above]:
# fbweekly:
#!/bin/bash
# consists of only two lines that tar the entire fileshare and then gzip it
tar cf - /export/samba/fileshare/ > /export/samba/fb/fbweekly.`date '+%d%b'`.tar
gzip -9 /export/samba/fb/fbweekly.`date '+%d%b'`.tar
# fbdaily:
#!/bin/bash
# consists of three lines: one to find the modified files, and then two to tar and gzip them
find /export/samba/fileshare/ -mtime -1 \! -type d > /tmp/modified.files
tar cT /tmp/modified.files > /export/samba/fb/fbdaily.`date '+%d%b'`.tar
gzip -9 /export/samba/fb/fbdaily.`date '+%d%b'`.tar
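One practical detail worth noting: for cron to run fbdaily and fbweekly by name, the scripts must be executable and reachable on cron's PATH (or referenced by full path in the crontab). A tiny illustration with a throwaway script:

```shell
#!/bin/bash
# Illustration: a script file must have the execute bit set before it
# can be run by name, whether by cron or interactively.
S=`mktemp`
printf '#!/bin/bash\necho ok\n' > "$S"
chmod +x "$S"                                # without this, running $S fails
"$S"
```

If cron's default PATH is too sparse, the safest fix is simply to use absolute paths in the crontab entries.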
With two self-contained scripts, I was able to put these lines into crontab:
30 23 * * 1,2,3,4,5 fbdaily
30 23 * * 0 fbweekly
The first line runs fbdaily at 11:30pm every weekday of every month. The second line runs fbweekly at 11:30pm every Sunday. At this point I had accomplished everything I'd set out to do, but I remembered that if I didn't automatically purge the backup files, the hard drive would eventually fill up. So I created a script to find files in the backup directory that are more than a month old and remove them:
# fbclean:
#!/bin/bash
# finds all backup files more than 31 days old and deletes them
find /export/samba/fb/ -mtime +31 \! -type d -exec rm -f {} \;
I was able to accomplish the cleanup process with only one line, which I could have put directly in the crontab file, but I decided to make it a script like the others which I could add to later or use independently of crontab if I so needed. I also discovered the -exec option of find:
-exec rm -f {} \;
The -exec option tells find to execute the following command, "rm -f", for every file that's found. The braces {} mark the place in the rm command where the file to be removed would normally appear, and the escaped semicolon \; marks the end of the -exec argument. With this new script, I modified the crontab file to look like the following, which completed my goal of a completely automated backup process:
30 23 * * 1,2,3,4,5 fbclean; fbdaily
30 23 * * 0 fbweekly
Now fbclean checks for and deletes old backup files every weekday, and then fbdaily creates the archive of modified files for that day.
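Before trusting any find/-exec/rm line, a dry run is cheap insurance: prefixing the command with echo prints what would be removed instead of removing it. A sketch against a temporary stand-in for the backup directory (the -mtime +31 test is omitted here so the fresh test file matches):

```shell
#!/bin/bash
# Dry run of the cleanup idea: echo prints each rm command instead of
# executing it. A temporary directory stands in for /export/samba/fb/,
# and the age test is dropped so the just-created file is matched.
DIR=`mktemp -d`
touch "$DIR/fbdaily.01Jan.tar.gz"
find "$DIR" \! -type d -exec echo rm -f {} \;
```

Once the printed commands look right, deleting the echo turns the dry run into the real fbclean behavior.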
INLS 183 Project 3: Backup with tar, gzip, find, date and cron script file