The Sorter utility will probably be the most used utility in the SleuthKit Toolkit. Adapting Sorter allows the forensic investigator to create definition files that categorize files in a way that suits the investigator’s work style. Sorter reads an image file and extracts the various file types based upon file contents and file extensions. The files are placed into directories according to their respective categories: images are placed in an image directory, text files in a directory called text, and so on. The investigator can control this behavior by specifying more detailed categories and associating only specific file extensions with each category. For example, the investigator can create a category called “jpg” and place in that directory only files that have the jpg extension. Because of the way the category matches occur, you are limited somewhat. The “windows.sort” file identifies and categorizes a majority of the Windows OS files.
The key to customizing Sorter configuration files is to determine the output of the file command. Grep the ~/sleuthkit/share/file/magic file for a descriptive value such as “image.” This will produce numerous descriptions containing the word “image”; some references will be to file system images. From these results, the projected impact of modifications to the category descriptions can be estimated and fine-tuned. Any modifications should be fully tested prior to live casework.
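As a rough illustration of the grep step, the following Python sketch filters magic-file lines for a keyword. The function name grep_magic and the sample lines are hypothetical; the sample lines only imitate the general layout of a magic file and are not copied from the SleuthKit distribution.

```python
import re
from typing import Iterable, List


def grep_magic(lines: Iterable[str], keyword: str) -> List[str]:
    """Return non-comment lines that contain keyword (case-insensitive)."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # skip comment lines in the magic file
        if pattern.search(line):
            hits.append(line.rstrip("\n"))
    return hits


# Illustrative lines only (assumed layout: offset, type, test, description).
SAMPLE_MAGIC = [
    "# image formats (comment line, ignored)",
    "0\tstring\tGIF8\tGIF image data",
    "0\tbeshort\t0xffd8\tJPEG image data",
    "0\tstring\tBM\tPC bitmap data",
    "0\tstring\tPK\\003\\004\tZip archive data",
]

for hit in grep_magic(SAMPLE_MAGIC, "image"):
    print(hit)
```

To search the real file, open ~/sleuthkit/share/file/magic (the path given above) and pass the file object to grep_magic in place of SAMPLE_MAGIC.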
The following entries can be used to automatically extract, identify and categorize the following image types: JPEG, GIF, TIFF, PNG, BMP. The original version of the images.sort file identified all these image types and placed all these files in the images category. Since steganographic capabilities have begun to emerge as a viable data hiding mechanism, each file type may require additional processing to discover hidden data. The decision to force each file type into a separate category
© SANS Institute 2006, Author retains full rights.
Key fingerprint = AF19 FA27 2F94 998D FDB5 DE3D F8B5 06E4 A169 4E46
command.
#Category Images Cut Line
#Save this snippet as images.sort in your sleuthkit/share/sorter directory
# category cat_name keywords
# ext ext1,ext2,ext3 keywords
# Category Images
category images image data
category images graphic image
category JPG JPEG image data
ext jpg,jpeg,jpe JPEG image data
category GIF GIF image data
ext gif GIF image data
category TIF TIFF image data
ext tif TIFF image data
category PCX PCX(.*?)image data
ext pcx PCX(.*?)image data
category PNG PNG image data
ext png PNG image data
category BMP bitmap data
ext bmp PC bitmap data
#Category Images Cut Line
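To check a customized file before using it, the snippet above can be read into a list of rules. This is a minimal Python sketch that assumes the layout stated in the snippet's own comments (a keyword, a name or extension list, then the match string); the names Rule and parse_sort are hypothetical and are not part of Sorter.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    kind: str     # "category" or "ext"
    name: str     # category name, or comma-separated extension list
    pattern: str  # string matched against the output of the file command


def parse_sort(text: str) -> List[Rule]:
    """Parse the text of a .sort file, ignoring comments and blank lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)  # keyword, name, rest of line
        if len(parts) == 3 and parts[0] in ("category", "ext"):
            rules.append(Rule(*parts))
    return rules


SNIPPET = """\
#Category Images Cut Line
category JPG JPEG image data
ext jpg,jpeg,jpe JPEG image data
category PCX PCX(.*?)image data
ext pcx PCX(.*?)image data
#Category Images Cut Line
"""

for rule in parse_sort(SNIPPET):
    print(rule.kind, rule.name, repr(rule.pattern))
```

Printing the parsed rules makes a typo in a match string easy to spot, because the category pattern and its ext pattern appear one above the other.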
The Sorter process is quite simple and breaks down into essentially three functional tasks. The following order is presented for illustration purposes. First, Sorter reads the configuration files in ~/sleuthkit/src/share/sorter and determines the categories. The default.sort file is read first, and then the default configuration file for the file system type is read. The file system type is given by the “-f fstype” parameter: bsdi, ext2/3, fat[12/16/32], freebsd, ntfs, openbsd, solaris, swap, and ufs1/2. If a default configuration file exists for that type, such as freebsd.sort, linux.sort, openbsd.sort, solaris.sort or windows.sort, that file is read. Duplicate category rules are eliminated; the last rule read has precedence.
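The precedence behavior can be sketched in Python as follows. This is only an illustration: it assumes that two rules are duplicates when they have the same rule type and the same match string, which the text does not define, and merge_rules is a hypothetical name rather than a Sorter function.

```python
from typing import Dict, List, Tuple

RuleTuple = Tuple[str, str, str]  # (kind, name, pattern)


def merge_rules(*rule_sets: List[RuleTuple]) -> List[RuleTuple]:
    """Merge rule lists in read order; a later duplicate replaces an earlier one.

    Assumption: a duplicate is a rule with the same kind and pattern.
    """
    merged: Dict[Tuple[str, str], str] = {}
    for rules in rule_sets:
        for kind, name, pattern in rules:
            merged[(kind, pattern)] = name  # last rule read wins
    return [(kind, name, pattern) for (kind, pattern), name in merged.items()]


# default.sort is read first, then the file system specific file.
default_rules = [("category", "images", "JPEG image data")]
fs_rules = [("category", "JPG", "JPEG image data")]

print(merge_rules(default_rules, fs_rules))
```

With these two illustrative lists, only the later JPG rule survives for the "JPEG image data" match string.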
Second, the file command (SleuthKit’s file command) inspects each file and determines its type by using a ruleset defined in sleuthkit/src/share/file/magic. Third, the category string is matched against the output of file. If the output matches the category string, the ext rule is then checked. If the extension also matches, the file is identified by that category and extension. Depending upon the options selected, a log entry is written or the file is copied to the appropriate directory. If the extension does not match the extension rule, an extension mismatch log entry is made in the category audit file and the extension mismatch log file.
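The second and third tasks can be approximated with a short Python sketch. It is not Sorter's implementation: the RULES table, the classify function, the first-match ordering and the case-insensitive extension comparison are all assumptions made for illustration, and the file output strings in the usage lines are typical examples rather than captured output.

```python
import os
import re
from typing import List, Optional, Tuple

# (category, category pattern, [(extensions, ext pattern), ...])
RULES: List[Tuple[str, str, List[Tuple[Tuple[str, ...], str]]]] = [
    ("JPG", r"JPEG image data", [(("jpg", "jpeg", "jpe"), r"JPEG image data")]),
    ("GIF", r"GIF image data", [(("gif",), r"GIF image data")]),
]


def classify(file_output: str, filename: str) -> Tuple[Optional[str], bool]:
    """Return (category, extension_ok) for one file.

    category is None when no category pattern matches the file output.
    extension_ok is False when an ext rule applies but the extension differs.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    for category, cat_pattern, ext_rules in RULES:
        if not re.search(cat_pattern, file_output):
            continue
        for extensions, ext_pattern in ext_rules:
            if re.search(ext_pattern, file_output):
                allowed = {e.lower() for e in extensions}
                return category, ext in allowed
        return category, True  # no ext rule applies, so nothing to mismatch
    return None, True


print(classify("JPEG image data, JFIF standard 1.01", "photo.jpg"))
print(classify("JPEG image data, JFIF standard 1.01", "notes.txt"))
```

The second call models the extension mismatch case: the content is categorized as JPG, but the txt extension is not in the rule's extension list, so the file would be logged as a mismatch.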
The string rules are regular expressions, which provide a wide range of text parsing possibilities. Mostly the matches will be straightforward literal strings, as in
considered an archive file. You may have multiple category rules for a category. The archives.sort configuration file has 13 category definitions for “archives.” Any time the output from the file command matches one of those rules, the file is considered as belonging to the archives category, and any extension rule matches are performed for that particular category match. Extension mismatches are determined by the extension rule, unless extension checking is turned off with command line arguments.
#Category Archives Cut Line
#Save this snippet as archives.sort in your sleuthkit/share/sorter directory
#category archives
category archives Zip archive data
ext zip,jar Zip archive data
ext wmz Zip archive data
category archives tar archive
ext tar tar archive
category archives Microsoft Cabinet
ext cab Microsoft Cabinet File
category archives compress data
ext gz,tgz gzip compressed data
ext Z compress'd data
ext bz2 bzip2 compressed data
ext bz bzip compressed data
category archives RPM
ext rpm RPM
category archives cpio archive
category archives ARC archive data
ext arc ARC archive data
category archives LHa archive data
ext lha LHa archive data
ext lzh LHa archive data
category archives shell archive text
ext shar shell archive text
category archives uuencoded or xxencoded text
ext uue uuencoded or xxencoded text
ext uu uuencoded or xxencoded text
ext bhx uuencoded or xxencoded text
ext xxe uuencoded or xxencoded text
ext hqx BinHex binary text
category archives StuffIT Archive
ext sit StuffIT Archive
category archives RAR Archive data
ext rar RAR Archive data
category archives ARJ Archive data
ext arj ARJ Archive data
# The below types to be implemented when source files are available for testing
#B64,LZW,LBR,MSX,PAK,PIT,TAZ,_Q_,ZOO
#Category Archives Cut Line
The previously described concept is best represented by the composite.sort configuration file. Any file type that contains other files is defined as belonging to the composite category. This works great for identifying composite files.
#Category Composite Cut Line
category composite archive
category composite compress
category composite cabinet
category composite rpm
category composite filesystem
#need for file to recognize MS Backup files bkf
#Category Composite Cut Line
These files can be used with Sorter to extract all archive files or all image files from a disk image file.
Command: sorter -f ntfs -d /forensics/evidence/case_jones_030106/data/sorter -C archives.sort -h -s /forensics/images/jones_hda1.dd
Command: sorter -f ntfs -d /forensics/evidence/case_jones_030106/data/sorter -C images.sort -h -s /forensics/images/jones_hda1.dd
This command only generates an audit report identifying composite files.
Command: sorter -f ntfs -d /forensics/evidence/case_jones_030106/data/sorter -C composite.sort -h /forensics/images/jones_hda1.dd
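When the same run has to be repeated for several configuration files, the command lines above can be assembled in a small script. The Python sketch below only builds the argument list and uses only the options that appear in the commands above; the helper name sorter_argv and its parameter names are hypothetical.

```python
from typing import List


def sorter_argv(image: str, out_dir: str, config: str,
                fstype: str = "ntfs", html: bool = True,
                save: bool = False) -> List[str]:
    """Build a sorter argument list using the options shown in this section."""
    argv = ["sorter", "-f", fstype, "-d", out_dir, "-C", config]
    if html:
        argv.append("-h")
    if save:
        argv.append("-s")  # included in the extraction commands above
    argv.append(image)
    return argv


OUT_DIR = "/forensics/evidence/case_jones_030106/data/sorter"
IMAGE = "/forensics/images/jones_hda1.dd"

# One extraction run per configuration file, then the audit-only composite run.
for config in ("archives.sort", "images.sort"):
    print(" ".join(sorter_argv(IMAGE, OUT_DIR, config, save=True)))
print(" ".join(sorter_argv(IMAGE, OUT_DIR, "composite.sort")))
```

Each list can be passed to subprocess.run(argv, check=True) on a system where sorter is installed. Keeping the arguments as a list avoids shell quoting problems with case directory names.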
2.2.2.1.3. Building Known Good File (KGF) Repository.