Tar

[in package TAR]

This project provides a high-level interface for interacting with tar archives. It consists of several systems that provide different levels of functionality.

NOTE: In order to load tar-extract, you need a version of osicat with commits from this PR.

Quickstart

If you want to extract a tar archive and do not care about preserving all of its metadata, evaluate the following:

(asdf:load-system :tar-simple-extract)

(tar:with-open-archive (a "/path/to/file.tar")
  (tar-simple-extract:simple-extract-archive a :directory "/path/to/extraction/point/"))

If you want to extract a tar archive while attempting to preserve symbolic links and as much metadata as possible, evaluate the following:

(asdf:load-system :tar-extract)

(tar:with-open-archive (a "/path/to/file.tar")
  (tar-extract:extract-archive a :directory "/path/to/extraction/point/"))

tar

The tar system is a thin layer on top of the tar-file system. While tar-file is focused on reading and writing physical entries, tar places an emphasis on reading and writing logical entries.

The practical effect of this is that when using tar, any single object stored in a tar archive (regular file, symlink, etc.) is always represented as a single entry (tar:file-entry, tar:symbolic-link-entry, etc.). This is in contrast to tar-file where a regular file with a long name could be either unrepresentable, a single tar-file:file-entry with a ustar header, a tar-file:pax-extended-attributes-entry followed by a tar-file:file-entry, or a tar-file:gnu-long-name-entry followed by a tar-file:file-entry. tar takes care of generating and interpreting the correct sequence of physical entries, depending on the type of the archive being used.
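To make the long-name case concrete, here is a minimal sketch of the decision a writer faces when choosing physical entries. It is not how tar or tar-file is implemented; the function names and the returned keywords are hypothetical. It assumes the standard header layout (a 100-byte name field, plus a 155-byte prefix field in ustar and pax headers), treats the GNU format as not using the prefix field, and assumes ASCII names so that characters and bytes coincide.

```lisp
;; Field sizes from the ustar header layout.
(defconstant +name-field-size+ 100)
(defconstant +prefix-field-size+ 155)

(defun ustar-split-position (name)
  "Return the index of a slash that splits NAME into a non-empty prefix of at
most 155 characters and a non-empty remainder of at most 100, or NIL."
  (loop for pos = (position #\/ name)
          then (position #\/ name :start (1+ pos))
        while pos
        when (and (<= 1 pos +prefix-field-size+)
                  (<= 1 (- (length name) pos 1) +name-field-size+))
          return pos))

(defun physical-representation (name archive-type)
  "Hypothetical helper: describe which physical entries would be needed to
store a regular file called NAME in an archive of ARCHIVE-TYPE, which is one
of :V7, :USTAR, :PAX, or :GNU."
  (if (<= (length name) +name-field-size+)
      ;; Short names fit in the name field of every format.
      :single-header
      (ecase archive-type
        ;; v7 headers have no prefix field and no extension mechanism.
        (:v7 :unrepresentable)
        ;; GNU archives put the long name in a separate preceding entry.
        (:gnu :gnu-long-name-then-entry)
        ;; ustar can only fall back on the prefix field.
        (:ustar (if (ustar-split-position name)
                    :single-header-with-prefix
                    :unrepresentable))
        ;; pax can always fall back on an extended attributes entry.
        (:pax (if (ustar-split-position name)
                  :single-header-with-prefix
                  :pax-extended-attributes-then-entry)))))
```

With tar, all of these cases are read back as one tar:file-entry; the sketch only illustrates why the physical layer has several possible shapes for the same logical file.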

While this system is useful for reading and writing tar archives, its primary purpose is for the inspection of archives and creating archives whose content does not come directly from the file system. For file system integration, see the remaining systems.

tar-simple-extract

The tar-simple-extract system provides functionality to extract a tar archive to your filesystem. Unfortunately, faithfully extracting tar files onto the filesystem is a complex task and is impossible to perform using only the functionality mandated by the Common Lisp specification.

Therefore, this system does not try to faithfully reproduce the contents of the tar file. This means that it should work on any CL implementation that tar and tar-file do, but at the cost of information loss.

This system does not support extracting the following entry types:

Additionally, metadata such as file owner, permissions, and modification time are not set.

It is recommended that this extraction method be used only to extract an archive into an empty directory. If that is done with the default settings, the extraction process should be fairly safe and predictable. Otherwise, you run the risk that existing symlinks are followed and arbitrary files on your machine are overwritten.
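The recommendation to extract only into an empty directory can be checked before extraction starts. The following is a minimal sketch, assuming UIOP is available (it ships with ASDF). directory-empty-p is a hypothetical helper and is not part of tar-simple-extract.

```lisp
(require "asdf") ; UIOP is bundled with ASDF

(defun directory-empty-p (directory)
  "Return true if DIRECTORY contains no files and no subdirectories.
A directory that does not exist yet also counts as empty."
  (let ((dir (uiop:ensure-directory-pathname directory)))
    (and (null (uiop:directory-files dir))
         (null (uiop:subdirectories dir)))))

;; Usage sketch (requires the tar-simple-extract system to be loaded):
;;
;; (let ((target "/path/to/extraction/point/"))
;;   (unless (directory-empty-p target)
;;     (error "Refusing to extract into non-empty directory ~A" target))
;;   (tar:with-open-archive (a "/path/to/file.tar")
;;     (tar-simple-extract:simple-extract-archive a :directory target)))
```

The check only describes the directory at the moment it runs, so it is a guard against mistakes rather than a defense against another process modifying the directory during extraction.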

tar-extract

The tar-extract system attempts to provide full extraction functionality. As such, it is a much more complex beast and likely does not work on all implementation/OS combinations. Patches are always welcome to make it more portable.

This system does not support extracting the following entry types on Windows:

While it is possible to create symbolic links on Windows, it requires special user permissions and many Windows applications are not designed with symlinks in mind. This makes it both very unlikely that an arbitrary user can create symlinks and very likely that the creation of symlinks would pose a risk to other applications running on the same machine. Hard link support is also possible, but I've rarely seen it and need to do more research before attempting to support it.

While there is much less information loss when extracting an archive using this system, that comes at the cost of increased security concerns when given an untrusted archive as input. For example, a typical attack vector is to write a symlink that points to an arbitrary file on your system and then use that symlink to modify the target.

The goal of this system is to provide default extraction options so that it is safe to extract an untrusted archive into an empty directory and have no way for the extraction process to modify any file that exists outside of that directory. Bug reports and patches are always welcome if you find this goal is not met.

The safety goal is paramount, and as such, the performance of this system is likely not the best. A primary reason is that we want to be robust to the different file systems that are out there (e.g., case-insensitive or Unicode-normalizing), so we (currently) do not rely on caches to determine if a path is safe to write to. As a result, many syscalls happen during extraction. There are probably ways this can be improved, and patches are always welcome.
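To illustrate why checking paths against the file system matters, consider the much weaker alternative of inspecting entry names alone. The following is a minimal sketch of such a lexical check; it is not the check tar-extract performs, and both function names are hypothetical. It rejects absolute names and names with a ".." component, but it cannot detect a symlink that already exists on disk or that an earlier entry created, which is the attack described above.

```lisp
(defun split-on-slash (name)
  "Split NAME on slash characters, returning a list of component strings."
  (loop with start = 0
        for pos = (position #\/ name :start start)
        collect (subseq name start pos)
        while pos
        do (setf start (1+ pos))))

(defun lexically-safe-name-p (name)
  "Return true if NAME is non-empty, relative, and has no \"..\" component.
This inspects only the string; it knows nothing about symlinks on disk."
  (and (plusp (length name))
       (char/= (char name 0) #\/)
       (notany (lambda (component) (string= component ".."))
               (split-on-slash name))))

;; A name such as "link/passwd" passes this check even if "link" is a
;; symlink pointing at /etc, which is why a lexical test is not sufficient.
```

A check like this can serve as an early filter at most; deciding whether a write is actually safe requires asking the file system, at the cost of the extra syscalls mentioned above.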

1 Tar

This section describes the high-level tar archive support.

1.1 Tar Archives

1.2 Tar Entries

This section describes the various entry types and how to read/write them from/to an archive.

1.3 Tar Conditions

This section describes the various conditions and restarts in the tar system.

2 Simple Extraction

[in package TAR-SIMPLE-EXTRACT]

This section describes the support for simple extraction to the filesystem.

2.1 Simple Extraction Conditions

This section describes the conditions that can occur during SIMPLE-EXTRACT-ARCHIVE.

3 Extraction

[in package TAR-EXTRACT]

This section describes the support for non-portable extraction to the filesystem.

3.1 Extraction Conditions

This section describes the conditions that can occur during EXTRACT-ARCHIVE.