libmspack is a portable library for some loosely related Microsoft compression formats
Formats supported
File format | Year | Algorithm | Supported? |
---|---|---|---|
COMPRESS.EXE (SZDD) | 1990 | LZSS | Decompression |
Microsoft Help (.HLP) | 1990 | LZSS | To-do |
COMPRESS.EXE (KWAJ) | 1993 | LZSS, Huffman, DEFLATE | Decompression |
Microsoft Cabinet (.CAB) | 1995 | DEFLATE, Quantum, LZX | Decompression |
HTML Help (.CHM) | 1997 | LZX | Decompression |
Microsoft eBook (.LIT) | 2000 | LZX, SHA, DES | To-do |
Windows Imaging Format (.WIM) | 2007 | LZX, XPRESS | To-do |
Exchange Offline Address Book (.LZX) | 2008 | LZX DELTA | Decompression |
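The oldest format in the table, SZDD, uses a very small LZSS variant. Below is a minimal sketch of an in-memory expander for the compressed payload that follows an SZDD header. It is not libmspack's code. It assumes the commonly documented SZDD parameters: a 4096-byte window pre-filled with spaces, a first write position of 4096 - 16, and control bytes read least-significant bit first, where a 1 bit means "literal byte" and a 0 bit means "two-byte match". The function name `lzss_expand` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand an SZDD-style LZSS payload held in memory.
 * Assumed parameters: 4096-byte window filled with spaces, first write
 * position 4096 - 16, control bits consumed LSB first (1 = literal,
 * 0 = match). Returns the number of bytes written to out; stops early
 * when the input runs out or out is full. */
size_t lzss_expand(const unsigned char *in, size_t in_len,
                   unsigned char *out, size_t out_cap)
{
    unsigned char window[4096];
    unsigned int pos = 4096 - 16;
    size_t i = 0, o = 0;

    memset(window, ' ', sizeof window);

    while (i < in_len) {
        unsigned int control = in[i++];
        for (int bit = 0; bit < 8; bit++, control >>= 1) {
            if (control & 1) {
                /* literal: emit one byte and record it in the window */
                if (i >= in_len || o >= out_cap) return o;
                window[pos] = out[o++] = in[i++];
                pos = (pos + 1) & 4095;
            } else {
                /* match: 12-bit window offset, 4-bit length (3..18) */
                if (i + 1 >= in_len) return o;
                unsigned int mpos = in[i] | ((in[i + 1] & 0xF0u) << 4);
                unsigned int len = (in[i + 1] & 0x0Fu) + 3;
                i += 2;
                while (len--) {
                    if (o >= out_cap) return o;
                    /* copy byte by byte so overlapping matches repeat data */
                    unsigned char c = window[mpos];
                    window[pos] = out[o++] = c;
                    mpos = (mpos + 1) & 4095;
                    pos = (pos + 1) & 4095;
                }
            }
        }
    }
    return o;
}
```

For example, the stream `{0x07, 'A', 'B', 'C', 0xF0, 0xF3}` has a control byte of 0x07 (three literals, then a match). The match bytes 0xF0 0xF3 encode offset 0xFF0, where 'A' was written, and length 6. The overlapping copy therefore expands the stream to `ABCABCABC`.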
Design
- Robust: There must be no uncontrolled error paths. The current "alpha" status of the library is due to feature incompleteness, not lack of robustness. The decompressors have been tested on thousands of real-life CAB and CHM files and many hand-crafted test cases designed to break them.
- Complete system abstraction: All file I/O and memory management is done through the mspack_system interface. A default implementation using the standard C library is provided. Not only does this make libmspack portable to any operating environment with only minimal effort, it also allows all kinds of unusual uses: unpacking data from ROM, a network connection or a UNIX pipe, or mmap()ing files for cache performance.
- Minimal memory usage: Large data blocks and streams are read and written through fixed-size I/O buffers. Buffer sizes are user-adjustable where possible.
- No dependencies on other libraries: The code stands on its own. zlib is not required. Even the standard C library can be avoided. This can be very useful for embedded systems.
- No endian or structure alignment problems: All data structures are read and written as byte arrays, so they work correctly on every architecture and with every compiler.
- Minimalist, yet complete: All special features of a particular file format are to be supported, but no more than that. Data is extracted from archives "as-is", stored in plain and simple data structures. The user must supply their own list hashing, metadata writing or filename conversion routines.
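The "no endian or structure alignment problems" point can be illustrated with a small sketch. Instead of casting a buffer to a C struct, each multi-byte field is assembled from individual bytes. This gives the same result on big- and little-endian hosts and never performs an unaligned load. The field offsets below follow the fixed cabinet header (CFHEADER) in Microsoft's Cabinet format specification. The helper names and `struct cab_header_info` are hypothetical and are not part of libmspack's API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: decode little-endian integers byte by byte. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Subset of the fixed CAB header, decoded field by field so that
 * compiler padding of this struct does not matter. */
struct cab_header_info {
    uint32_t cabinet_size;  /* cbCabinet, offset 8  */
    uint32_t files_offset;  /* coffFiles, offset 16 */
    uint16_t num_folders;   /* cFolders,  offset 26 */
    uint16_t num_files;     /* cFiles,    offset 28 */
};

/* Hypothetical helper: buf must hold at least the first 36 bytes of a
 * cabinet. Returns 0 on success, -1 if the "MSCF" signature is missing. */
static int parse_cab_header(const unsigned char *buf, struct cab_header_info *h)
{
    if (memcmp(buf, "MSCF", 4) != 0) return -1;
    h->cabinet_size = read_le32(buf + 8);
    h->files_offset = read_le32(buf + 16);
    h->num_folders  = read_le16(buf + 26);
    h->num_files    = read_le16(buf + 28);
    return 0;
}
```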
License
libmspack is free software licensed with the GNU LGPL, version 2. This allows the library to be linked into any software, free or proprietary. If you would like to use this library under a different license, please get in touch.
Download libmspack
The latest release of libmspack is libmspack 0.11alpha, released on 5 February 2023. In-development code can be obtained from the libmspack Git repository.
Using libmspack
The downloadable release of libmspack can be built in a UNIX or UNIX-like environment with `./configure && make`. If obtaining libmspack from the Git repository, use the rebuild.sh script instead (this requires at least autoconf 2.57, automake 1.7 and libtool).
libmspack can also be built on Microsoft Windows with the winbuild.sh script.
Read the library API documentation
Here is a simple example of usage, which will create a CAB decompressor, use it to read the file example.cab, and list the names of all the files contained in the archive:
```c
#include <stdio.h>
#include <stdlib.h>
#include <mspack.h>

int main(void) {
    struct mscab_decompressor *cabd;
    struct mscabd_cabinet *cab;
    struct mscabd_file *file;
    int test;

    /* check that this program and the library agree on basic type sizes */
    MSPACK_SYS_SELFTEST(test);
    if (test != MSPACK_ERR_OK) exit(1);

    /* NULL selects the default mspack_system (standard C library I/O) */
    if ((cabd = mspack_create_cab_decompressor(NULL))) {
        if ((cab = cabd->open(cabd, "example.cab"))) {
            /* walk the linked list of files in the cabinet */
            for (file = cab->files; file; file = file->next) {
                printf("%s\n", file->filename);
            }
            cabd->close(cabd, cab);
        }
        mspack_destroy_cab_decompressor(cabd);
    }
    return 0;
}
```
Security vulnerabilities in libmspack
This is a list of security vulnerabilities reported in libmspack, and the version(s) of libmspack they affect. You should upgrade to the latest version where possible. If you discover a security vulnerability in libmspack, please contact me immediately.
Vulnerability | Affected |
---|---|
CVE-2019-1010305: CHM files with short filenames beginning "::" could cause an overread past their newly-allocated name buffers | < 0.9.1α |
CVE-2018-18586: chmextract makes no attempt to protect you from relative/absolute paths in CHM filenames | < 0.8α |
CVE-2018-18585: CHM files with blank filenames (by having embedded nulls) are allowed, which trips up clients that expect non-blank filenames | < 0.8α |
CVE-2018-18584: A CAB file with a Quantum-compressed block of exactly 38912 bytes will write one byte beyond the end of the input buffer | < 0.8α |
CVE-2018-14682: A CHM file with codepoint U+0100 in a filename causes a one-byte overread when calling fast_find() on systems with no towlower() | < 0.7α |
CVE-2018-14681: A KWAJ file with bad headers can write up to 2 bytes beyond space allocated for the filename | < 0.7α |
CVE-2018-14680: CHM files with blank filenames are allowed, which trips up clients that expect non-blank filenames | < 0.7α |
CVE-2018-14679: A CHM file referencing a PMGL/PMGI chunk exactly equal to the number of chunks causes reading a pointer from uninitialised memory and dereferencing it, usually causing a crash | < 0.7α |
CVE-2017-11423: Custom mspack_system implementations returning a read() error while reading a CAB string makes libmspack read past the end of a stack-based buffer | < 0.6α |
CVE-2017-6419: A CHM file with a negative SpanInfo can write past the end of the LZX window | < 0.6α |
CVE-2015-4471: A CAB file with LZX-compressed data ending early during an odd-sized uncompressed block can cause a 1 byte under-read, but no crash | < 0.5α |
CVE-2015-4470: A CAB file with MSZIP-compressed data and a distance code of 30 causes a 1 byte over-read, but no crash | < 0.5α |
CVE-2015-4469: A CHM file with badly-encoded filename lengths or offsets causes over-read and segfaults on 32-bit architectures | < 0.5α |
CVE-2015-4468, CVE-2015-4472: A CHM file with badly-encoded name lengths in PMGL/PMGI blocks causes over-read and segfaults on 32-bit architectures | < 0.5α |
CVE-2015-4467: A CHM file with reset interval of zero causes division by zero | < 0.5α |
CVE-2014-9732: A CAB file with two folders, the second folder invalid, and a file decompression order of folder 1, 2, 1, causes execution to jump to NULL | < 0.5α |
CVE-2014-9556: On 32-bit architectures, a CAB file with invalid file offset or length (where offset + length == 2^32) causes an infinite loop in the Quantum decoder | < 0.5α |
CVE-2010-2800: A CAB file that ends during an MS-ZIP uncompressed block causes an infinite loop in the MS-ZIP decoder | < 0.3α |
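Two of the bug classes in this table lend themselves to short defensive checks in client code. CVE-2014-9556 arises because offset + length can wrap around to 0 in 32-bit unsigned arithmetic. CVE-2018-18586 concerns extracting files whose names contain absolute or relative paths. The sketch below shows one way to write each check. Both functions, `range_ok` and `filename_ok`, are hypothetical and are not taken from libmspack. `filename_ok` rejects any leading separator, so a real extractor might prefer to strip a known prefix rather than refuse the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [offset, offset + length) lies within
 * [0, total). No addition is performed, so nothing can wrap. The naive
 * test "offset + length <= total" is fooled when the sum overflows
 * 32 bits (for example, wrapping to exactly 0). */
static int range_ok(uint32_t offset, uint32_t length, uint32_t total)
{
    return length <= total && offset <= total - length;
}

/* Hypothetical helper: nonzero if an archive member name is safe to use
 * as a relative output path. It must be non-empty, not start with a
 * separator, have no drive-letter-style ':' in its second character,
 * and contain no ".." component (with either '/' or '\\' separators). */
static int filename_ok(const char *name)
{
    size_t len = strlen(name);
    if (len == 0) return 0;
    if (name[0] == '/' || name[0] == '\\') return 0;
    if (len >= 2 && name[1] == ':') return 0;
    const char *p = name;
    while (*p) {
        /* length of the current path component */
        size_t n = strcspn(p, "/\\");
        if (n == 2 && p[0] == '.' && p[1] == '.') return 0;
        p += n;
        if (*p) p++;  /* step over the separator */
    }
    return 1;
}
```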
Contribute!
Please send any code changes, patches, bug reports, or other submissions to my email address.