Observed structure
[4 bytes original size][1 byte version/algo][3 bytes flags][2-5 padding bits][raw DEFLATE stream]
Once the wrapper and padding are removed, Go's standard compress/flate reader can handle the actual decompression.
The modern SAP wrapper handled by this repository is not a separate magic compressor. It is an SAP header, followed by a few padding bits, in front of what becomes a standard raw DEFLATE stream once the bit alignment is corrected.
This means the hard part is not inventing a new decompressor. The hard part is understanding the wrapper, validating the header, and shifting the payload back into alignment before handing it to a standard inflater.
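A minimal header parse, assuming the layout sketched above. The function name and field names are illustrative, not taken from the SAP source:

```python
import struct

def parse_sap_header(blob: bytes):
    """Split an SAP-wrapped blob into header fields and payload.

    Layout assumed from the observed structure:
      bytes 0-3: uncompressed size, little-endian
      byte  4:   version (high nibble) / algorithm (low nibble)
      bytes 5-7: flags
      bytes 8+:  bit-misaligned raw DEFLATE stream
    """
    if len(blob) < 8:
        raise ValueError("too short for an SAP wrapper header")
    (orig_size,) = struct.unpack_from("<I", blob, 0)
    version = blob[4] >> 4
    algorithm = blob[4] & 0x0F
    flags = blob[5:8]
    return orig_size, version, algorithm, flags, blob[8:]
```

Feeding it the observed header bytes `3F F7 01 00 12 1F 9D 02` yields size 128,831, version 1, algorithm 2.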
The implementation stays small because the real work is understanding the wrapper and bit alignment correctly. The decompression itself is standard library work.
The starting point was a file called example.blob: high-entropy binary data extracted from an SAP data BLOB, known to contain a document, but unreadable by ordinary archive and compression tools.
The file was 118,703 bytes of unrecognizable binary. It clearly came from an SAP-backed storage flow, but after extraction as a raw database BLOB, ordinary tools could not open it. The task was simple in theory and tedious in practice: determine what the wrapper was doing and recover the original file bytes without relying on proprietary tooling.
$ file example.blob
example.blob: data
$ python3 -c "
from collections import Counter
import math
data = open('example.blob','rb').read()
freq = Counter(data)
H = -sum((c/len(data))*math.log2(c/len(data)) for c in freq.values())
print(f'Entropy: {H:.2f} bits/byte')
print(f'Unique bytes: {len(freq)} / 256')
"
Entropy: 8.00 bits/byte
Unique bytes: 256 / 256
Entropy at the theoretical ceiling and all 256 byte values present meant the file was either heavily compressed or encrypted. It was not ordinary uncompressed document data.
Common formats were tested and rejected before any SAP-specific conclusion was drawn. That matters. Reverse engineering gets weaker when people jump straight to the answer they want.
| Format or transform | Result |
|---|---|
| gzip, bzip2, xz, zstd, lz4 | Not recognized |
| ZIP, 7z, RAR, TAR | Not recognized |
| Brotli, Snappy, LZO, LZMA | Failed |
| PostgreSQL pglz, BSON, msgpack | No match |
| All 256 single-byte XOR keys | No PDF signature |
| Bit rotation, nibble swap, byte reversal | Nothing useful |
After that sweep, the evidence pointed away from ordinary compression containers and toward a wrapper format or SAP-specific transform.
Offset  Hex                              ASCII
000000  3F F7 01 00 12 1F 9D 02 25 ...   ?.......%
| Offset | Value | Meaning |
|---|---|---|
| 0-3 | 3F F7 01 00 = 128,831 | Uncompressed size (little-endian) |
| 4 | 0x12 | Version 1, algorithm 2 |
| 5-6 | 1F 9D | Flags or magic-related bytes |
| 7 | 0x02 | Flags |
| 8 | 0x25 = % | First recognizable byte of %PDF |
That byte-level clue changed the posture of the investigation. The payload clearly began immediately after an 8-byte wrapper, but the stream was not simply "header plus plain file": there was still alignment logic involved.
Once the file was treated as an SAP data BLOB rather than a generic archive, the public research trail became much stronger. SAP uses a family of compression routines across products such as NetWeaver, MaxDB, SAP GUI, SAPCAR, and storage-related workflows. The algorithm byte 0x12 decodes to version 1 and algorithm 2, which points to the LZH wrapper variant.
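Decoding that algorithm byte is just a nibble split:

```python
algo_byte = 0x12               # byte 4 of the observed header
version = algo_byte >> 4       # high nibble
algorithm = algo_byte & 0x0F   # low nibble
```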
Studying the MaxDB source lineage reveals the central fact: SAP's so-called LZH wrapper is not an exotic new compression algorithm in the payload body. Once normalized, the body behaves like standard RFC 1951 DEFLATE.
// SAP extra-length-bits table:
int CsExtraLenBits[] = {0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,
3,3,3,3,4,4,4,4,5,5,5,5,0};
// RFC 1951 DEFLATE uses the same structure.
// SAP bit-length order:
unsigned char bl_order[] = {16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15};
// RFC 1951 DEFLATE uses the same ordering.
The dynamic and fixed block handling in the SAP code line up with DEFLATE block types, and the bit reader is standard LSB-first logic. That narrows the SAP-specific behavior down to two additions around the payload, not a full proprietary compressor.
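The LSB-first bit reading that RFC 1951 specifies can be sketched as follows; this is an illustrative reader, not the repository's actual implementation:

```python
class LSBBitReader:
    """Read bits least-significant-bit-first, as RFC 1951 DEFLATE requires."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position within the stream

    def read(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i  # earlier bits land in less significant positions
            self.pos += 1
        return value
```

Skewing this reader's starting position by a few padding bits is exactly what makes an otherwise standard DEFLATE stream unreadable to off-the-shelf tools.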
Those padding bits are enough to break off-the-shelf tools until the bitstream is shifted back into alignment.
The wrapper stores compressed bytes, not necessarily a trustworthy original filename. The recovered payload may be a PDF, image, Office document, ZIP, XML, text, or arbitrary binary.
That is why the CLI and web page both use best-effort type detection from magic bytes. When the recovered payload cannot be identified confidently, the safe fallback extension is .bin.
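Best-effort type detection from magic bytes can be as small as the sketch below; the signature list is illustrative, not the repository's actual table:

```python
# (signature prefix, extension) pairs for a few common payload types
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"PK\x03\x04", "zip"),  # also Office OOXML containers
    (b"GIF8", "gif"),
]

def guess_extension(payload: bytes) -> str:
    for sig, ext in MAGIC:
        if payload.startswith(sig):
            return ext
    return "bin"  # safe fallback when nothing matches
```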
| Source | Contribution |
|---|---|
| Martin Gallo | Public SAP compression and security research, including pysap and earlier reverse-engineering work. |
| Daniel Berlin | SAP REPOSRC decompressor notes and practical direction for wrapper handling. |
| Hans-Christian Esperer | hascar Haskell implementation and format documentation. |
| SAP AG / MaxDB | Original open-source compression code that exposes the header layout and padding-bit behavior. |
| RFC 1951 | The DEFLATE specification that explains why a standard inflater works after the wrapper is normalized. |
This repository deliberately acknowledges the public research and upstream code it builds on. Reverse engineering work becomes weaker when it is presented as if it emerged from nowhere.
At the implementation level, the logic is smaller than the explanation suggests.

import zlib

stream = data[8:]  # skip the 8-byte SAP header
skip = 3           # 2 length bits + 1 padding bit in one real sample

# Realign the LSB-first bitstream: shift each byte right by `skip`,
# pulling the low bits of the following byte into the vacated high positions.
shifted = bytearray()
for i in range(len(stream) - 1):
    shifted.append(((stream[i] >> skip) | (stream[i + 1] << (8 - skip))) & 0xFF)

result = zlib.decompress(bytes(shifted), wbits=-15)  # raw DEFLATE, no zlib framing
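A round-trip sanity check under the same assumptions: build a synthetic blob by raw-DEFLATE-compressing a payload, shifting the bitstream left by three bits, and prepending a fake 8-byte header, then recover it with the shift-and-inflate logic. The header bytes and the fixed `skip` of 3 are assumptions for the demo, not guaranteed properties of every real SAP blob:

```python
import zlib

def wrap(payload: bytes, skip: int = 3) -> bytes:
    """Synthetic SAP-style blob: fake header + raw DEFLATE shifted left by `skip` bits."""
    co = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15 -> raw DEFLATE, no zlib framing
    deflated = co.compress(payload) + co.flush()
    # Fake header: little-endian size + the version/algo and flag bytes seen in the sample.
    header = len(payload).to_bytes(4, "little") + bytes([0x12, 0x1F, 0x9D, 0x02])
    out = bytearray()
    carry = 0
    for b in deflated:
        out.append(((b << skip) | carry) & 0xFF)  # insert `skip` padding bits up front
        carry = b >> (8 - skip)
    out.append(carry)  # flush the final partial byte
    return header + bytes(out)

def unwrap(blob: bytes, skip: int = 3) -> bytes:
    """Reverse the wrapper: drop the header, realign the bits, inflate."""
    stream = blob[8:]
    shifted = bytearray(
        ((stream[i] >> skip) | (stream[i + 1] << (8 - skip))) & 0xFF
        for i in range(len(stream) - 1)
    )
    return zlib.decompress(bytes(shifted), wbits=-15)

assert unwrap(wrap(b"%PDF-1.4 hello" * 100)) == b"%PDF-1.4 hello" * 100
```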
That is the core idea. The rest of the repository is engineering discipline: header validation, safer output handling, file-type identification, comments, tests, and a browser-friendly demo built on the same logic compiled to WebAssembly.
The practical conclusion is straightforward: once the wrapper is understood, the format is auditable, portable, and small enough to explain. That is exactly how such tooling should be published.