Poking around in Thimbleweed Park


After more than two years of waiting, Thimbleweed Park was finally released on March 30, 2017. If you don’t have it already, go and buy it on Steam, GOG, Xbox, or the App Store.

Disclaimer: Poking around in game files might dig up some spoilers. Also, I am aware that dissecting a masterpiece like Thimbleweed Park might raise a lawyer’s eyebrow and may feel like a lack of respect towards the developers of the game. I am not trying to destroy the magic in any way by peeking behind the stage. To me, the point is to understand how the game works and maybe learn a thing or two along the way.


Inspired by a blog post by Ron Gilbert, I decided to poke around in Thimbleweed Park a bit to see what’s in there.

Disclaimer: I found some things, but I don’t know yet how to interpret the pack files. Please contact me at n.harold.cham@gmail.com if you feel like contributing or want to let me know how you’ve managed to proceed.

As a backer, I received a GOG voucher to redeem a copy of Thimbleweed Park for Linux. The program files are located in ~/GOG Games/Thimbleweed Park/game. There’s the executable ThimbleweedPark:

$ file ThimbleweedPark
ThimbleweedPark: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.24, BuildID[sha1]=cdc22481181c49a24117bfa9e1c4ec4101fb7b30, stripped

There are also the game’s two data files, but the file command can’t tell us much about them:

$ file ThimbleweedPark.ggpack*
ThimbleweedPark.ggpack1: data
ThimbleweedPark.ggpack2: data

Sadly, a hex dump doesn’t give any clues at first sight:

$ hd ThimbleweedPark.ggpack1 | head
00000000  d5 3b ed 19 a9 dc 04 00  f7 ad 43 33 49 9f 17 da  |.;........C3I...|
00000010  dc 8f d6 2a 5a 5e 86 db  29 ce 68 66 40 1d b4 27  |...*Z^..).hf@..'|
00000020  14 bd 3e 56 1e bd c0 2a  4d f2 2a 05 18 de 2e 87  |..>V...*M.*.....|
00000030  20 6c 45 35 c4 1b 9c 34  39 38 b4 8c 22 cb b2 1a  | lE5...498.."...|
00000040  0c 45 57 67 7f ab 4e 99  c7 6b d4 90 68 28 c6 f3  |.EWg..N..k..h(..|
00000050  04 f4 0d 6c 3c 92 81 02  9a 92 6a 4b d3 e9 f6 4d  |...l<.....jK...M|
00000060  62 12 0c 02 50 f0 6b 73  b4 d0 46 eb 47 77 0f 48  |b...P.ks..F.Gw.H|
00000070  62 e4 08 1e 0f 4f 57 ca  82 62 2e 3a 42 b0 3e f9  |b....OW..b.:B.>.|
00000080  c9 87 ac 7b 37 41 90 f7  4e e5 bd 53 e6 14 40 21  |...{7A..N..S..@!|
00000090  2c c5 ad 7a 86 03 4d aa  19 a6 97 65 6d 6b ba 22  |,..z..M....emk."|
$ hd ThimbleweedPark.ggpack2 | head
00000000  e2 b6 f9 17 17 14 1e 00  ba 2f 8f 4d 58 de 93 ba  |........./.MX...|
00000010  5f 08 a3 ae 79 e6 0f 24  01 4a ad 1c 69 0d cc d9  |_...y..$.J..i...|
00000020  92 93 19 8a 7c b5 bf 9a  b7 ec 98 09 5c d8 d5 3d  |....|.......\..=|
00000030  9c a7 0c 41 d6 49 0c 3b  64 19 ff 0e 7b df 52 9b  |...A.I.;d...{.R.|
00000040  e6 90 14 3e ae 82 e7 f0  bf cd 0a 9b 0e 8a c7 6e  |...>...........n|
00000050  e7 ac 4d f9 2f b0 b5 82  fb 7f dc 01 7b df 6d 5b  |..M./.......{.m[|
00000060  a1 29 5d af a7 e7 7d 55  e5 28 50 50 06 f4 16 8d  |.)]...}U.(PP....|
00000070  0a 74 ac ec fb 64 21 5a  14 50 d1 95 d7 5d 62 9e  |.t...d!Z.P...]b.|
00000080  75 33 48 35 c3 3c 59 4e  1e ec 2b ba ca 20 0e 48  |u3H5.<YN..+.. .H|
00000090  c9 fb 22 12 89 77 84 d0  8a cf 06 85 c4 8e f2 0b  |.."..w..........|
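
A hex dump may not reveal much, but one quick statistic can: the Shannon entropy of the byte distribution. Text and other uncompressed data usually land well below 8 bits per byte, while compressed or properly encrypted data comes close to that maximum. XORing with a single constant byte only relabels byte values, so it leaves the entropy unchanged. The helper name shannon_entropy and the 1 MiB sample size are my own choices, and since PNGs are already compressed, a high value for the pack files would prove little on its own.

```ruby
#!/usr/bin/env ruby

# Shannon entropy of a byte sequence, in bits per byte (0.0 .. 8.0).
def shannon_entropy(bytes)
    return 0.0 if bytes.empty?
    counts = Hash.new(0)
    bytes.each { |b| counts[b] += 1 }
    total = bytes.size.to_f
    counts.values.sum do |n|
        prob = n / total
        -prob * Math.log2(prob)
    end
end

if __FILE__ == $PROGRAM_NAME
    # usage: ./entropy.rb ThimbleweedPark.ggpack1 ThimbleweedPark.ggpack2
    ARGV.each do |path|
        # a 1 MiB sample from the start of the file is enough for a rough estimate
        sample = File.binread(path, 1024 * 1024).bytes
        printf("%s: %.3f bits/byte\n", path, shannon_entropy(sample))
    end
end
```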

Also, neither file starts with a known ZIP, gzip, or zlib (deflate) header. Of these formats, only ZIP would really make sense anyway, because it supports random access.
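
Checking these signatures by hand gets old quickly. The sketch below compares the first bytes of a file against a few well-known magic numbers: the ZIP local file header (PK\x03\x04), gzip (1f 8b), PNG, and the two-byte zlib header (compression method 8, i.e. deflate, with CMF*256 + FLG divisible by 31). Raw deflate streams have no magic number at all, so a miss doesn’t rule them out, and the zlib test is only a two-byte heuristic. The helper name and the signature table are my own; the file command does the same job with a much larger database.

```ruby
#!/usr/bin/env ruby

# Known magic numbers at the very start of a file.
SIGNATURES = {
    zip:  [0x50, 0x4b, 0x03, 0x04], # "PK\x03\x04", ZIP local file header
    gzip: [0x1f, 0x8b],
    png:  [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
}.freeze

# Returns a symbol naming the detected format, or nil if nothing matches.
def detect_format(bytes)
    SIGNATURES.each do |name, magic|
        return name if bytes.first(magic.size) == magic
    end
    # zlib: CM (low nibble of CMF) must be 8 (deflate), CINFO must be <= 7,
    # and the 16-bit value CMF*256 + FLG must be a multiple of 31
    if bytes.size >= 2
        cmf, flg = bytes[0], bytes[1]
        if (cmf & 0x0f) == 8 && (cmf >> 4) <= 7 && ((cmf << 8) | flg) % 31 == 0
            return :zlib
        end
    end
    nil
end

if __FILE__ == $PROGRAM_NAME
    # usage: ./magic.rb ThimbleweedPark.ggpack1 ThimbleweedPark.ggpack2
    ARGV.each do |path|
        head = File.binread(path, 16).bytes
        puts "#{path}: #{detect_format(head) || 'unknown'}"
    end
end
```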

What would Delores do?

To proceed, we should ask the obvious question: “What would Delores do?”

I’m not sure how much MMUCUSFLEM and Terrible Toybox still have in common in terms of storing game data, but as far as I remember, game data used to be protected by unbreakable XOR encryption™. So let’s try XORing the data with all possible bytes! Well, except 0, because even with the little knowledge we have, we do know that XORing with 0 doesn’t do a thing, right?

Here’s a little Ruby snippet which reads the first 64 bytes and XORs the data with all bytes from 0x01 to 0xff:

#!/usr/bin/env ruby

require 'open3'

# read 64 bytes from first pack file
buffer = File::binread('ThimbleweedPark.ggpack1', 64).bytes

(1..0xff).each do |code|
    puts '-' * 78
    puts "XOR with #{code}"
    # xor the data with code
    decoded = buffer.map { |x| x ^ code }
    # print a hexdump of the 'decoded' data
    Open3::popen2('hd') do |stdin, stdout|
        stdin.write decoded.map { |x| x.chr }.join('')
        stdin.close
        puts stdout.read
    end
end

Sadly, this produces nothing of interest:

$ ./xortest.rb 
------------------------------------------------------------------------------
XOR with 1
00000000  d4 3a ec 18 a8 dd 05 01  f6 ac 42 32 48 9e 16 db  |.:........B2H...|
00000010  dd 8e d7 2b 5b 5f 87 da  28 cf 69 67 41 1c b5 26  |...+[_..(.igA..&|
00000020  15 bc 3f 57 1f bc c1 2b  4c f3 2b 04 19 df 2f 86  |..?W...+L.+.../.|
00000030  21 6d 44 34 c5 1a 9d 35  38 39 b5 8d 23 ca b3 1b  |!mD4...589..#...|
00000040
------------------------------------------------------------------------------
XOR with 2
00000000  d7 39 ef 1b ab de 06 02  f5 af 41 31 4b 9d 15 d8  |.9........A1K...|
00000010  de 8d d4 28 58 5c 84 d9  2b cc 6a 64 42 1f b6 25  |...(X\..+.jdB..%|
00000020  16 bf 3c 54 1c bf c2 28  4f f0 28 07 1a dc 2c 85  |..<T...(O.(...,.|
00000030  22 6e 47 37 c6 19 9e 36  3b 3a b6 8e 20 c9 b0 18  |"nG7...6;:.. ...|
00000040
------------------------------------------------------------------------------
XOR with 3
00000000  d6 38 ee 1a aa df 07 03  f4 ae 40 30 4a 9c 14 d9  |.8........@0J...|
00000010  df 8c d5 29 59 5d 85 d8  2a cd 6b 65 43 1e b7 24  |...)Y]..*.keC..$|
00000020  17 be 3d 55 1d be c3 29  4e f1 29 06 1b dd 2d 84  |..=U...)N.)...-.|
00000030  23 6f 46 36 c7 18 9f 37  3a 3b b7 8f 21 c8 b1 19  |#oF6...7:;..!...|
00000040
------------------------------------------------------------------------------
XOR with 4
00000000  d1 3f e9 1d ad d8 00 04  f3 a9 47 37 4d 9b 13 de  |.?........G7M...|
00000010  d8 8b d2 2e 5e 5a 82 df  2d ca 6c 62 44 19 b0 23  |....^Z..-.lbD..#|
00000020  10 b9 3a 52 1a b9 c4 2e  49 f6 2e 01 1c da 2a 83  |..:R....I.....*.|
00000030  24 68 41 31 c0 1f 98 30  3d 3c b0 88 26 cf b6 1e  |$hA1...0=<..&...|
00000040
------------------------------------------------------------------------------

I’ll spare you the details: it’s unintelligible bytes all the way up to 0xff. However, we only looked at the first 64 bytes, so we might have missed the sweet spot. What if there’s some kind of header at the beginning and the XORed data only becomes readable somewhere further down? Feel free to explore; I don’t know where to look, and I’m too lazy to build 256 different versions of the pack file and check whether any of them contains meaningful strings.

TODO: Build 256 different versions of the pack files and see whether any of them contains any meaningful strings.
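
That TODO doesn’t necessarily require writing 256 files to disk. A sketch of a shortcut: read a larger chunk once, XOR it with every key in memory, and collect runs of printable ASCII, roughly what strings does. Even random bytes produce a handful of short runs, so the interesting keys are the ones that stand out from the rest. The chunk size of 64 KiB, the minimum run length of 8, and the helper names are arbitrary choices, and this only covers single-byte XOR keys.

```ruby
#!/usr/bin/env ruby

# Number of bytes to scan per run; raise it if you have the patience.
CHUNK_SIZE = 64 * 1024

# Printable ASCII runs of at least min_len bytes, similar to `strings -n min_len`.
def printable_runs(bytes, min_len = 8)
    runs = []
    current = []
    bytes.each do |b|
        if b.between?(0x20, 0x7e)
            current << b
        else
            runs << current.pack('C*') if current.size >= min_len
            current = []
        end
    end
    runs << current.pack('C*') if current.size >= min_len
    runs
end

# XOR every byte with key and return the printable runs in the result.
def xor_strings(bytes, key, min_len = 8)
    printable_runs(bytes.map { |b| b ^ key }, min_len)
end

if __FILE__ == $PROGRAM_NAME && ARGV.any?
    # usage: ./xorstrings.rb ThimbleweedPark.ggpack1 [offset]
    path = ARGV[0]
    offset = (ARGV[1] || 0).to_i
    buffer = File.binread(path, CHUNK_SIZE, offset).bytes
    (1..0xff).each do |key|
        runs = xor_strings(buffer, key)
        next if runs.empty?
        longest = runs.max_by(&:size)
        printf("key 0x%02x: %d runs, longest: %s\n", key, runs.size, longest.inspect)
    end
end
```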

Watching the executable

Let’s just sit back and watch the application do its thing. Reading a file from disk requires system calls to open, read, and close the file, and luckily, we can watch these calls using a tool called strace:

$ strace ./ThimbleweedPark 2> strace-log.txt

Just by launching the game this way and quitting immediately, we get a 3.2 MiB log file detailing every system call made.

Searching for ThimbleweedPark.ggpack1 reveals an interesting pattern after the first few calls:

stat("GameSheet@1x.png", 0x7ffe40e92500) = -1 ENOENT (No such file or directory)
open("ThimbleweedPark.ggpack1", O_RDONLY) = 12
fstat(12, {st_mode=S_IFREG|0755, st_size=435296382, ...}) = 0
lseek(12, 14819328, SEEK_SET)           = 14819328
read(12, "\34\253|\245\241\232\246\361\332A\35\273\302Y"..., 3070) = 3070
read(12, "\366T\335\v\23\235\312\351\f[\360\360n\271x"..., 40960) = 40960
read(12, "\277\307\277Q\0s\347\31\377X~\234\30~\1-\256"..., 4096) = 4096
close(12)                               = 0

The game looks for a file called GameSheet@1x.png, which it can’t find (as indicated by the return value -1 ENOENT: no such file or directory). It then opens the pack file, jumps to a specific location (14819328), and reads a few blocks of data. I guess this means that:

  1. a PNG file is located at the specified position in the pack file
  2. files can be hotloaded by placing them in the same folder (that is, the pack file acts as a default fallback if the file is not present)
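
The second guess boils down to a lookup order. The sketch below is purely hypothetical: load_asset and the index hash mapping names to [offset, length] are made up, since we don’t know yet where (or whether) the ggpack files store such a table.

```ruby
#!/usr/bin/env ruby

# HYPOTHETICAL: the lookup order suggested by the strace log. A loose file
# next to the executable wins; otherwise the asset is read from the pack.
# index maps file names to [offset, length]; where the real ggpack files keep
# that information (if anywhere) is unknown.
def load_asset(name, pack_path, index)
    # stat() succeeded: use the hotloaded copy
    return File.binread(name) if File.exist?(name)
    # stat() failed with ENOENT: fall back to the pack file
    offset, length = index.fetch(name)   # KeyError for unknown assets
    # the bytes returned here would still be obfuscated
    File.binread(pack_path, length, offset)
end
```

Dropping a file into the game directory, as the second point suggests, would then take the first branch without ever touching the pack.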

Looking at a few calls, the seek positions all seem to be multiples of 4096, and the data is always read in three consecutive blocks: the first of varying size, the second a multiple of 4096 bytes, and the last always exactly 4096 bytes.

TODO: Analyze read offsets, read counts and block sizes.
TODO: Try XOR combinations with read data and search for a PNG header. Hint: I tried this with the beginning of the first and second read block, with no results, but there may still be matches further down.
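
The first TODO can at least be started mechanically. Below is a sketch of a parser (Access and pack_accesses are my names) that pairs each failed stat() with the pack file access following it, checks whether the seek offset is 4096-aligned, and adds up the bytes read. For the GameSheet@1x.png access shown earlier, that would be 3070 + 40960 + 4096 = 48126 bytes from offset 14819328. It assumes the exact log format above; strace flags such as -f or -tt, or newer syscalls like openat, would need adjusted patterns.

```ruby
#!/usr/bin/env ruby

# One pack file access: the asset probed right before it, the file
# descriptor, the lseek offset, and the sizes returned by each read.
Access = Struct.new(:name, :fd, :offset, :reads)

# Parses strace output in the format shown above (one syscall per line,
# no -f/-tt prefixes) and groups open/lseek/read/close on the pack files.
def pack_accesses(lines)
    accesses = []
    probed = nil    # last file name that stat() could not find
    current = nil   # access in progress
    lines.each do |line|
        if (m = line.match(/\Astat\("([^"]+)".*ENOENT/))
            probed = m[1]
        elsif (m = line.match(/\Aopen\("[^"]*\.ggpack\d+", [^)]*\)\s*= (\d+)/))
            current = Access.new(probed, m[1].to_i, nil, [])
        elsif current.nil?
            next
        elsif (m = line.match(/\Alseek\((\d+), (\d+), SEEK_SET\)/)) && m[1].to_i == current.fd
            current.offset = m[2].to_i
        elsif (m = line.match(/\Aread\((\d+), .*\)\s*= (\d+)/)) && m[1].to_i == current.fd
            current.reads << m[2].to_i
        elsif (m = line.match(/\Aclose\((\d+)\)/)) && m[1].to_i == current.fd
            accesses << current
            current = nil
        end
    end
    accesses
end

if __FILE__ == $PROGRAM_NAME
    # usage: ./packreads.rb strace-log.txt
    ARGV.each do |path|
        pack_accesses(File.foreach(path)).each do |a|
            aligned = a.offset && (a.offset % 4096).zero? ? 'aligned' : 'unaligned'
            printf("%-32s offset %10s (%s)  reads %s = %d bytes\n",
                   a.name, a.offset, aligned, a.reads.join(' + '), a.reads.sum)
        end
    end
end
```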

Every PNG file starts with the same 8-byte signature:

89 50 4e 47 0d 0a 1a 0a

The bytes returned by the read calls above look nothing like this, which means that there’s at least some level of obfuscation going on (and my feeling is that it’s not just XOR with a constant number).
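
That hunch can be tested directly. If the data were a PNG XORed with one constant byte, then XORing some 8-byte window with the PNG signature would yield the same key byte eight times in a row. The sketch below slides the signature over a block of data and reports such windows (a key of 0x00 would mean a plain, unobfuscated PNG). The helper name is my own, and it assumes the PNG is stored contiguously; it says nothing about more elaborate schemes.

```ruby
#!/usr/bin/env ruby

PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a].freeze

# Returns [offset, key] pairs where XORing the 8 bytes at offset with a single
# constant key byte would yield the PNG signature.
def constant_xor_png_candidates(bytes)
    candidates = []
    (0..bytes.size - PNG_SIGNATURE.size).each do |i|
        window = bytes[i, PNG_SIGNATURE.size]
        key = window.zip(PNG_SIGNATURE).map { |c, s| c ^ s }
        candidates << [i, key.first] if key.uniq.size == 1
    end
    candidates
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 3
    # usage: ./pngxor.rb ThimbleweedPark.ggpack1 <offset> <length>
    path, offset, length = ARGV[0], ARGV[1].to_i, ARGV[2].to_i
    data = File.binread(path, length, offset).bytes
    constant_xor_png_candidates(data).each do |pos, key|
        printf("offset %d: key 0x%02x\n", offset + pos, key)
    end
end
```

With the GameSheet@1x.png access above, for instance, the arguments would be 14819328 and 48126 (the sum of the three reads).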

Extracting the probed file names from the strace log gives us a list of files the game tries to load:

BankMgrSheet@1x.png
BankMgrSheet_norm.png
BatDorkSheet@1x.png
BatDorkSheet_norm.png
BorisSheet@1x.png
BorisSheet_norm.png
BrettSheet@1x.png
BrettSheet_norm.png
CaretakerSheet@1x.png
CaretakerSheet_norm.png
CarneyJoeSheet@1x.png
CarneyJoeSheet_norm.png
CassieSheet@1x.png
CassieSheet_norm.png
ChetSheet@1x.png
ChetSheet_norm.png
ChuckSheet@1x.png
ChuckSheet_norm.png
ChuckieSheet@1x.png
ChuckieSheet_norm.png
CoronerSheet@1x.png
CoronerSheet_norm.png
DaveSheet@1x.png
DaveSheet_norm.png
DavidSheet@1x.png
DavidSheet_norm.png
DeloresSheet@1x.png
DeloresSheet_norm.png
DragonSheet@1x.png
DragonSheet_norm.png
FranklinAliveSheet@1x.png
FranklinAliveSheet_norm.png
FranklinGhostSheet@1x.png
FranklinGhostSheet_norm.png
GameSheet@1x.png
GameSheet_norm.png
GarySheet@1x.png
GarySheet_norm.png
GhostFemale1Sheet@1x.png
GhostFemale1Sheet_norm.png
GhostHeadSheet@1x.png
GhostHeadSheet_norm.png
GhostMale1Sheet@1x.png
GhostMale1Sheet_norm.png
HotelKidSheet@1x.png
HotelKidSheet_norm.png
HotelManagerSheet@1x.png
HotelManagerSheet_norm.png
HotelRoomGuestSheet@1x.png
HotelRoomGuestSheet_norm.png
IndySpockSheet@1x.png
IndySpockSheet_norm.png
InventoryItems@1x.png
InventoryItems_norm.png
KenJonesSheet@1x.png
KenJonesSheet_norm.png
KenThienSheet@1x.png
KenThienSheet_norm.png
LawyerSheet@1x.png
LawyerSheet_norm.png
LenoreSheet@1x.png
LenoreSheet_norm.png
LeonardSheet@1x.png
LeonardSheet_norm.png
LookALike1Sheet@1x.png
LookALike1Sheet_norm.png
LookALike2Sheet@1x.png
LookALike2Sheet_norm.png
MadameMorenaSheet@1x.png
MadameMorenaSheet_norm.png
MimeSheet@1x.png
MimeSheet_norm.png
NatalieSheet@1x.png
NatalieSheet_norm.png
NurseEdnaSheet@1x.png
NurseEdnaSheet_norm.png
PeterSheet@1x.png
PeterSheet_norm.png
PigeonSisterShortSheet@1x.png
PigeonSisterShortSheet_norm.png
PigeonSisterTallSheet@1x.png
PigeonSisterTallSheet_norm.png
PostalWorkerSheet@1x.png
PostalWorkerSheet_norm.png
RansomeSheet@1x.png
RansomeSheet_norm.png
RatSheet@1x.png
RatSheet_norm.png
RaySheet@1x.png
RaySheet_norm.png
ReyesSheet@1x.png
ReyesSheet_norm.png
RickiSheet@1x.png
RickiSheet_norm.png
RonSheet@1x.png
RonSheet_norm.png
SandySheet@1x.png
SandySheet_norm.png
SaveLoadSheet@1x.png
SaveLoadSheet_norm.png
SheriffSheet@1x.png
SheriffSheet_norm.png
ShipmanSheet@1x.png
ShipmanSheet_norm.png
StartScreenSheet@1x.png
StartScreenSheet_norm.png
UIFontLarge@1x.png
UIFontLarge_norm.png
UIFontMedium@1x.png
UIFontMedium_norm.png
VerbSheet@1x.png
VerbSheet_norm.png
VistaSheet@1x.png
VistaSheet_norm.png
WillieSittingSheet@1x.png
WillieSittingSheet_norm.png
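
A list like the one above comes down to a single regular expression over the log: every stat() call that failed with ENOENT, filtered to PNG names, deduplicated, and sorted. A minimal sketch, assuming the log format shown earlier (the function name is mine):

```ruby
#!/usr/bin/env ruby

# File names the game probed with stat() but could not find (ENOENT),
# restricted to a given extension, deduplicated and sorted.
def probed_files(lines, ext = '.png')
    lines.map { |line| line[/\Astat\("([^"]+)", [^)]*\)\s*= -1 ENOENT/, 1] }
         .compact
         .select { |name| name.end_with?(ext) }
         .uniq
         .sort
end

if __FILE__ == $PROGRAM_NAME
    # usage: ./probed.rb strace-log.txt
    ARGV.each do |path|
        puts probed_files(File.foreach(path))
    end
end
```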

I guess @1x means that this is the most brilliant pixel art at its original resolution, and _norm might refer to normal maps (images storing surface normal vectors), since there’s pretty advanced lighting in the game (and running strings ./ThimbleweedPark reveals a couple of GLSL shaders).
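
If _norm does stand for normal maps, the common convention is that each pixel’s RGB channels store a surface normal remapped from [-1, 1] to [0, 255], and a lighting shader scales the color by a diffuse term such as max(0, N·L). Whether Thimbleweed Park encodes its maps exactly this way is a guess; the sketch below only illustrates that general convention, with helper names of my own.

```ruby
#!/usr/bin/env ruby

# Decode an 8-bit RGB normal-map pixel into a (roughly unit-length) normal
# vector, using the common convention n = rgb / 255 * 2 - 1.
def decode_normal(r, g, b)
    [r, g, b].map { |c| c / 255.0 * 2.0 - 1.0 }
end

# Lambertian diffuse factor max(0, N . L) for a normal and a light direction.
def lambert(normal, light_dir)
    len = Math.sqrt(light_dir.sum { |x| x * x })
    light = light_dir.map { |x| x / len }
    [normal.zip(light).sum { |n, l| n * l }, 0.0].max
end

# A "flat" pixel (128, 128, 255) points straight at a light in front of it:
# puts lambert(decode_normal(128, 128, 255), [0, 0, 1])   # prints 1.0
```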

Watching the executable even more

I have no idea how to proceed from here. We now know a couple of file names and the positions the corresponding data is read from in the pack files, but the data at these locations does not look like a valid PNG file.

I tried running the game in gdb and catching the lseek syscall via catch syscall lseek. I looked at the stack, I looked at the code, and I used objdump to disassemble the executable, but I’m not smart enough to figure out how the data is decoded after being read from the pack file.

:-(

But there’s one more thing I can do: read the memory of the running process! The following script does just that.

Running this script repeatedly, each time with a single empty PNG file in the game directory named after one of the real files (see the list above), yields 55 PNG files totaling 2.9 MiB. Unfortunately, we don’t know which file is which, so the files are named after their SHA1 hash (and yes, I’ve heard SHA1 is broken).

#!/usr/bin/env ruby
require 'digest/sha1'

# This script dumps Thimbleweed Park's heap and looks for PNG streams in it.
# Hint: Poke some sticks into the machinery by placing empty files with known
# file names in the game's directory to crash the game at different points and
# get different PNG files.
# Hint: Call with PID argument.

PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b
# IEND chunk type followed by its fixed CRC-32
IEND_TRAILER = "IEND\xae\x42\x60\x82".b

pid = ARGV.first.to_i

# find heap address range of process and dump it to heap.raw
File.foreach("/proc/#{pid}/maps") do |line|
    next unless line =~ /\[heap\]/
    start, stop = line.split(' ').first.split('-')
    system("gdb --batch --pid #{pid} " \
           "-ex \"dump memory heap.raw 0x#{start} 0x#{stop}\"")
end

heap = File.binread('heap.raw')

# look for PNG signatures in heap.raw
pos = heap.index(PNG_SIGNATURE)
while pos
    puts "Found a PNG file at #{pos}..."
    # walk the chunks (length, type, data, CRC) until a zero-length chunk,
    # which we take to be IEND; give up if we run off the end of the dump
    offset = PNG_SIGNATURE.bytesize
    complete = false
    while (chunk = heap.byteslice(pos + offset, 4)) && chunk.bytesize == 4
        size = chunk.unpack1('N')
        if size == 0
            complete = true
            break
        end
        offset += 4 + 4 + size + 4
    end
    if complete
        # keep the IEND length field (4 zero bytes)
        offset += 4
        data = heap.byteslice(pos, offset)
        # the trailing PNG data is mangled for unknown reasons,
        # write it manually
        data << IEND_TRAILER
        sha1 = Digest::SHA1.hexdigest(data)
        filename = "tp-#{sha1}.png"
        unless File.exist?(filename)
            puts "Writing file: #{filename}"
            File.binwrite(filename, data)
        end
    end
    pos = heap.index(PNG_SIGNATURE, pos + 1)
end
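
Since the tail of each stream has to be patched by hand, it’s worth checking whether the rest of an extracted file is intact. Every PNG chunk carries a CRC-32 over its type and data fields, so a small validator can flag damaged files. A sketch using Ruby’s bundled zlib library (the function name and the :truncated marker are my own):

```ruby
#!/usr/bin/env ruby
require 'zlib'

# Walk the chunks of a PNG and return the types whose CRC-32 does not match,
# plus :truncated if a chunk runs past the end of the data. Each chunk is a
# 4-byte big-endian length, a 4-byte type, the data, and a CRC over type+data.
def bad_png_chunks(png)
    raise ArgumentError, 'not a PNG' unless png.byteslice(0, 8) == "\x89PNG\r\n\x1a\n".b
    bad = []
    pos = 8
    while pos < png.bytesize
        header = png.byteslice(pos, 8)
        if header.bytesize < 8
            bad << :truncated
            break
        end
        length, type = header.unpack('Na4')
        chunk_end = pos + 12 + length   # length + type + data + CRC
        if chunk_end > png.bytesize
            bad << :truncated
            break
        end
        stored_crc = png.byteslice(chunk_end - 4, 4).unpack1('N')
        bad << type unless Zlib.crc32(png.byteslice(pos + 4, 4 + length)) == stored_crc
        break if type == 'IEND'
        pos = chunk_end
    end
    bad
end

if __FILE__ == $PROGRAM_NAME
    # usage: ./pngcheck.rb tp-*.png
    ARGV.each do |path|
        bad = bad_png_chunks(File.binread(path))
        status = bad.empty? ? 'OK' : "bad: #{bad.join(', ')}"
        puts "#{path}: #{status}"
    end
end
```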

I’m not going to upload all of these files because I don’t fancy being prosecuted, but here are three of them to give you an idea of the results (again, go buy the game, it’s worth every dime).

Before continuing, let me make it clear that I’m not including the actual PNG files with the aim of copyright infringement, but to raise some interest among readers to explore this game further. Wouldn’t it be nice to have a tool to extract all game data from your legally obtained copy?

First, here’s an image which obviously represents the verb sheet, or as we know it, VerbSheet@1x.png:

Having this image and the strace log gives us both the encoded and decoded data, so there should be a way to figure it out, right?

stat("VerbSheet@1x.png", 0x7ffe40e92410) = -1 ENOENT (No such file or directory)
open("ThimbleweedPark.ggpack1", O_RDONLY) = 12
fstat(12, {st_mode=S_IFREG|0755, st_size=435296382, ...}) = 0
lseek(12, 77860864, SEEK_SET)           = 77860864
read(12, "\357\20?\256i\304[%\2065R\34\221\231H\216\311A"..., 645) = 645
read(12, "\243\1\210^F\310\237\274Y\16\245\245;\354-H'U"..., 49152) = 49152
read(12, "k\231^\327'\370\244\215h?\224\231N\321\224Yr"..., 4096) = 4096
close(12)                               = 0
brk(0x79f6000)                          = 0x79f6000
ioctl(4, DRM_IOCTL_I915_GEM_CREATE, 0x7ffe40e92100) = 0
ioctl(4, DRM_IOCTL_I915_GEM_SET_TILING, 0x7ffe40e92070) = 0
ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x64, 0x5e, 0x28), 0x7ffe40e92130) = 0
ioctl(4, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffe40e92130) = 0
ioctl(4, DRM_IOCTL_I915_GEM_SW_FINISH, 0x7ffe40e921c0) = 0
clock_gettime(CLOCK_MONOTONIC_RAW, {57272, 778052358}) = 0
stat("VerbSheet_norm.png", 0x7ffe40e92490) = -1 ENOENT (No such file or directory)
open("ThimbleweedPark.ggpack1", O_RDONLY) = 12
fstat(12, {st_mode=S_IFREG|0755, st_size=435296382, ...}) = 0
lseek(12, 77819904, SEEK_SET)           = 77819904
read(12, "\34\230_ \227\34T'k\\v\354\324CqK\340\253e"..., 1884) = 1884
read(12, "\35\315l\217\373\22:`\247\312A7\352\177\330"..., 36864) = 36864
read(12, "\20\213!\335\255M\"1\364\327\16v\304w\370"..., 4096) = 4096
close(12)  
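
In the simplest case, if the pack stored each PNG byte for byte and merely XORed it with some keystream, that keystream would just be the encoded bytes XORed with the decoded ones, and a short period in it would reveal the key length. The sketch below tests exactly that, under two big assumptions that may well be wrong: the PNG starts right at the lseek offset, and it is stored with the same length. The helper names are mine, and tp-<sha1>.png stands for whichever extracted file turns out to be the verb sheet.

```ruby
#!/usr/bin/env ruby

# XOR two byte arrays position by position, up to the shorter length.
def xor_bytes(a, b)
    n = [a.size, b.size].min
    (0...n).map { |i| a[i] ^ b[i] }
end

# Smallest period (at most half the keystream and at most max_period) such
# that ks[i] == ks[i % period] for every i, or nil if there is none.
def smallest_period(ks, max_period = 4096)
    (1..[max_period, ks.size / 2].min).find do |period|
        ks.each_index.all? { |i| ks[i] == ks[i % period] }
    end
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 3
    # usage: ./keystream.rb ThimbleweedPark.ggpack1 77860864 tp-<sha1>.png
    pack, offset, png = ARGV[0], ARGV[1].to_i, ARGV[2]
    decoded = File.binread(png).bytes
    encoded = File.binread(pack, decoded.size, offset).bytes
    ks = xor_bytes(encoded, decoded)
    puts ks.first(32).map { |b| format('%02x', b) }.join(' ')
    period = smallest_period(ks)
    puts(period ? "keystream repeats every #{period} bytes" : 'no short period found')
end
```

A keystream without any short period would not prove much, but it would at least rule out a short repeating XOR key.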

Here’s a small sprite sheet of Willie:

And finally, we get to see how the vista is put together (and well, as stated previously, here be spoilers):

Isn’t this beautiful? I love to see the individual parts which come together to produce the magic of the game.

So let me conclude with a few words. I’d love to be able to extract all images, sounds, and dialog lines, plus the voicemail messages with their text and the library book texts. If you feel like exploring this any further, don’t hesitate to contact me; I’ll be happy to update this site or link to your writeup. Thank you for listening.

PS. In case of emergency, don’t sue N. Harold Cham.
PPS. Thank you, Mr. Gilbert, and the whole team.