Unpacking Xamarin AssemblyStore Blobs (Updated 12/10/22)

Note: This post was updated on December 10, 2022 to include additional information and clarifications.

Recently I was building a Xamarin application as a capture-the-flag exercise. In the past, I’ve noted that the C# DLLs for Xamarin end up in the assemblies/ directory within an APK file. This time, however, I noticed that there were only two files present in the assemblies/ directory: assemblies.blob and assemblies.manifest. A quick Google search indicated that this format is a recent change to how Xamarin packs DLLs, and the tooling (other than the official Xamarin C# utilities) hasn’t caught up. In this short post, I’ll talk about my experiences with these two files and link to a Python utility that can be used to unpack assemblies.blob files. If you’d rather just use the tool and move on, it can be found here.

Understanding the “assemblies.manifest”

The assemblies.manifest file is an ASCII file that lists the names, IDs, and other metadata of the Xamarin DLL files. Its structure is described here and a basic parser can be found here. As we’ll find out in the next section, the only really useful piece of data in this file is the Name field, as (at least in my testing) there is no DLL name data within the assemblies.blob file.
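As a rough illustration, a manifest parser can be sketched in a few lines of Python. This assumes a header row followed by whitespace-separated columns (Hash 32, Hash 64, Blob ID, Blob idx, Name), so check that against your own file before relying on it. ManifestEntry, parse_manifest, and the sample rows are made up for this sketch and are not part of any Xamarin tooling.

```python
from typing import List, NamedTuple


class ManifestEntry(NamedTuple):
    hash32: int
    hash64: int
    blob_id: int
    blob_idx: int
    name: str


def parse_manifest(text: str) -> List[ManifestEntry]:
    """Parse assemblies.manifest text (assumed five-column layout)."""
    entries = []
    for line in text.splitlines():
        # Split into at most five fields so the name column stays in one piece.
        fields = line.split(maxsplit=4)
        # Skip blank lines and the header row, whose first field is not hex.
        if len(fields) != 5 or not fields[0].startswith("0x"):
            continue
        entries.append(
            ManifestEntry(
                hash32=int(fields[0], 16),
                hash64=int(fields[1], 16),
                blob_id=int(fields[2]),
                blob_idx=int(fields[3]),
                name=fields[4].strip(),
            )
        )
    return entries


# Made-up sample rows, only to show the assumed shape of the input.
SAMPLE = """Hash 32     Hash 64             Blob ID  Blob idx  Name
0x00000001  0x0000000000000002  000      0000      Example.First
0x00000003  0x0000000000000004  000      0001      Example.Second
"""

# Map each blob index to its DLL name, which is all the extraction step needs.
names_by_index = {entry.blob_idx: entry.name for entry in parse_manifest(SAMPLE)}
print(names_by_index)
```

The index-to-name mapping is the only output that matters for unpacking; the hash columns are parsed here purely for completeness.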

I was also curious about the first two fields: Hash 32 and Hash 64. My initial suspicion was that these were checksums or integrity checks on the respective DLL. After digging through the Xamarin C# code, it was clear that was not the case. These fields are actually far less interesting from a reverse-engineering standpoint, as they’re just the output of running the xxHash non-cryptographic hashing algorithm (source here) on the DLL’s name. Unless for some reason you’re changing the names of DLLs or adding DLLs to the AssemblyStore, these values can be ignored. There are plenty of xxHash bindings (Python, C#, Ruby) should you need to set or reset these fields.
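If you would rather not pull in a binding, the 32-bit variant is small enough to sketch in pure Python. What follows is a transcription of the published XXH32 algorithm, not code taken from Xamarin. It assumes that the hashed input is the UTF-8 encoded assembly name exactly as it appears in the Name column and that the seed is 0; both are assumptions worth verifying against the values in your own manifest. The name_hash32 helper is a hypothetical name used only here.

```python
import struct

MASK = 0xFFFFFFFF
# The five 32-bit primes defined by the xxHash specification.
P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1


def _rotl(value: int, bits: int) -> int:
    """Rotate a 32-bit value left by the given number of bits."""
    return ((value << bits) | (value >> (32 - bits))) & MASK


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 13) * P1) & MASK


def xxh32(data: bytes, seed: int = 0) -> int:
    length = len(data)
    pos = 0
    if length >= 16:
        # Four accumulators consume the input in 16-byte stripes.
        accs = [(seed + P1 + P2) & MASK, (seed + P2) & MASK,
                seed & MASK, (seed - P1) & MASK]
        while pos + 16 <= length:
            lanes = struct.unpack_from("<4I", data, pos)
            accs = [_round(acc, lane) for acc, lane in zip(accs, lanes)]
            pos += 16
        h = (_rotl(accs[0], 1) + _rotl(accs[1], 7)
             + _rotl(accs[2], 12) + _rotl(accs[3], 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + length) & MASK
    # Remaining input: whole 4-byte words first, then single bytes.
    while pos + 4 <= length:
        (lane,) = struct.unpack_from("<I", data, pos)
        h = (_rotl((h + lane * P3) & MASK, 17) * P4) & MASK
        pos += 4
    while pos < length:
        h = (_rotl((h + data[pos] * P5) & MASK, 11) * P1) & MASK
        pos += 1
    # Final avalanche step.
    h ^= h >> 15
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def name_hash32(name: str) -> int:
    # Assumption: the name is hashed as UTF-8 bytes with seed 0.
    return xxh32(name.encode("utf-8"))


print(f"0x{name_hash32('Example.First'):08x}")
```

For real work a maintained binding is the safer choice; this sketch is mainly useful for seeing that nothing about the DLL contents feeds into the value.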

Oddly enough, this file is not checked by Xamarin/libmono when attempting to load the DLLs from an assemblies.blob. Its value is essentially to give you the raw filenames of the DLLs stored in the assemblies.blob, as only the xxHash values exist in that file. Let’s move on to the more exciting of the two files.

Understanding the “assemblies.blob”

The assemblies.blob is where the real excitement is and requires a little more analysis, as it’s a binary structure. At the top level, the parser here describes the layout, but it also references additional classes. I’ll refer to this structure as an AssemblyStore for the rest of the blog. The header for the AssemblyStore is 20 bytes and includes the magic XABA, version (which currently is 1), and the number of included assembly files.
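To make the header concrete, here is a minimal Python sketch that reads it with the struct module. The magic, version, and assembly count come from the description above. The names and meaning of the last two 32-bit fields (global_entry_count and store_id) are my recollection of the Xamarin source and should be treated as assumptions, as should the little-endian byte order. StoreHeader and parse_header are names invented for this sketch.

```python
import struct
from typing import NamedTuple

# 4-byte magic followed by four little-endian uint32 values = 20 bytes.
HEADER = struct.Struct("<4sIIII")
MAGIC = b"XABA"


class StoreHeader(NamedTuple):
    magic: bytes
    version: int
    local_entry_count: int   # number of assemblies in this blob
    global_entry_count: int  # assumed field name
    store_id: int            # assumed field name


def parse_header(blob: bytes) -> StoreHeader:
    if len(blob) < HEADER.size:
        raise ValueError("blob is too short to contain an AssemblyStore header")
    header = StoreHeader(*HEADER.unpack_from(blob, 0))
    if header.magic != MAGIC:
        raise ValueError(f"unexpected magic {header.magic!r}")
    return header


# Synthetic header for demonstration: version 1, three assemblies, store 0.
demo = HEADER.pack(MAGIC, 1, 3, 3, 0)
print(parse_header(demo))
```

Rejecting a wrong magic up front is a cheap way to catch the case where a file is not an AssemblyStore at all.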

Immediately following the header is the confusingly named AssemblyStoreAssembly class described here. For each included assembly file, a 24-byte structure will be present. These don’t have IDs but appear to be serialized in index order (e.g., the first structure is index 0, the next is index 1, and so on). The DataOffset and DataSize fields are both important here, as they tell you where in the assemblies.blob file the DLL lives and how many bytes to extract. In my testing, the remaining fields were all zeroed out, so I’m not going to talk about them.
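The 24-byte records can be read the same way. In this sketch only DataOffset and DataSize are taken from the description above; the four remaining field names (debug and config offsets and sizes) are assumptions on my part, as is the little-endian uint32 encoding. AssemblyDescriptor and parse_descriptors are hypothetical names.

```python
import struct
from typing import List, NamedTuple

HEADER_SIZE = 20
DESCRIPTOR = struct.Struct("<6I")  # six uint32 fields = 24 bytes


class AssemblyDescriptor(NamedTuple):
    data_offset: int
    data_size: int
    debug_data_offset: int   # assumed name; zero in the author's testing
    debug_data_size: int     # assumed name
    config_data_offset: int  # assumed name
    config_data_size: int    # assumed name


def parse_descriptors(blob: bytes, count: int,
                      start: int = HEADER_SIZE) -> List[AssemblyDescriptor]:
    """Read `count` consecutive descriptors; list position is the index."""
    return [
        AssemblyDescriptor(*DESCRIPTOR.unpack_from(blob, start + i * DESCRIPTOR.size))
        for i in range(count)
    ]


# Synthetic blob: a dummy header, two descriptors, then two 4-byte payloads.
payload_start = HEADER_SIZE + 2 * DESCRIPTOR.size
demo = (
    b"\x00" * HEADER_SIZE
    + DESCRIPTOR.pack(payload_start, 4, 0, 0, 0, 0)
    + DESCRIPTOR.pack(payload_start + 4, 4, 0, 0, 0, 0)
    + b"AAAABBBB"
)
for index, d in enumerate(parse_descriptors(demo, 2)):
    print(index, demo[d.data_offset:d.data_offset + d.data_size])
```

Because the records carry no ID of their own, the position in the returned list is what ties each one back to an entry in the manifest.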

Next, hash data for each assembly is detailed. The structure is 20 bytes and maps to the Hash 32 and Hash 64 values in the assemblies.manifest above. You’ll see a 20-byte structure for each assembly for the Hash 32, followed by the same number of 20-byte structures for the Hash 64 values. There’s really no need to dig into these sections unless you intend to add additional assemblies to the AssemblyStore or change names, but details on the structure can be found here. One important piece of information that I discovered while writing the repacking features of my tool: the hashes need to be saved from smallest to largest value. Saving them in any other order causes Xamarin to fail to load the DLLs!
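For anyone who does need to touch these sections, the sketch below reads both tables and writes one back in ascending order. The 20-byte layout used here (an 8-byte hash slot followed by three uint32 values that I am calling mapping_index, local_store_index, and store_id) is my assumption about the structure, with the 32-bit hash assumed to sit in the low half of the 8-byte slot. HashEntry, read_hash_tables, and pack_hash_table are names invented for this example.

```python
import struct
from typing import List, NamedTuple, Tuple

HASH_ENTRY = struct.Struct("<QIII")  # 8 + 4 + 4 + 4 = 20 bytes


class HashEntry(NamedTuple):
    hash_value: int
    mapping_index: int      # assumed field name
    local_store_index: int  # assumed field name
    store_id: int           # assumed field name


def read_hash_tables(blob: bytes, count: int,
                     start: int) -> Tuple[List[HashEntry], List[HashEntry]]:
    """Return (hash32_entries, hash64_entries), each holding `count` records."""
    def read_table(offset: int) -> List[HashEntry]:
        return [
            HashEntry(*HASH_ENTRY.unpack_from(blob, offset + i * HASH_ENTRY.size))
            for i in range(count)
        ]

    hash32 = read_table(start)
    hash64 = read_table(start + count * HASH_ENTRY.size)
    return hash32, hash64


def pack_hash_table(entries: List[HashEntry]) -> bytes:
    """Serialize entries sorted from smallest to largest hash value."""
    ordered = sorted(entries, key=lambda entry: entry.hash_value)
    return b"".join(HASH_ENTRY.pack(*entry) for entry in ordered)


# Entries supplied out of order are written back in ascending hash order.
unsorted_entries = [HashEntry(30, 0, 0, 0), HashEntry(10, 1, 1, 0), HashEntry(20, 2, 2, 0)]
table = pack_hash_table(unsorted_entries)
hash32, hash64 = read_hash_tables(table + table, count=3, start=0)
print([entry.hash_value for entry in hash32])
```

In a real blob the start offset would be the header size plus 24 bytes per assembly. Sorting at write time, rather than trusting the caller, guards against the load failure described above.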

The rest of the data in the assemblies.blob is the actual DLL content, which, combined with the assemblies.manifest data, we can use to extract and name the associated DLLs.

Extracting from the assemblies.blob

Putting it all together, we can first parse the assemblies.manifest to build a list of assemblies by name and index, then iterate through the AssemblyStoreAssembly structures of the assemblies.blob to extract the DLL with the correct name. That’s exactly what pyxamstore does. The latest version (December 10, 2022) of the tool will also remove the LZ4 compression on the DLLs if applied. From here, you can pop the unpacked DLLs into your favorite C# reversing tool (such as dnSpy or ILSpy) to perform your analysis.
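The joining step itself is short. The following is a simplified sketch of the idea and not pyxamstore's actual code: it takes a mapping of index to name (as read from the manifest) and a list of (offset, size) pairs in descriptor order, then slices each DLL out of the blob. The function names are hypothetical, and appending a .dll extension when the manifest name lacks one is an assumption on my part.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def extract_assemblies(blob: bytes, names_by_index: Dict[int, str],
                       descriptors: List[Tuple[int, int]]) -> Dict[str, bytes]:
    """Slice each assembly out of the blob and key it by file name."""
    extracted = {}
    for index, (offset, size) in enumerate(descriptors):
        if offset + size > len(blob):
            raise ValueError(f"assembly {index} runs past the end of the blob")
        # Fall back to a placeholder if the manifest has no entry for this index.
        name = names_by_index.get(index, f"unknown_{index}")
        if not name.lower().endswith(".dll"):
            name += ".dll"  # assumption: manifest names omit the extension
        extracted[name] = blob[offset:offset + size]
    return extracted


def write_assemblies(extracted: Dict[str, bytes], out_dir: Path) -> List[Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in extracted.items():
        # Keep only the final path component so a name cannot escape out_dir.
        target = out_dir / Path(name).name
        target.write_bytes(data)
        written.append(target)
    return written


# Synthetic data: 8 bytes of padding followed by two 4-byte "assemblies".
demo_blob = b"\x00" * 8 + b"AAAABBBB"
demo_names = {0: "Example.First", 1: "Example.Second"}
demo_files = extract_assemblies(demo_blob, demo_names, [(8, 4), (12, 4)])

with tempfile.TemporaryDirectory() as tmp:
    for path in write_assemblies(demo_files, Path(tmp)):
        print(path.name, path.stat().st_size)
```

Payloads that were LZ4-compressed when the app was built still need to be decompressed afterwards; this sketch leaves the extracted bytes untouched.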

What’s Next? (Added 12/10/22)

In a previous version of this article, I mentioned that it would be interesting to see if you could repackage DLLs into an assemblies.blob file. Well, after some people reached out expressing their desire to do this, I decided it would be a good exercise to try! The latest version of the tool ships with a pack command. This command attempts to regenerate a new blob + manifest file. This is still a beta feature at this time, so if you’re experiencing issues, try enabling Xamarin debug logs: adb shell setprop debug.mono.log all.

And that’s it! I hope you found this write-up useful. If the content has changed since writing this, don’t hesitate to reach out and correct me.