Unpacking Xamarin AssemblyStore Blobs

Recently I was building a Xamarin application as a capture-the-flag exercise. In the past, I’ve noted that the C# DLLs for Xamarin end up in the assemblies/ directory within an APK file. This time, however, I noticed that there were only two files present in the assemblies/ directory: assemblies.blob and assemblies.manifest. A quick Google search indicated that this format is a recent change to how Xamarin packs DLLs, and the tooling (other than the official Xamarin C# utilities) hasn’t caught up. In this short post, I’ll talk about my experiences with these two files and link to a Python utility that can be used to unpack assemblies.blob files. If you’d rather just use the tool and move on, it can be found here.

Understanding the “assemblies.manifest”

The assemblies.manifest file is an ASCII file that lists the names, IDs, and other metadata of the Xamarin DLL files. Its structure is described here and a basic parser can be found here. As we’ll find out in the next section, the only really useful piece of data in this file is the Name field, as (at least in my testing) there is no DLL name data within the assemblies.blob file.
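To make the name lookup concrete, here is a minimal sketch of a manifest parser. It assumes a layout that is my reading of the format rather than a quoted spec: one header row, then whitespace-separated rows with five columns (Hash 32, Hash 64, a blob ID, a blob index, and the Name last). The names ManifestEntry, parse_manifest, and the sample rows are made up for illustration.

```python
from typing import List, NamedTuple


class ManifestEntry(NamedTuple):
    hash32: int
    hash64: int
    blob_id: int
    blob_idx: int
    name: str


def parse_manifest(text: str) -> List[ManifestEntry]:
    """Parse manifest text into entries (column layout is an assumption)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, which does not start with "0x".
        if len(fields) < 5 or not fields[0].startswith("0x"):
            continue
        entries.append(ManifestEntry(
            hash32=int(fields[0], 16),
            hash64=int(fields[1], 16),
            blob_id=int(fields[2]),
            blob_idx=int(fields[3]),
            name=fields[4],
        ))
    return entries


# Made-up sample rows, for illustration only.
SAMPLE = """\
Hash 32     Hash 64             Blob ID  Blob idx  Name
0x11111111  0x2222222222222222  000      0000      Example.Core
0x33333333  0x4444444444444444  000      0001      Example.App
"""

# Map each assembly's index to its name, which is all the unpacker needs.
names_by_index = {e.blob_idx: e.name for e in parse_manifest(SAMPLE)}
```

Keeping the index-to-name mapping as a dictionary makes it easy to name each DLL once it has been pulled out of the blob by position.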

I was also curious about the first two fields: Hash 32 and Hash 64. My initial suspicion was that these were checksums or an integrity check on the respective DLL. After digging through the Xamarin C# code, it was clear that was not the case. These fields are actually far less interesting from a reverse-engineering standpoint, as they’re just the output of running the xxHash non-cryptographic hashing algorithm (source here) over the DLL’s name. Unless for some reason you’re changing the names of DLLs or adding DLLs to the AssemblyStore, these values can be ignored. There are plenty of xxHash bindings (Python, C#, Ruby) should you need to set or reset these fields.
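If you ever do need to recompute a Hash 32 value, the sketch below shows the general idea with a dependency-free XXH32 written from the public algorithm description. In practice one of the bindings would be the simpler choice. The exact input Xamarin hashes is an assumption here: I’m guessing at UTF-8 bytes of the name and a seed of 0, and I haven’t confirmed whether the file extension is included. The helper name format_hash32 is hypothetical.

```python
import struct

_MASK = 0xFFFFFFFF
_P1, _P2, _P3, _P4, _P5 = (
    0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1,
)


def _rotl(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & _MASK


def _round(acc: int, lane: int) -> int:
    """Mix one 32-bit input lane into an accumulator."""
    acc = (acc + lane * _P2) & _MASK
    return (_rotl(acc, 13) * _P1) & _MASK


def xxh32(data: bytes, seed: int = 0) -> int:
    """Pure-Python XXH32 of data, returned as an unsigned 32-bit integer."""
    n, i = len(data), 0
    if n >= 16:
        # Four accumulators consume the input in 16-byte stripes.
        v1 = (seed + _P1 + _P2) & _MASK
        v2 = (seed + _P2) & _MASK
        v3 = seed & _MASK
        v4 = (seed - _P1) & _MASK
        while i + 16 <= n:
            a, b, c, d = struct.unpack_from("<4I", data, i)
            v1, v2 = _round(v1, a), _round(v2, b)
            v3, v4 = _round(v3, c), _round(v4, d)
            i += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & _MASK
    else:
        h = (seed + _P5) & _MASK
    h = (h + n) & _MASK
    # Remaining input: 4 bytes at a time, then single bytes.
    while i + 4 <= n:
        (lane,) = struct.unpack_from("<I", data, i)
        h = (_rotl((h + lane * _P3) & _MASK, 17) * _P4) & _MASK
        i += 4
    while i < n:
        h = (_rotl((h + data[i] * _P5) & _MASK, 11) * _P1) & _MASK
        i += 1
    # Final avalanche.
    h ^= h >> 15
    h = (h * _P2) & _MASK
    h ^= h >> 13
    h = (h * _P3) & _MASK
    h ^= h >> 16
    return h


def format_hash32(name: str) -> str:
    """Format a name's hash like a manifest value (encoding/seed assumed)."""
    return "0x%08x" % xxh32(name.encode("utf-8"))
```

Comparing format_hash32 against a row from a real manifest is a quick way to check whether those assumptions about the hashed input hold for your build.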

Let’s move on to the more exciting of the two files.

Understanding the “assemblies.blob”

The assemblies.blob file is where the real excitement is, and it requires a little more analysis because it’s a binary structure. At the top level, the parser here describes the layout, but it also references additional classes. I’ll refer to this structure as an AssemblyStore for the rest of this post. The header for the AssemblyStore is 20 bytes and includes the magic XABA, the version (currently 1), and the number of included assembly files.
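As a rough illustration, the 20-byte header can be read with Python’s struct module. This is a sketch under assumptions: I’m treating the header as five little-endian 4-byte fields, and the names local_entry_count, global_entry_count, and store_id for the last three are my own labels rather than something stated above. StoreHeader and parse_header are hypothetical names.

```python
import struct
from typing import NamedTuple

# Assumed layout: 4-byte magic followed by four little-endian uint32 fields.
HEADER_FORMAT = "<4sIIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20 bytes


class StoreHeader(NamedTuple):
    magic: bytes
    version: int
    local_entry_count: int   # assumed: number of assemblies in this blob
    global_entry_count: int  # assumed field name
    store_id: int            # assumed field name


def parse_header(blob: bytes) -> StoreHeader:
    """Read and sanity-check the AssemblyStore header at the start of blob."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("blob is too short to contain a header")
    header = StoreHeader(*struct.unpack_from(HEADER_FORMAT, blob, 0))
    if header.magic != b"XABA":
        raise ValueError("unexpected magic: %r" % (header.magic,))
    return header


# Synthetic header for illustration: version 1, three assemblies.
sample = struct.pack(HEADER_FORMAT, b"XABA", 1, 3, 3, 0)
header = parse_header(sample)
```

Checking the magic up front gives a clear error when the file is not an AssemblyStore at all, instead of nonsense offsets later on.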

Immediately following the header is the confusingly named AssemblyStoreAssembly class described here. For each included assembly file, a 24-byte structure will be present. These don’t have IDs but appear to be serialized in index order (the first structure is index 0, the next is index 1, and so on). The DataOffset and DataSize fields are the important ones here, as they tell you where in the assemblies.blob file the DLL lives and how many bytes to extract. In my testing, the remaining fields were all zeroed out, so I’m not going to talk about them.
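Here is a minimal sketch of reading those 24-byte structures. It assumes six little-endian uint32 fields with DataOffset and DataSize first; the names I give the other four (debug and config offsets and sizes) are assumptions on my part. AssemblyEntry and parse_entries are hypothetical names.

```python
import struct
from typing import List, NamedTuple

HEADER_SIZE = 20
ENTRY_FORMAT = "<6I"  # assumed: six little-endian uint32 fields
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 24 bytes


class AssemblyEntry(NamedTuple):
    data_offset: int
    data_size: int
    debug_data_offset: int   # assumed field name
    debug_data_size: int     # assumed field name
    config_data_offset: int  # assumed field name
    config_data_size: int    # assumed field name


def parse_entries(blob: bytes, count: int) -> List[AssemblyEntry]:
    """Read count entries that start right after the 20-byte header."""
    entries = []
    for index in range(count):
        offset = HEADER_SIZE + index * ENTRY_SIZE
        entries.append(
            AssemblyEntry(*struct.unpack_from(ENTRY_FORMAT, blob, offset))
        )
    return entries


# Synthetic example: a zeroed header followed by two entries.
sample = bytes(HEADER_SIZE)
sample += struct.pack(ENTRY_FORMAT, 148, 8, 0, 0, 0, 0)
sample += struct.pack(ENTRY_FORMAT, 156, 11, 0, 0, 0, 0)
entries = parse_entries(sample, 2)
```

Because the structures carry no IDs, the position in the returned list is the only link back to a name in the manifest.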

Next comes the hash data for each assembly. Each structure is 20 bytes and maps to the Hash 32 and Hash 64 values in the assemblies.manifest above. You’ll see one 20-byte structure per assembly for the Hash 32 values, followed by the same number of 20-byte structures for the Hash 64 values. There’s really no need to dig into these sections unless you intend to add assemblies to the AssemblyStore or change names, but details on the structure can be found here.
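The sizes described so far are enough to work out where each section starts. The sketch below is just that arithmetic, assuming a single blob whose two hash tables each hold exactly one entry per assembly. The function name section_offsets is made up, and when extracting real DLLs the DataOffset fields should be trusted over this calculation.

```python
from typing import Dict

HEADER_SIZE = 20      # AssemblyStore header
ENTRY_SIZE = 24       # one AssemblyStoreAssembly structure
HASH_ENTRY_SIZE = 20  # one hash structure (Hash 32 or Hash 64 table)


def section_offsets(assembly_count: int) -> Dict[str, int]:
    """Return the starting byte offset of each section in the blob."""
    entries = HEADER_SIZE
    hash32 = entries + assembly_count * ENTRY_SIZE
    hash64 = hash32 + assembly_count * HASH_ENTRY_SIZE
    data = hash64 + assembly_count * HASH_ENTRY_SIZE
    return {"entries": entries, "hash32": hash32, "hash64": hash64, "data": data}


# For a blob with three assemblies: 20 + 3*24 = 92, 92 + 3*20 = 152, 152 + 3*20 = 212.
offsets = section_offsets(3)
```

Comparing the computed data offset with the first entry’s DataOffset is a cheap sanity check that the blob matches the layout described here.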

The rest of the data in the assemblies.blob is the actual DLL content, which, combined with the assemblies.manifest data, we can use to extract and name the associated DLLs.

Extracting from the assemblies.blob

Putting it all together, we can first parse the assemblies.manifest to build a list of assemblies by name and index, then iterate through the AssemblyStoreAssembly structures of the assemblies.blob to extract each DLL under the correct name. That’s exactly what unpack-store.py does. Once extracted, you may also need to decompress these DLLs, as it appears Xamarin is using LZ4 compression to pack them. That’s not a problem, because there is already an existing tool to accomplish this. From here, you can pop the unpacked DLLs into your favorite C# reversing tool (such as dnSpy or ILSpy) to perform your analysis.
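The extraction step can be sketched in a few lines of Python. This is not the code from unpack-store.py, just an illustration under the same assumptions as the layout described above: a 20-byte little-endian header whose third field is the assembly count, followed by 24-byte structures that begin with DataOffset and DataSize. The check for an XALZ magic on packed DLLs reflects my understanding of how Xamarin marks LZ4-compressed assemblies, so treat it as an assumption too. Both function names are hypothetical.

```python
import struct
from typing import Dict, List

HEADER_FORMAT = "<4sIIII"  # assumed: magic, version, count, two more fields
ENTRY_FORMAT = "<6I"       # assumed: DataOffset, DataSize, four other fields


def extract_assemblies(blob: bytes, names: List[str]) -> Dict[str, bytes]:
    """Slice each DLL out of blob, naming it by its index in names."""
    magic, _version, count, _unused_a, _unused_b = struct.unpack_from(
        HEADER_FORMAT, blob, 0
    )
    if magic != b"XABA":
        raise ValueError("not an AssemblyStore blob")
    if len(names) < count:
        raise ValueError("manifest lists fewer names than the blob has entries")

    header_size = struct.calcsize(HEADER_FORMAT)
    entry_size = struct.calcsize(ENTRY_FORMAT)
    assemblies = {}
    for index in range(count):
        data_offset, data_size, *_rest = struct.unpack_from(
            ENTRY_FORMAT, blob, header_size + index * entry_size
        )
        assemblies[names[index]] = blob[data_offset:data_offset + data_size]
    return assemblies


def looks_lz4_packed(dll: bytes) -> bool:
    """Guess whether a DLL still needs decompressing (XALZ magic assumed)."""
    return dll[:4] == b"XALZ"


# Usage sketch (the file path and name list are placeholders):
#   with open("assemblies/assemblies.blob", "rb") as handle:
#       dlls = extract_assemblies(handle.read(), ["Example.Core", "Example.App"])
#   packed = [name for name, data in dlls.items() if looks_lz4_packed(data)]
```

Returning raw bytes and leaving decompression to a separate step keeps the extractor free of third-party dependencies.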

Future Ideas

An exercise that I’ve not performed yet but could see value in doing is modifying content in one of the unpacked DLLs. Of course, you could use a dynamic instrumentation framework like Frida to do this, but in cases where that doesn’t work, it seems very possible to reverse the steps described above: modify an unpacked DLL using dnSpy, pack it with LZ4, re-package the AssemblyStore object, serialize it to a new assemblies.blob, and rebuild the APK. I can implement this functionality in a future release if people would find it useful.

And that’s it! I hope you found this write-up useful. If the format has changed since I wrote this, don’t hesitate to reach out and correct me.