r2ai with lmstudio and gpt-oss

4 min read · Aug 19, 2025

Background: radare2, nicknamed “r2”, is an awesome open source disassembler. r2ai is an open source plugin for r2 that communicates with an AI.

Context: malware analysis with r2

Disclaimer: the following notes on my preferred models and setup are my personal opinion (not my employer’s), and they only apply to the context of malware analysis. Results would probably differ in another context, like text processing.

My setup to analyze malware with r2 and AI

So far, I have been using r2ai with models that run on a remote, third-party server. For example, I have acquired an Anthropic API key. The key is provided to r2ai in ~/.r2ai.anthropic-key, the r2ai api is configured to anthropic (r2ai -e api=anthropic) and the model to Claude Sonnet 3.7 (r2ai -e model=claude-3-7-sonnet-20250219). Whenever I issue a request (r2ai -d or r2ai -a), my question is sent to Anthropic’s servers, a few cents are deducted from my account, and I get the response back.
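For reference, that whole remote setup boils down to a few commands. This is a sketch: the key value below is a placeholder, and the file path and config names are the ones mentioned above.

```shell
# Remote (Anthropic) setup for r2ai, as described above.
# The key value is a placeholder; use your own Anthropic API key.
echo 'sk-ant-...' > ~/.r2ai.anthropic-key

# Select the API and model:
r2ai -e api=anthropic
r2ai -e model=claude-3-7-sonnet-20250219

# Then ask for a decompilation (-d) or use auto mode (-a):
r2ai -d
```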

This works fine; it’s the best setup I’ve come up with to analyze malware, with a few variations on which Claude Sonnet model I use.

Cost and confidentiality

However, this setup is not perfect for all situations:

  1. Cost. API access to Claude Sonnet is not free. It’s not very expensive, but it’s still a burden.
  2. Third party. My questions and their context are sent to a third party. As it’s malware, there’s usually no confidentiality issue about the executable 😉. I might be slightly more concerned about telling the third party how I perform my analysis, which is close to “intellectual property”. Fortunately, as a researcher, I usually present how I do things, so it’s not really an issue either. Still, it would be preferable to have full control.

The cost issue can be solved by using a free model. There are many, like Mistral’s devstral-small-2505 or codestral-latest. Those free models aren’t half as good as Claude Sonnet in my context, but they are still helpful when I know the question isn’t too complex.

Ollama vs LM Studio

The third party issue can be solved by running our own LLM server, with something like Ollama or LM Studio.

For malware analysis, we need good reasoning models and support for tools (MCP, or r2ai auto mode). The results usually aren’t satisfying on a regular desktop/laptop, so I got my hands on a powerful host with 2 NVIDIA AD104GL GPUs, 300+ GB of RAM and a 160 TB hard disk 😃. Install Debian + CUDA drivers on it, and we’re ready to run Ollama or LM Studio.

Ollama is a bit slow; LM Studio has the advantage of being faster and easier to configure, and its model library contains gpt-oss (free).

r2ai and LM Studio

I start the LM Studio server and make sure it is accessible remotely from my work host (“serve on local network”).

My LM Studio server runs on a dedicated host, in a lab, separated from my work host; that’s why I need to serve it on the “local network”.

In r2ai, to address a given server, we must set the baseurl config (r2ai -e baseurl=http://IPADDRESS:PORT). Except I had to fix a minor bug in r2ai: so far, setting baseurl meant using a URL in Ollama’s format (“api” suffix), which is different from the OpenAI format (“v1” suffix) that LM Studio exposes.
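To illustrate the difference, here is how the same base URL resolves to different endpoints under the two conventions. The host and port are made up; the endpoint paths follow the Ollama and OpenAI API conventions.

```shell
# The same base URL resolves to different endpoints depending on the API format.
BASEURL="http://192.168.1.10:1234"            # example LM Studio host:port

OLLAMA_STYLE="$BASEURL/api/chat"              # Ollama-style URL ("api" suffix)
OPENAI_STYLE="$BASEURL/v1/chat/completions"   # OpenAI-style URL ("v1" suffix), what LM Studio serves

echo "$OLLAMA_STYLE"
echo "$OPENAI_STYLE"
```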

Then, as I said, the api is openai: r2ai -e api=openai. And we’re nearly done: list the models available on LM Studio with r2ai -e model=?, then select one.

Demo

In the following video, I use gpt-oss-20b from an LM Studio server with r2, on a malicious sample of Linux/Trigona. The family is a well-known ransomware, which originated on Windows and was later ported to Linux. This precise sample was detected in April 2025 and comes with a few variants (by the way, I’ll discuss this at Barb’hack in 2 weeks).

This video has no sound. Just watch it and read the full explanation below.

I intentionally chose a difficult sample so that we bump into real issues. On simple binaries, everything works more smoothly, but hey, when binaries are simple, I don’t need AI to help me out 😉.

In the video, you’ll notice the first issue we encounter is that our context is too big. The main function of the malware is too complex: it produces too many tokens, far beyond the limit, and can’t be processed. We overcome the issue by increasing the context length for the model in LM Studio. Note that the bigger the context, the slower the model, and the more RAM it needs.

Second issue: even with a bigger context, the model still fails to answer. We have to work around that and ask it to decompile only parts of the function. Currently, r2ai has no command to decompile a range of addresses (hopefully, this will be implemented one day), and r2ai -d decompiles an entire function. The trick is to call r2ai -a (auto mode) and manually tell it to decompile a portion of main.
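The workaround in auto mode looks something like this. This is only a sketch: the wording of the prompt, the address, and the instruction count are hypothetical, not taken from the video.

```shell
# Auto mode: ask the model to decompile only a slice of main.
# The address and instruction count below are hypothetical examples.
r2ai -a "decompile the first 80 instructions of main, starting at 0x00402100"
```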

Third issue: in auto mode, the model may send r2 commands to run. I don’t know exactly why, but at some point the model got an r2 command wrong (the one to decompile a given number of instructions). So I had to fix the command, but actually, that’s a perfect illustration of why users must review and edit commands.

It works! We get our decompiled output, which is all I wanted to show in this video: the use of LM Studio with r2ai, and how to tweak parameters for a model.

In terms of quality, the generated code could be better. That’s because I used gpt-oss; with Claude Sonnet 3.7+, I obtain better quality. But Claude Sonnet is only accessible through Anthropic’s servers, not via LM Studio, so it’s a different topic.

Hope you enjoyed reading this, and hope to see some of you at Barb’hack for my talk on r2ai!

— Cryptax


Written by @cryptax

Mobile and IoT malware researcher. The postings on this account are solely my own opinion and do not represent my employer.
