Skip to content

feat(luau): expose native breakpoint and step debug API#705

Open
w4nderlust wants to merge 2 commits into
mlua-rs:mainfrom
w4nderlust:feat/luau-debug-api
Open

feat(luau): expose native breakpoint and step debug API#705
w4nderlust wants to merge 2 commits into
mlua-rs:mainfrom
w4nderlust:feat/luau-debug-api

Conversation

@w4nderlust
Copy link
Copy Markdown

On the Luau backend there is currently no way to do line-precise debugging from mlua. Lua::set_hook is #[cfg(not(feature = "luau"))], set_interrupt only fires at coarse safepoints (calls, loop back-edges, returns) so it cannot stop on an arbitrary line, and the Luau sandbox strips debug.getlocal/getfenv so even when you do pause you cannot read locals.

Luau's C API already has everything needed for this, mlua just had not wrapped it. This PR adds safe wrappers following the existing set_interrupt pattern. No mlua-sys changes, the ffi was already declared.

New surface, all behind the luau feature:

  • Lua::set_debug_break / set_debug_step (plus removers) and set_single_step. The callbacks receive a &Debug for the paused frame and can return VmState::Yield to suspend the running coroutine, same semantics as set_interrupt.
  • Function::set_breakpoint(line, enabled) -> Option<u32>, wrapping lua_breakpoint. Returns the line Luau actually placed it on (it snaps the breakpoint to the next executable line).
  • Debug::get_local / set_local / locals, wrapping lua_getlocal/lua_setlocal. Luau keeps locals reachable here even though the sandbox removes the debug library equivalents.

Two things worth knowing, both called out in the docs/comments:

  • Luau re-evaluates a breakpoint when the coroutine is resumed, so the break callback fires again on the same line. The expected pattern is to yield on the first hit and return VmState::Continue on resume to step past it. The tests exercise this.
  • The break/step callbacks cannot go through callback_error_ext. It reserves its WrappedFailure userdata at the base of the running frame (lua_insert(state, 1)), which shifts the paused function's registers up by one and makes lua_getlocal read the wrong slots. So there is a small dedicated debug_callback trampoline that reserves at the top of the stack instead, keeping the same error/panic/yield handling.

Tests in tests/luau.rs cover breakpoint pause with yield/resume in a coroutine, a breakpoint on a multi-line call expression, single-step over consecutive lines and disabling it, reading and writing locals at a breakpoint, and error propagation from a break callback. The chunks are compiled with optimization 0 / debug 2 so line and local mapping stays stable.

Wrap Luau's debug C API so embedders can build line-precise debuggers:
lua_breakpoint, lua_singlestep, lua_getlocal/lua_setlocal, and the
debugbreak/debugstep callbacks. The callbacks mirror set_interrupt and can
return VmState::Yield to suspend the running coroutine.

New surface (all gated on the luau feature):
- Lua::set_debug_break / set_debug_step / set_single_step and their removers
- Function::set_breakpoint
- Debug::get_local / set_local / locals

The debug callbacks need their own trampoline instead of callback_error_ext:
the latter reserves its WrappedFailure at the running frame's base, which
shifts the paused function's registers and makes lua_getlocal read the wrong
slots. debug_callback reserves at the top of the stack instead.
Copy link
Copy Markdown
Member

@khvzak khvzak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks LLM generated

Comment thread src/debug.rs Outdated
Comment on lines +236 to +247
pub fn set_local(&self, index: usize, value: Value) -> Result<bool> {
unsafe {
// `lua_setlocal` may leave the value on the stack when `index` is out of range; the
// `StackGuard` restores the top in every case.
let _sg = StackGuard::new(self.state);
assert_stack(self.state, 1);

self.lua.push_value(&value)?;
let name = ffi::lua_setlocal(self.state, self.level, index as c_int);
Ok(!name.is_null())
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is safe operation. Some code can expect that "local" variable has a specific type (and omit checks). Breaking this invariant can be dangerous.

Change set_local return type from Result<bool> to Result<Option<String>>,
returning the local's name on success and None when the index is out of
range or the frame is native-compiled (Luau guards writes to native frames
via LUA_CALLINFO_NATIVE; for interpreted frames a type mismatch surfaces as
a Lua RuntimeError, not UB).

This makes set_local symmetric with get_local and faithful to what the C
API returns. The doc comment now explains both the native-frame guard and
the interpreted-frame behaviour explicitly.

Add three tests: out-of-range index returns None, type-mismatched write
succeeds at the API level then raises a Lua RuntimeError on next VM use,
and same-type write returns Ok(Some(name)).
@w4nderlust
Copy link
Copy Markdown
Author

This is AI-aided, not fully AI-generated. Let me know if that's a problem, I didn't see anything about it in CONTRIBUTING.md (and couldn't find one actually).

On the safety question: you're right that the original API was poorly designed and the doc comment left too much unsaid. I looked at how Luau's own lua_setlocal handles this internally and the key thing is that it already guards against native frames via the LUA_CALLINFO_NATIVE flag, refusing the write and returning null. For interpreted frames, a type mismatch doesn't cause any memory unsafety, it just surfaces as a Lua RuntimeError the next time the VM touches that register, which is the same behaviour you'd get from debug.setlocal in standard Lua. So unsafe fn would be semantically wrong here since there's no way to cause UB through this API.

What I changed: the return type is now Result<Option<String>> instead of Result<bool>, returning the local's name on success and None when the index is out of range or the frame is native-compiled. This makes it symmetric with get_local and faithful to what the C API actually returns. The doc comment now explains the native frame guard and the interpreted frame behaviour explicitly. I also added tests for the out-of-range case and specifically for a type-mismatched write, which confirms the VM raises a RuntimeError rather than anything worse.

That said, I'm genuinely happy to restructure this however you think fits best in the codebase, just point me in the direction you'd prefer and I'll make it happen :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants