I found that asking the assistant to rate how testable a piece of code is, on a scale of 1-10, gives a good proxy for overall code quality.
On a scale of 1-10, how testable is this code?
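
To make the idea concrete, here is a hypothetical pair of Python functions that would likely land at opposite ends of that scale. This is only an illustrative sketch: the names `greeting_hard_to_test` and `greeting_testable` are invented for this example and do not come from any real codebase. The first reads the system clock internally, while the second takes the hour as an explicit argument.

```python
from datetime import datetime


def greeting_hard_to_test() -> str:
    # Hidden dependency: the result depends on the system clock,
    # so a test cannot choose which branch runs without patching.
    hour = datetime.now().hour
    return "Good morning" if hour < 12 else "Good afternoon"


def greeting_testable(hour: int) -> str:
    # The dependency is an explicit argument, so every branch
    # can be exercised with a plain function call.
    return "Good morning" if hour < 12 else "Good afternoon"
```

The second version can be checked with plain calls such as `greeting_testable(9)` and `greeting_testable(15)`, whereas the first needs the clock patched (for example with `unittest.mock`) before its branches can be tested deterministically. The same property that makes it easier to test, an explicit input instead of a hidden one, also tends to make it easier to read and reuse, which may be why the testability score tracks overall quality.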