Qwen2.5-VL-72B-Instruct is used as the judge model for calculating visual rewards during training [11].

4. Experimental Results
The model is tested on subsets ranging in size from 200k to 2.8 million samples.
… to ensure that the generated code matches the visual intent [11].
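The text does not say how the judge's output is turned into a numeric reward. The following is a minimal sketch under stated assumptions. It assumes the judge (Qwen2.5-VL-72B-Instruct) compares a rendering of the generated code with the target image and replies with a line such as `Score: 7/10`, which is then mapped to a reward in [0, 1]. The function name `visual_reward`, the `Score:` reply format, the 10-point scale, and the zero reward for code that fails to render are all hypothetical choices, not the method described in [11].

```python
import re
from typing import Optional

# Hypothetical reply format: the judge is assumed to write "Score: <number>".
SCORE_PATTERN = re.compile(r"Score:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def visual_reward(judge_reply: Optional[str], max_score: float = 10.0) -> float:
    """Convert a judge model's text reply into a scalar reward in [0, 1].

    judge_reply is None when the generated code failed to render; this sketch
    assumes such failures receive the lowest possible reward.
    """
    if judge_reply is None:
        return 0.0
    matches = SCORE_PATTERN.findall(judge_reply)
    if not matches:
        return 0.0  # unparseable reply: give no credit
    score = float(matches[-1])  # if the judge repeats itself, trust the final score
    # Normalize by the assumed maximum and clamp out-of-range scores.
    return max(0.0, min(score / max_score, 1.0))


if __name__ == "__main__":
    print(visual_reward("Layout and colors match the target. Score: 7/10"))
    print(visual_reward(None))  # rendering failed
```

Clamping and falling back to 0.0 keep the reward bounded even when the judge ignores the requested format. A real setup might instead re-query the judge or discard such samples.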