VidGuard‑R1 introduced as first MLLM‑based video detector – The research team unveiled VidGuard‑R1, a video authenticity detector that fine‑tunes a multi‑modal large language model (MLLM) using group relative policy optimization (GRPO) to classify AI‑generated videos while providing transparent explanations for regulators and users [1].
Dataset comprises 140,000 real and AI‑generated videos – To train and evaluate the system, the authors curated a challenging dataset of 140,000 videos, mixing authentic footage with content produced by state‑of‑the‑art generation models and deliberately engineered to maximize discrimination difficulty [1].
Fine‑tuning performed on Qwen‑VL with dual reward models – The team fine‑tuned the Qwen‑VL MLLM using GRPO, employing two specialized reward models that target temporal artifacts and generation complexity, thereby guiding the model toward both accurate detection and reasoning capabilities [1].
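The core of GRPO is that each sampled completion is scored against the other completions in its own group rather than against a learned value baseline. As a minimal sketch (assuming the two reward signals are simply summed; the paper's exact reward combination and names like `temporal_rewards` are illustrative, not taken from the source):

```python
import statistics

def grpo_advantages(temporal_rewards, complexity_rewards):
    """Group-relative advantages for one group of sampled completions.

    Each list holds one scalar reward per completion, from two
    hypothetical reward signals (temporal artifacts, generation
    complexity). Rewards are summed per completion, then centered
    and scaled by the group's own mean and standard deviation,
    which is the group-relative baseline GRPO uses in place of a
    learned critic.
    """
    totals = [t + c for t, c in zip(temporal_rewards, complexity_rewards)]
    mean = statistics.mean(totals)
    std = statistics.pstdev(totals) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in totals]
```

Completions that score above the group average receive positive advantages and are reinforced; below-average ones are penalized, so the policy improves without a separate value network.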
Zero‑shot performance surpasses existing benchmarks – Experimental results show VidGuard‑R1 achieves state‑of‑the‑art zero‑shot performance on established video authenticity benchmarks, indicating its ability to generalize without task‑specific training data [1].
Training pushes accuracy above 95 % – Additional training on the curated dataset raises overall detection accuracy above 95 %, a significant improvement over prior methods [1].
Model generates interpretable rationales for predictions – Case studies demonstrate that VidGuard‑R1 not only classifies videos correctly but also produces precise, human‑readable explanations that reveal the reasoning behind each judgment, supporting transparency requirements [1].