OpenChat-7B uses C-RLFT (Conditioned Reinforcement Learning Fine-Tuning) to enhance a 7B parameter base model. This innovative approach trains on mixed-quality data without requiring preference labels.
anthropic