Yuxuan Tong [email protected] / [email protected]
All content is mainly based on (publication) first-hand resources: OpenAI posts, works by contributors, public evaluations, reproduction works with similar performance, …
https://openai.com/index/openai-o1-mini-advancing-cost-efficient-reasoning/
Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. #
It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. # … The models use these reasoning tokens to "think", breaking down their understanding of the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens, and discards the reasoning tokens from its context. (multi-step conversation between a user and an assistant) #
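The context-management behavior described above can be sketched in code. This is a hypothetical illustration, not the real OpenAI API: `generate_turn` stands in for the model, and the point is only that reasoning tokens are consumed during generation but never re-sent as context in later turns.

```python
# Minimal sketch (assumed message format, not the real API) of how
# reasoning tokens are used to produce an answer but discarded from
# the running conversation context.

def generate_turn(prompt: str) -> tuple[str, str]:
    """Stand-in for the model: returns (reasoning, answer).

    In the real system, reasoning tokens are generated and counted,
    but only the visible answer is kept."""
    reasoning = f"<internal reasoning about: {prompt}>"
    answer = f"answer to: {prompt}"
    return reasoning, answer


def run_conversation(user_messages: list[str]) -> list[dict]:
    context: list[dict] = []  # what would be re-sent each turn
    for msg in user_messages:
        context.append({"role": "user", "content": msg})
        reasoning, answer = generate_turn(msg)
        # Reasoning is used internally and then dropped; only the
        # visible completion joins the context for subsequent turns.
        context.append({"role": "assistant", "content": answer})
    return context


history = run_conversation(["question 1", "question 2"])
# No reasoning text ever appears in the retained context.
assert all("internal reasoning" not in m["content"] for m in history)
```

The design choice this models: keeping reasoning out of the context prevents it from consuming the context window across a multi-step conversation, at the cost of the model re-deriving any hidden reasoning it needs later.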
o1-mini can explore more thought chains than o1-preview at the same cost #