3 Comments
Doggy FP

Lot of RL this week

Christopher

I really enjoyed the SFR-DeepResearch and LiveMCP-101 papers. My only gripe is that neither released working code, which would have made evaluation easier.

This is especially frustrating with LiveMCP-101. IMO a benchmark is only useful if it can be used on an ongoing basis to test new inputs, and this one is timely and would have been very useful if they'd released the code. Really odd that they didn't, again IMO.

Leo C

ParaThinker aims to solve one of the biggest challenges I personally find with LLMs: the model digging itself into a reasoning rabbit hole because it keeps incorporating its own reasoning tokens, or its prior conversational output in multi-turn settings, into the context.
