Ask HN: How do you prompt the "advanced" models

18 points · _jogicodes_ · 12 days ago

I use the Windsurf IDE, which comes with integrated LLM chat and edit functionality. I switched to it two months ago, and for the three months before that I used Cursor (a similar editor). In both, I have always had better results with Claude.

With the apparently more advanced reasoning models, I thought that would change. In Windsurf I have DeepSeek R1 as well as o3-mini available, and I expected them to improve the outcomes of my prompts. They did not, far from it. Even though they consistently pull ahead of Claude 3.5 Sonnet in benchmarks, in practice, with the way I prompt, Claude almost always comes up with the better solution. So much so that I can't remember a single time where Claude couldn't figure something out and switching to another model fixed it for me.

Because of the discrepancy between benchmarks and my own experience, I am wondering if my prompting is off. It may be that my prompting has become Claude-specific after using it for a while. Is there a trick to prompting the reasoning models "properly"?


12 comments
cruffle_duffle · 12 days ago
I think those benchmarks are all noise. Claude has so far been the only model I really trust and use in Cursor. All those fancy-pants reasoning models seem to just jerk themselves off and never really do anything better than Sonnet.

One thing I always make sure of is to never let it just spit out code. I always go back and forth a few times to ensure alignment before I say “Bombs Away” and let it write code.

almosthere · 12 days ago
Is editing code really the endpoint for LLMs?

I suspect we'll get to a point where the "code" is just instructions, codified in a special markup file, and LLMs write the simplest, most KISS code you can think of, but code that is extremely secure: it's like direct database access with all the security constraints you define, always applied correctly. In other words, think of the actual code as a non-committed artifact that is simply re-emitted whenever the descriptors change.

The long-term outcome of LLMs writing code isn't to give us human-quality code; it's to give us something we'd think of as assembly, but rigorously conforming to all auth requirements.
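The idea above can be sketched in a few lines. Everything here is hypothetical (the descriptor schema, the `emit_code` generator, and the `invoice` entity are all made up for illustration): the committed source of truth is a declarative descriptor, and the code is a regenerated artifact that always bakes in the declared security constraint.

```python
# Hypothetical sketch: code as a non-committed artifact emitted from a
# declarative descriptor, with the auth rule always applied.
import hashlib

# Made-up descriptor format: an entity, its fields, and an access rule.
DESCRIPTOR = {
    "entity": "invoice",
    "fields": ["id", "amount", "owner_id"],
    "access": "row['owner_id'] == user['id']",  # constraint baked into output
}

def emit_code(desc: dict) -> str:
    """Emit the simplest possible ("KISS") accessor with the auth check inlined."""
    fields = ", ".join(desc["fields"])
    return (
        f"def get_{desc['entity']}(db, user, row_id):\n"
        f"    row = db.fetch('{desc['entity']}', row_id)  # columns: {fields}\n"
        f"    if not ({desc['access']}):\n"
        f"        raise PermissionError('access denied')\n"
        f"    return row\n"
    )

def artifact_is_stale(desc: dict, cached_hash: str) -> bool:
    """The artifact is regenerated only when the descriptor's hash changes."""
    current = hashlib.sha256(repr(sorted(desc.items())).encode()).hexdigest()
    return current != cached_hash
```

The emitted function is deliberately trivial; all the leverage lives in the descriptor, which is what humans would actually review and commit.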

ai-christianson · 12 days ago
In Windsurf (and RA.Aid, a similar FOSS tool that I develop), Sonnet is almost always the best model to drive the agent itself. The reasoning models really shine when you have some kind of logic problem, planning problem, complex code to debug, etc. That's why we have our agent call out to the reasoning model only when it needs to "ask the expert" something. It works fairly well.
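A minimal sketch of that "ask the expert" pattern (this is not RA.Aid's actual code; the hint list and the stub model functions are invented for illustration): a fast model drives the agent loop, and only reasoning-heavy subtasks get routed to the reasoning model.

```python
# Hypothetical "ask the expert" router: the driver model handles ordinary
# edits, and tasks that look reasoning-heavy are delegated to an expert.
from typing import Callable

# Invented keyword hints; a real agent would use a smarter classifier.
REASONING_HINTS = ("debug", "prove", "plan", "optimize", "race condition")

def pick_model(task: str,
               driver: Callable[[str], str],
               expert: Callable[[str], str]) -> Callable[[str], str]:
    """Route reasoning-style tasks to the expert, everything else to the driver."""
    lowered = task.lower()
    return expert if any(hint in lowered for hint in REASONING_HINTS) else driver

# Stubs standing in for e.g. Sonnet (driver) and R1/o3-mini (expert) API calls.
sonnet = lambda prompt: f"[sonnet] {prompt}"
r1 = lambda prompt: f"[r1] {prompt}"

def run(task: str) -> str:
    return pick_model(task, sonnet, r1)(task)
```

The design choice is that delegation is per-subtask, not per-session: the cheap, fast model stays in the loop and pays the reasoning-model latency only when it is likely to help.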

danbmil99 · 12 days ago
Claude represents some sort of inflection point or phase change. Noise aside, Claude Sonnet is by far the most impressive model since GPT-4.
KTibow · 12 days ago
You should only use reasoning models when your prompt actually needs reasoning (e.g. debugging a weird error or writing an optimized algorithm).