State of AI

created: Sun, 23 Nov 2025 18:25:10 GMT

Models

  • a model is only as good as its training data: garbage in, garbage out
    • what exactly is the data SOTA models are trained on?
    • feedback loops: models end up trained on their own usage
  • training and fine-tuning are expensive and time-consuming
    • curated and labelled data sets: how big, how good, and what counts as good?
  • the model makes the biggest impact on outcome "quality"
    • e.g. GPT-5.1 providing feedback and observations to a Sonnet 4.5 coding agent
  • the effect of context engineering is naturally capped by model capabilities
    • no matter how good the context is, GPT-3.5 is not going to deliver
    • one-shot requests are the baseline
  • fine-tuned and custom models do not have this limitation
  • a model encodes a certain view of what "good" looks like, per knowledge domain
    • e.g. one-shot generation of a Python script
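The one-shot baseline above can be sketched as a comparison harness: the same task sent once with a bare prompt and once with engineered context. `call_model` is a hypothetical stand-in for a real model client, not any specific API.

```python
def build_one_shot_prompt(task: str) -> str:
    """Baseline: the task alone, no extra context."""
    return task

def build_engineered_prompt(task: str, context_snippets: list[str]) -> str:
    """Same task, prefixed with curated context."""
    context = "\n".join(f"- {s}" for s in context_snippets)
    return f"Relevant context:\n{context}\n\nTask: {task}"

def call_model(prompt: str) -> str:
    # Hypothetical model call; a real client (OpenAI, Anthropic, ...) goes here.
    return f"<completion for {len(prompt)} prompt chars>"

baseline = call_model(build_one_shot_prompt("Write a CSV deduplication script"))
enriched = call_model(build_engineered_prompt(
    "Write a CSV deduplication script",
    ["input files use ';' as a delimiter", "rows are keyed by the 'id' column"],
))
```

Comparing `baseline` against `enriched` per model makes the cap visible: past a point, better context stops helping a weaker model.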

Agency

  • some workflows require agency and creativity, others need to be strict
    • a real-life workflow is typically a combination of the two
    • e.g. filling forms combines two kinds of activity: a pre-defined flow and sourcing the data
  • models prefer the tools they were trained on and struggle with custom, unfamiliar tools
    • adjust tools to the models' familiarity
    • e.g. models are heavily trained on shell scripts and CLI tools
  • agency quality degrades as the context grows
  • instruction following can be trained and measured
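The form-filling example above can be sketched as a mixed workflow: the flow itself stays strict and pre-defined, while sourcing a missing value is delegated to an agent. `agent_lookup` is a hypothetical stub for an LLM agent, not a real library call.

```python
from typing import Callable

def agent_lookup(field: str) -> str:
    # Hypothetical agentic step: in reality a model would search
    # documents or call tools to find the value.
    return f"<looked up {field}>"

def fill_form(fields: list[str], known: dict[str, str],
              lookup: Callable[[str], str] = agent_lookup) -> dict[str, str]:
    """Strict, pre-defined flow: iterate fields in a fixed order and
    fall back to the agent only for unknown values."""
    form = {}
    for field in fields:            # fixed order: the strict part
        value = known.get(field)
        if value is None:
            value = lookup(field)   # creative part delegated to the agent
        form[field] = value
    return form

form = fill_form(["name", "vat_id"], {"name": "ACME"})
```

The split keeps the agent's creativity contained: it can decide *what* a value is, but never *which* steps run or in what order.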

Audience

  • adopters, bystanders, deniers - a bell curve
  • change diffusion via early adopters

Agentic Architecture

  • dynamic runtime for agent
    • shell executor with standard and custom domain-specific tools
    • security and sandboxing on the system level, e.g. network/file access
  • the agent, a dynamic context builder, solves the problem using tools at runtime
  • embedded feedback loops
    • evaluate outcomes to improve agent prompts
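The shell executor above can be sketched minimally, assuming a simple allowlist as the system-level security layer. Real network/file sandboxing would live outside the process (e.g. a container); only command filtering and a timeout are shown here.

```python
import shlex
import subprocess

ALLOWED = {"echo", "ls", "wc", "grep"}  # standard tools the model knows

def run_tool(command: str, timeout: float = 5.0) -> str:
    """Parse a shell-style command, refuse anything off the allowlist,
    and run it with a timeout."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command!r}")
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout

print(run_tool("echo hello"))
```

Custom domain-specific tools slot in by being added to the allowlist as ordinary CLI binaries, which keeps them in the shape models are most familiar with.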
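The embedded feedback loop can be sketched with hypothetical `run_agent` and `score_outcome` stand-ins: each outcome is evaluated, and failures are folded back into the prompt as corrective notes.

```python
def run_agent(prompt: str, task: str) -> str:
    # Hypothetical agent call; a real agent run goes here.
    return task.upper()

def score_outcome(task: str, outcome: str) -> bool:
    # Hypothetical evaluator; could be a test suite or a judge model.
    return outcome == task.upper()

def improve_prompt(prompt: str, tasks: list[str]) -> str:
    """One loop iteration: run, evaluate, fold failures back in."""
    for task in tasks:
        outcome = run_agent(prompt, task)
        if not score_outcome(task, outcome):
            prompt += f"\nAvoid repeating this failure on: {task!r}"
    return prompt
```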

Limitations

  • output variability and inconsistency
  • maintaining the multi-turn context window: size, relevance