State of AI
created: Sun, 23 Nov 2025 18:25:10 GMT
Models
- a model is only as good as its training data: garbage in, garbage out
- what exactly is the data SOTA models are trained on?
- feedback loops: models get trained on their own usage data
- training and fine-tuning is expensive and time-consuming
- curated and labelled data sets: how big, how good, and what counts as "good"?
- the model itself makes the biggest impact on outcome "quality"
- e.g., gpt 5.1 providing feedback and observations to a sonnet 4.5 coding agent
- context engineering effect is limited naturally by the model capabilities
- no matter how good the context is, gpt-3.5 is not going to deliver
- one-shot requests are the baseline
- fine-tuned and custom models do not have this limitation
- a model encodes a certain view of what "good" looks like, per knowledge domain
- one-shot generation of python script
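The one-shot baseline can be sketched as a tiny harness; `call_model` below is a hypothetical stand-in for any completion API, returning a canned script so the sketch is runnable:

```python
import subprocess
import sys
import tempfile

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real completion API call;
    # returns a canned script here so the harness runs offline.
    return "print(sum(range(1, 11)))\n"

def one_shot_python(prompt: str) -> str:
    """Send a single prompt, execute the returned script, capture stdout."""
    script = call_model(prompt)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()

print(one_shot_python("Write a Python script that prints the sum of 1..10"))
```

The point of keeping this as the baseline: any context-engineering or agentic setup should beat what a single prompt to the same model produces.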
Agency
- some workflows require agency and creativity, some need to be strict
- a real-life workflow is typically a combination of the two
- e.g., filling forms combines two kinds of activity: a pre-defined flow and data sources
- models prefer tools they were trained on, and struggle with custom, unfamiliar tools
- adjust tools to their familiarity
- trained on shell scripts and cli tools
- agency quality degrades as the context grows
- instruction following can be trained and measured
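Measuring instruction following can start with purely programmatic checks; a minimal sketch, assuming simple formal constraints like "exactly three lowercase bullet points":

```python
def follows_instructions(text: str, n_bullets: int = 3) -> bool:
    """Check a response against two formal constraints:
    exactly n_bullets non-empty lines, each a lowercase '- ' bullet."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if len(lines) != n_bullets:
        return False
    return all(line.startswith("- ") and line == line.lower() for line in lines)

good = "- first point\n- second point\n- third point"
bad = "- Only one point, and Capitalised"
print(follows_instructions(good), follows_instructions(bad))
```

Checks like this are cheap to run over many samples, which is what makes instruction following both trainable and measurable.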
Audience
- adopters, bystanders, deniers - a bell curve
- change diffusion via early adopters
Agentic Architecture
- dynamic runtime for agent
- shell executor with standard and custom domain-specific tools
- security and sandboxing on the system level, e.g. network/file access
- the agent, a dynamic context builder, solves the problem using tools at runtime
- embedded feedback loops
- evaluate outcomes to improve agent prompts
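A shell executor with an allow-list can be sketched as below; the allow-list is hypothetical, and real sandboxing (network/file access) belongs at the system level, e.g. containers or seccomp, not in this function:

```python
import shlex
import subprocess

# Hypothetical domain-specific allow-list of permitted binaries.
ALLOWED = {"ls", "cat", "grep", "echo", "wc"}

def run_tool(command: str, timeout: int = 10) -> str:
    """Execute a shell-style tool call if its binary is on the allow-list.
    This is only an application-level guard; OS-level sandboxing
    (network/file isolation) is assumed to exist around it."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"tool not allowed: {command!r}")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.stdout

print(run_tool("echo hello agent"))
```

Using standard CLI tools as the agent's interface plays to the familiarity point above: models have seen far more `grep` than any bespoke API.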
Limitations
- output variability and inconsistency
- maintaining the multi-turn context window: size and relevance