State of AI

created: Sun, 23 Nov 2025 18:25:10 GMT

Models

  • a model is only as good as its training data: garbage in, garbage out
    • what exactly is the data SOTA models are trained on?
    • feedback loops: models end up trained on their own usage
  • training and fine-tuning are expensive and time-consuming
    • curated and labelled data sets: how big, how good, and what counts as good?
  • the model makes the biggest impact on outcome "quality"
    • e.g. GPT-5.1 providing feedback and observations to a Sonnet 4.5 coding agent
  • the effect of context engineering is naturally capped by model capabilities
    • no matter how good the context is, GPT-3.5 is not going to deliver
    • one-shot requests are the baseline
  • fine-tuned and custom models do not have this limitation
  • a model encodes a certain view of what "good" looks like, per knowledge domain
    • e.g. one-shot generation of a Python script
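The one-shot baseline above can be sketched as a comparison harness: the same task sent once with a bare prompt and once with engineered context. `call_model` is a hypothetical stand-in for a real model client, not any specific API.

```python
def build_one_shot_prompt(task: str) -> str:
    """Baseline: the task alone, no extra context."""
    return task

def build_engineered_prompt(task: str, context_snippets: list[str]) -> str:
    """Same task, prefixed with curated context."""
    context = "\n".join(f"- {s}" for s in context_snippets)
    return f"Relevant context:\n{context}\n\nTask: {task}"

def call_model(prompt: str) -> str:
    # Hypothetical model call; a real client (OpenAI, Anthropic, ...) goes here.
    return f"<completion for {len(prompt)} prompt chars>"

baseline = call_model(build_one_shot_prompt("Write a CSV deduplication script"))
enriched = call_model(build_engineered_prompt(
    "Write a CSV deduplication script",
    ["input files use ';' as a delimiter", "rows are keyed by the 'id' column"],
))
```

Comparing `baseline` against `enriched` per model makes the cap visible: past a point, better context stops helping a weaker model.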

Agency

  • some workflows require agency and creativity, others need to be strict
    • a real-life workflow is typically a combination of the two
    • e.g. filling forms combines two kinds of activity: a pre-defined flow and sourcing the data
  • models prefer the tools they were trained on and struggle with custom, unfamiliar tools
    • adjust tools to the models' familiarity
    • e.g. models are heavily trained on shell scripts and CLI tools
  • agency quality degrades as the context grows
  • instruction following can be trained and measured
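The form-filling example above can be sketched as a mixed workflow: the flow itself stays strict and pre-defined, while sourcing a missing value is delegated to an agent. `agent_lookup` is a hypothetical stub for an LLM agent, not a real library call.

```python
from typing import Callable

def agent_lookup(field: str) -> str:
    # Hypothetical agentic step: in reality a model would search
    # documents or call tools to find the value.
    return f"<looked up {field}>"

def fill_form(fields: list[str], known: dict[str, str],
              lookup: Callable[[str], str] = agent_lookup) -> dict[str, str]:
    """Strict, pre-defined flow: iterate fields in a fixed order and
    fall back to the agent only for unknown values."""
    form = {}
    for field in fields:            # fixed order: the strict part
        value = known.get(field)
        if value is None:
            value = lookup(field)   # creative part delegated to the agent
        form[field] = value
    return form

form = fill_form(["name", "vat_id"], {"name": "ACME"})
```

The split keeps the agent's creativity contained: it can decide *what* a value is, but never *which* steps run or in what order.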

Audience

  • adopters, bystanders, deniers - a bell curve
  • change diffusion via early adopters

Agentic Architecture

  • dynamic runtime for agent
    • shell executor with standard and custom domain-specific tools
    • security and sandboxing on the system level, e.g. network/file access
  • the agent, a dynamic context builder, solves the problem using tools at runtime
  • embedded feedback loops
    • evaluate outcomes to improve agent prompts
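The shell executor above can be sketched minimally, assuming a simple allowlist as the system-level security layer. Real network/file sandboxing would live outside the process (e.g. a container); only command filtering and a timeout are shown here.

```python
import shlex
import subprocess

ALLOWED = {"echo", "ls", "wc", "grep"}  # standard tools the model knows

def run_tool(command: str, timeout: float = 5.0) -> str:
    """Parse a shell-style command, refuse anything off the allowlist,
    and run it with a timeout."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command!r}")
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout

print(run_tool("echo hello"))
```

Custom domain-specific tools slot in by being added to the allowlist as ordinary CLI binaries, which keeps them in the shape models are most familiar with.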
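The embedded feedback loop can be sketched with hypothetical `run_agent` and `score_outcome` stand-ins: each outcome is evaluated, and failures are folded back into the prompt as corrective notes.

```python
def run_agent(prompt: str, task: str) -> str:
    # Hypothetical agent call; a real agent run goes here.
    return task.upper()

def score_outcome(task: str, outcome: str) -> bool:
    # Hypothetical evaluator; could be a test suite or a judge model.
    return outcome == task.upper()

def improve_prompt(prompt: str, tasks: list[str]) -> str:
    """One loop iteration: run, evaluate, fold failures back in."""
    for task in tasks:
        outcome = run_agent(prompt, task)
        if not score_outcome(task, outcome):
            prompt += f"\nAvoid repeating this failure on: {task!r}"
    return prompt
```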

Limitations

  • output variability and inconsistency
  • maintaining the multi-turn context window: size, relevance