Papers

created: Sun, 12 Oct 2025 19:45:23 GMT, modified: Mon, 20 Oct 2025 10:12:43 GMT

Collection of AI papers.

  • https://arxiv.org/abs/2510.04618
    • Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
    • Framework for self-improving LLMs through evolving contexts rather than fine-tuning
      • Uses three agentic roles (Generator, Reflector, Curator) to accumulate, refine, and organize strategies through incremental delta updates (see the sketch after this list)
      • Prevents context collapse while preserving detailed knowledge
      • Achieves +10.6% improvement on agent benchmarks and +8.6% on finance tasks
  • https://arxiv.org/abs/2510.09580
    • GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data
    • Neurosymbolic model that distills high-quality knowledge graphs from unstructured text
      • 80M-parameter GraphMERT achieves 69.8% FActScore on PubMed diabetes papers, significantly outperforming a 32B-parameter LLM baseline (40.2%)
      • Creates factual, ontology-consistent KGs with provenance for verifiable reasoning
  • https://arxiv.org/abs/2510.11701
    • Demystifying Reinforcement Learning in Agentic Reasoning
    • Comprehensive investigation of RL in agentic reasoning covering data, algorithms, and reasoning modes
      • Real end-to-end tool-use trajectories yield stronger SFT initialization than stitched synthetic ones
      • Exploration-friendly techniques (clip-higher, reward shaping, policy entropy) improve training efficiency
      • Deliberative strategy with fewer tool calls outperforms frequent calls or verbose self-reasoning
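
Below, a minimal sketch of the ACE loop from the first paper above: three roles evolving a shared playbook through small delta operations rather than full rewrites. The `llm` stub, the prompts, and the ADD/DROP delta format are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of Agentic Context Engineering (ACE): Generator attempts the task,
# Reflector extracts lessons, Curator merges them into the playbook as small
# delta operations. `llm` is a stub for any completion API.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model client here")

def ace_step(playbook: list[str], task: str) -> list[str]:
    context = "\n".join(playbook)

    # Generator: attempt the task with the current playbook as context.
    trajectory = llm(f"Playbook:\n{context}\n\nTask: {task}")

    # Reflector: distill what worked and what failed in the attempt.
    lessons = llm(f"Trajectory:\n{trajectory}\n\nList concrete lessons.")

    # Curator: emit incremental deltas (one bullet at a time) instead of
    # rewriting the playbook, so accumulated detail is never lost.
    delta = llm(
        f"Playbook:\n{context}\n\nLessons:\n{lessons}\n\n"
        "Output one operation per line: 'ADD <bullet>' or 'DROP <bullet>'."
    )
    for op in delta.splitlines():
        if op.startswith("ADD "):
            playbook.append(op[4:])
        elif op.startswith("DROP ") and op[5:] in playbook:
            playbook.remove(op[5:])
    return playbook
```

The delta format is the load-bearing choice here: appending and pruning individual bullets is what prevents the context collapse that comes from regenerating the whole context each step.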

Small Models and Fine-Tuning

The MoM architecture will move past monolithic LLMs toward a constellation of Small Language Models (SLMs), Tiny Recursive Models (TRMs), and classical ML models.

  • https://arxiv.org/abs/2506.02153
    • Small Language Models are the Future of Agentic AI
    • Argues that SLMs are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems
      • While LLMs excel at diverse tasks, agentic AI systems often need specialized models performing repetitive tasks with little variation
      • SLMs are the optimal choice based on current capabilities, common agentic architectures, and deployment economics
  • https://arxiv.org/abs/2508.06813
    • Full-Stack Fine-Tuning for the Q Programming Language
    • Open-source approach for adapting LLMs to Q programming language for quantitative finance
      • Introduces Leetcode-style evaluation dataset for Q
      • Trains suite of reasoning and non-reasoning models based on Qwen-2.5 series (1.5B to 32B parameters) using pretraining, SFT, and RL
  • https://arxiv.org/abs/2409.06446
    • HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data
    • Novel approach that improves LLM secure code generation by automatically synthesizing pairs of vulnerable and fixed code for specific CWE types
      • Uses oracle-guided data synthesis pipeline
      • Two-step generation process that reduces security vulnerabilities in generated code (a sketch of the oracle-guided loop follows this list)
  • https://arxiv.org/abs/2509.25716
    • DeepCodeSeek: Real-Time API Retrieval for Context-Aware Code Generation
    • Real-time API retrieval system for context-aware code generation
      • Uses retrieval-augmented generation, query enhancement, and fine-tuned embedding models
      • Improves enterprise code completion with specialized reranking and RL-based optimization
  • https://ninetyfive.gg/
    • NinetyFive
    • Fast code completion with custom models trained on your codebase
      • Optimized inference engine serves autocomplete suggestions at 50ms median latency, tailored to project-specific patterns
  • https://icml.cc/virtual/2025/46781
    • Building Production Ready Agentic Systems
    • ICML 2025 Expo Talk on Shopify's production agentic systems
      • Covers architecture, LLM-based evaluation, GRPO training, dataset curation, tooling, MCP
      • Post-training techniques (SFT and RL), prompting, structured generation via context-free grammars (CFG), and agent evaluation
  • https://inference.net/case-study/cal-ai
    • Cal AI Case Study - Inference.net
    • Cal AI reduced latency by 3x and improved reliability using Inference.net's specialized models
      • Custom task-specific models deliver high accuracy at lower cost by removing unnecessary parameters
      • Cuts end-to-end latency by over 50%
  • https://shopify.engineering/leveraging-multimodal-llms
    • Leveraging Multimodal LLMs for Shopify's Global Catalogue
    • Shopify's system makes 40M multimodal LLM inferences daily, processing product updates across multiple languages
      • Fine-tuned open-source vision language models with selective field extraction
      • Achieves 500ms median latency while reducing GPU usage by 40%
  • https://huggingface.co/vandijklab/C2S-Scale-Gemma-2-27B
    • C2S-Scale-Gemma-2-27B - Single-Cell Biology LLM
    • Gemma-2 27B fine-tuned for single-cell biology
      • Trained on 800+ datasets (57M cells) from CellxGene and Human Cell Atlas
      • Converts scRNA-seq data into "cell sentences" for biological analysis (see the rank-ordering sketch after this list)
      • Discovered interferon-conditional amplifier in virtual screen, confirmed in wet-lab tests
  • https://blog.picnic.nl/adding-eyes-to-picnics-automated-warehouses-part-2-b283dd7f7de6
    • Adding Eyes to Picnic's Automated Warehouses Part 2
    • Production multimodal LLM system processing ~1M images daily from 16 warehouse cameras
      • Optimized FastAPI service on Kubernetes with a fine-tuned vision model behind a LiteLLM gateway (an endpoint sketch follows this list)
      • Trained on 15k+ labeled images for stock tracking in automated grocery warehouses
  • https://vercel.com/blog/v0-composite-model-family
    • v0 Composite Model Family
    • Composite architecture combining specialized RAG knowledge, reasoning from SOTA LLMs (Sonnet 3.7/4), and custom streaming post-processing
      • Includes a Quick Edit model for fast narrow-scope changes and an AutoFix model that corrects errors mid-stream (sketched after this list)
      • Models (sm/md/lg) priced at $0.50/$1.50/$7.50 per million tokens
  • https://tinker-docs.thinkingmachines.ai/
    • Tinker - Distributed LLM Fine-Tuning API
    • Low-level training API that manages distributed LLM fine-tuning infrastructure without hiding the training controls
      • Write simple training loops on CPU-only machines while distributed GPU training is handled automatically (see the loop sketch after this list)
      • Supports LoRA fine-tuning for Qwen, Llama series, and large mixture-of-experts models
  • https://github.com/context-labs/awesome-open-workhorse-models
    • Awesome Open Workhorse Models
    • Curated list of reliable, production-ready open-source models for real-world applications
  • https://fin.ai/research/david-vs-goliath-are-small-llms-any-good/
    • David vs Goliath: Are Small LLMs Any Good?
    • Fin.ai research on fine-tuned 14B models for narrow customer support tasks
      • Fine-tuned small models match larger vendor LLMs on well-scoped tasks while being significantly cheaper per transaction
      • Achieves a 60%+ average resolution rate when deployed as customer support agents
  • https://deepmind.google/models/gemma/gemmaverse/
    • Gemmaverse - Community Gemma Models
    • Ecosystem of community-created Gemma models and tools
      • Includes multilingual models (Sarvam AI translation, GAIA for Brazilian Portuguese, SEA-LION for 11 Southeast Asian languages), the Bulgarian-first BgGPT, and specialized task-specific variants
      • Gemma 3 family offers multimodal understanding and local on-device inference
  • https://venturebeat.com/ai/how-intuit-built-custom-financial-llms-that-cut-latency-50-while-boosting
    • How Intuit Built Custom Financial LLMs
    • Intuit's custom Financial LLMs achieve 50% latency reduction and 5% accuracy improvement compared to general-purpose LLMs on accounting workflows
      • Models understand contextual meaning of financial terminology
      • Power agentic AI in QuickBooks Online and Intuit Enterprise Suite
      • Demonstrates domain specialization advantages over generalization
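
For the HexaCoder entry above, a minimal sketch of what oracle-guided synthesis of (vulnerable, fixed) training pairs can look like. `generate_code`, `repair_code`, and `security_oracle` are hypothetical stand-ins; in practice the oracle would be a static analyzer run per CWE type.

```python
# Sketch of oracle-guided synthesis of (vulnerable, fixed) code pairs in the
# spirit of HexaCoder. All three helpers are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class TrainingPair:
    cwe: str
    vulnerable: str
    fixed: str

def security_oracle(code: str, cwe: str) -> bool:
    """Return True if a static analyzer flags `code` for the given CWE."""
    raise NotImplementedError

def generate_code(prompt: str) -> str:
    raise NotImplementedError  # LLM call

def repair_code(code: str, cwe: str) -> str:
    raise NotImplementedError  # LLM call guided by the oracle's report

def synthesize_pairs(prompts: list[str], cwe: str) -> list[TrainingPair]:
    pairs = []
    for prompt in prompts:
        candidate = generate_code(prompt)
        if not security_oracle(candidate, cwe):
            continue  # keep only samples the oracle flags as vulnerable
        fixed = repair_code(candidate, cwe)
        if security_oracle(fixed, cwe):
            continue  # repair failed; discard rather than train on bad data
        pairs.append(TrainingPair(cwe=cwe, vulnerable=candidate, fixed=fixed))
    return pairs
```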
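
The "cell sentence" idea behind C2S-Scale-Gemma is concrete enough to sketch: rank a cell's genes by expression and emit the top-k gene names as text the LLM can read. Gene names and counts below are toy values.

```python
# Sketch of the cell-sentence transformation: a cell's expression profile
# becomes a string of gene names ordered from most to least expressed.

import numpy as np

def cell_to_sentence(expression: np.ndarray, gene_names: list[str], k: int = 100) -> str:
    """Rank genes by expression (descending) and keep the top-k expressed names."""
    order = np.argsort(expression)[::-1]  # indices of highest expression first
    top = [gene_names[i] for i in order[:k] if expression[i] > 0]
    return " ".join(top)

# Toy example: six genes, one cell.
genes = ["CD3D", "MS4A1", "NKG7", "LYZ", "GNLY", "ACTB"]
counts = np.array([0.0, 2.0, 7.5, 1.0, 9.0, 3.5])
print(cell_to_sentence(counts, genes, k=4))  # -> GNLY NKG7 ACTB MS4A1
```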
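
For the Picnic entry, a minimal sketch of the serving shape the post describes: a FastAPI endpoint that forwards a camera frame to a vision model through LiteLLM. The model name, prompt, and route are assumptions, not Picnic's actual service.

```python
# Sketch of a warehouse vision-inference endpoint: FastAPI receives a camera
# frame and sends it, base64-encoded, to a fine-tuned vision model behind a
# LiteLLM gateway. Model name and prompt are hypothetical.

import base64

from fastapi import FastAPI, UploadFile
from litellm import acompletion

app = FastAPI()

@app.post("/stock-check")
async def stock_check(image: UploadFile):
    b64 = base64.b64encode(await image.read()).decode()
    response = await acompletion(
        model="openai/warehouse-vision-ft",  # hypothetical fine-tuned model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Count visible totes and flag empty slots."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return {"result": response.choices[0].message.content}
```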
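
For v0's AutoFix, a sketch of one way to correct errors mid-stream: hold back each fenced code block from the output stream, run a checker over it, and forward the patched block. `check_and_fix` is a hypothetical stand-in for v0's dedicated AutoFix model, and chunks are assumed to arrive line-aligned for simplicity.

```python
# Sketch of mid-stream post-processing: prose passes through untouched,
# code blocks are buffered, checked, and patched before being forwarded.

from typing import Iterable, Iterator

FENCE = "`" * 3  # markdown code-fence marker

def check_and_fix(code: str) -> str:
    raise NotImplementedError  # e.g., a linter plus a small repair model

def autofix_stream(chunks: Iterable[str]) -> Iterator[str]:
    buffer, in_code = "", False
    for chunk in chunks:
        for line in chunk.splitlines(keepends=True):
            if line.strip().startswith(FENCE):
                if in_code:
                    yield check_and_fix(buffer)  # patch before forwarding
                    buffer = ""
                in_code = not in_code
                yield line
            elif in_code:
                buffer += line  # hold code back until the block closes
            else:
                yield line      # stream prose through unchanged
```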
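
Finally, for Tinker, the shape of the advertised workflow: the loop itself runs on a CPU-only machine, and each call hands a batch to the managed GPU fleet. The client object and method names below are placeholders, not Tinker's actual API; see the linked docs for the real interface.

```python
# Illustrative "your loop, their GPUs" fine-tuning shape: every call ships
# work to a remote training service, so this script needs no local GPU.
# `client` and its methods are hypothetical placeholders.

def train(client, dataset, epochs: int = 1, lr: float = 1e-4):
    for epoch in range(epochs):
        for batch in dataset:
            # The service runs the forward/backward pass on its GPU pool
            # and returns the loss for this batch.
            loss = client.forward_backward(batch)
            client.optim_step(learning_rate=lr)  # apply the LoRA update
        print(f"epoch {epoch}: last loss {loss:.4f}")
    return client.save_weights()  # handle to the trained LoRA adapter
```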

Papers from 2024