Papers
created: Sun, 12 Oct 2025 19:45:23 GMT, modified: Mon, 20 Oct 2025 10:12:43 GMT
Collection of AI papers.
- https://arxiv.org/abs/2510.04618
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Framework for self-improving LLMs through evolving contexts rather than fine-tuning
- Uses three agentic roles (Generator, Reflector, Curator) to accumulate, refine, and organize strategies through incremental delta updates
- Prevents context collapse while preserving detailed knowledge
- Achieves +10.6% improvement on agent benchmarks and +8.6% on finance tasks
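A minimal sketch of the Generator/Reflector/Curator loop and incremental delta updates described above; all class and function names are illustrative placeholders, not the paper's code.

```python
# Hypothetical sketch of an ACE-style loop: a Generator attempts tasks, a
# Reflector distills lessons, and a Curator merges them into the evolving
# context via small delta updates instead of rewriting the whole context.
from dataclasses import dataclass, field


@dataclass
class Context:
    strategies: dict[str, str] = field(default_factory=dict)  # id -> strategy text

    def apply_delta(self, delta: dict[str, str]) -> None:
        # Incremental update: merge refined strategies; avoids collapsing
        # the accumulated context into a short lossy summary.
        self.strategies.update(delta)

    def render(self) -> str:
        return "\n".join(f"- {s}" for s in self.strategies.values())


def generate(task: str, ctx: Context) -> str:
    """Generator: attempt the task using the current context (stubbed)."""
    return f"attempt at '{task}' using {len(ctx.strategies)} strategies"


def reflect(task: str, attempt: str) -> str:
    """Reflector: critique the attempt and extract a reusable lesson (stubbed)."""
    return f"lesson learned from '{task}'"


def curate(lesson: str, step: int) -> dict[str, str]:
    """Curator: turn the lesson into a compact, keyed delta."""
    return {f"strategy-{step}": lesson}


ctx = Context()
for step, task in enumerate(["book flight", "summarize report"]):
    attempt = generate(task, ctx)
    lesson = reflect(task, attempt)
    ctx.apply_delta(curate(lesson, step))
print(ctx.render())
```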
- https://arxiv.org/abs/2510.09580
- GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data
- Neurosymbolic model that distills high-quality knowledge graphs from unstructured text
- 80M-parameter GraphMERT achieves 69.8% FActScore on PubMed diabetes papers, significantly outperforming a 32B-parameter LLM baseline (40.2%)
- Creates factual, ontology-consistent KGs with provenance for verifiable reasoning
- https://arxiv.org/abs/2510.11701
- Demystifying Reinforcement Learning in Agentic Reasoning
- Comprehensive investigation of RL in agentic reasoning covering data, algorithms, and reasoning modes
- Real end-to-end tool-use trajectories yield stronger SFT initialization than stitched synthetic ones
- Exploration-friendly techniques (clip higher, reward shaping, policy entropy) improve training efficiency
- Deliberative strategy with fewer tool calls outperforms frequent calls or verbose self-reasoning
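The "clip higher" technique noted above asymmetrically widens the upper PPO-style clipping bound so that low-probability, exploratory tokens keep more gradient signal; a toy NumPy sketch with illustrative epsilon values (not the paper's exact settings).

```python
# Sketch of asymmetric ("clip higher") clipping: the upper bound uses a
# larger epsilon than the lower bound, so tokens whose probability the
# policy wants to raise are clipped less aggressively.
import numpy as np


def clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    # ratio = pi_new(a|s) / pi_old(a|s); advantage comes from rewards/shaping
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return np.minimum(ratio * advantage, clipped * advantage)


ratios = np.array([0.7, 1.0, 1.4])
advantages = np.array([1.0, 0.5, 1.0])
print(clipped_objective(ratios, advantages))
```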
Small Models and Fine-Tuning
The MoM architecture is expected to move past monolithic LLMs toward a constellation of SLMs (Small Language Models), TRMs (Tiny Recursive Models), and conventional ML models.
- https://arxiv.org/abs/2506.02153
- Small Language Models are the Future of Agentic AI
- Argues that SLMs are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems
- While LLMs excel at diverse tasks, agentic AI systems often need specialized models performing repetitive tasks with little variation
- SLMs are the optimal choice based on current capabilities, common agentic architectures, and deployment economics
- https://arxiv.org/abs/2508.06813
- Full-Stack Fine-Tuning for the Q Programming Language
- Open-source approach for adapting LLMs to Q programming language for quantitative finance
- Introduces Leetcode-style evaluation dataset for Q
- Trains suite of reasoning and non-reasoning models based on Qwen-2.5 series (1.5B to 32B parameters) using pretraining, SFT, and RL
- https://arxiv.org/abs/2409.06446
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data
- Novel approach to enhance LLM secure code generation by automatically synthesizing pairs of vulnerable and fixed codes for specific CWE types
- Uses oracle-guided data synthesis pipeline
- Two-step process for secure code generation, reducing security vulnerabilities in generated code
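A rough sketch of the kind of vulnerable/fixed pair an oracle-guided pipeline keeps for training; the oracle below is a stub standing in for a real security analyzer, not the paper's implementation.

```python
# Sketch of oracle-guided pair synthesis: an oracle (e.g. a static analyzer)
# labels candidate code as vulnerable or fixed for a given CWE, and only
# verified pairs become fine-tuning examples. The oracle here is a stub.
from dataclasses import dataclass


@dataclass
class SecurePair:
    cwe: str            # e.g. "CWE-79" (cross-site scripting)
    vulnerable: str     # code the oracle flags
    fixed: str          # repaired code the oracle accepts


def oracle_flags(code: str, cwe: str) -> bool:
    """Stub oracle: a real pipeline would run a security analyzer here."""
    return "UNSAFE" in code


candidate_vuln = "html = 'UNSAFE: ' + user_input"
candidate_fix = "html = escape(user_input)"

if oracle_flags(candidate_vuln, "CWE-79") and not oracle_flags(candidate_fix, "CWE-79"):
    example = SecurePair("CWE-79", candidate_vuln, candidate_fix)
    print("kept training pair for", example.cwe)
```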
- https://arxiv.org/abs/2509.25716
- DeepCodeSeek: Real-Time API Retrieval for Context-Aware Code Generation
- Real-time API retrieval system for context-aware code generation
- Uses retrieval-augmented generation, query enhancement, and fine-tuned embedding models
- Improves enterprise code completion with specialized reranking and RL-based optimization
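A generic retrieve-then-rerank sketch of the kind of pipeline described (query enhancement, embedding retrieval, reranking); the toy bag-of-words embeddings and overlap-based reranker are placeholders for the fine-tuned models.

```python
# Generic retrieve-then-rerank sketch for API retrieval: enhance the query,
# embed it, take the top-k candidates by cosine similarity, then rerank.
# The "embeddings" below are toy bag-of-words vectors, not a trained model.
import numpy as np

API_DOCS = [
    "requests.get(url) - send an HTTP GET request",
    "json.loads(s) - parse a JSON string",
    "pathlib.Path.read_text() - read a file as text",
]


def embed(text: str, vocab: list[str]) -> np.ndarray:
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    return sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    # Placeholder for a cross-encoder / fine-tuned reranker.
    return sorted(candidates,
                  key=lambda d: len(set(query.split()) & set(d.split())),
                  reverse=True)


enhanced_query = "parse json string response"   # query enhancement step (stubbed)
print(rerank(enhanced_query, retrieve(enhanced_query, API_DOCS)))
```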
- https://ninetyfive.gg/
- NinetyFive
- Fast code completion with custom models trained on your codebase
- Optimized inference engine provides 50ms median latency autocomplete suggestions tailored to project-specific patterns
- https://icml.cc/virtual/2025/46781
- Building Production Ready Agentic Systems
- ICML 2025 Expo Talk on Shopify's production agentic systems
- Covers architecture, LLM-based evaluation, GRPO training, dataset curation, tooling, MCP
- Post-training techniques (SFT and RL), prompting, structured generation through CFG, and agent evaluation
- https://inference.net/case-study/cal-ai
- Cal AI Case Study - Inference.net
- Cal AI reduced latency by 3x and improved reliability using Inference.net's specialized models
- Custom task-specific models deliver high accuracy at lower cost by removing unnecessary parameters
- Cut end-to-end latency by over 50%
- https://shopify.engineering/leveraging-multimodal-llms
- Leveraging Multimodal LLMs for Shopify's Global Catalogue
- Shopify's system makes 40M multimodal LLM inferences daily, processing product updates across multiple languages
- Fine-tuned open-source vision language models with selective field extraction
- Achieves 500ms median latency while reducing GPU usage by 40%
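A hedged sketch of selective field extraction: request only the fields a given product update needs and validate the model's JSON. The schema, prompt, and stubbed model call are illustrative, not Shopify's.

```python
# Sketch of selective field extraction: build a prompt that asks the VLM
# only for the fields relevant to this update, then keep just those fields
# from the returned JSON. call_vlm is a stub for the fine-tuned model.
import json

FIELD_INSTRUCTIONS = {
    "title": "a concise product title",
    "color": "the dominant product color",
    "category": "a product category from the catalogue taxonomy",
}


def build_prompt(requested_fields: list[str]) -> str:
    lines = [f'- "{f}": {FIELD_INSTRUCTIONS[f]}' for f in requested_fields]
    return "Extract only these fields as JSON:\n" + "\n".join(lines)


def call_vlm(prompt: str, image_url: str) -> str:
    """Stub: a real system would call the fine-tuned vision-language model."""
    return json.dumps({"color": "red", "category": "footwear"})


def extract(image_url: str, requested_fields: list[str]) -> dict:
    raw = call_vlm(build_prompt(requested_fields), image_url)
    parsed = json.loads(raw)
    return {k: v for k, v in parsed.items() if k in requested_fields}


print(extract("https://example.com/shoe.jpg", ["color", "category"]))
```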
- https://huggingface.co/vandijklab/C2S-Scale-Gemma-2-27B
- C2S-Scale-Gemma-2-27B - Single-Cell Biology LLM
- Gemma-2 27B fine-tuned for single-cell biology
- Trained on 800+ datasets (57M cells) from CellxGene and Human Cell Atlas
- Converts scRNA-seq data into "cell sentences" for biological analysis
- Discovered interferon-conditional amplifier in virtual screen, confirmed in wet-lab tests
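A minimal sketch of the "cell sentence" conversion (gene symbols ordered by descending expression, so a text LLM can consume them); the counts below are made up.

```python
# Convert an scRNA-seq expression vector into a "cell sentence": gene
# symbols ranked by descending expression count, zero-count genes dropped.
def cell_to_sentence(expression: dict[str, int], top_k: int = 5) -> str:
    expressed = {g: c for g, c in expression.items() if c > 0}
    ranked = sorted(expressed, key=expressed.get, reverse=True)
    return " ".join(ranked[:top_k])


cell = {"CD3D": 12, "GAPDH": 30, "IFI6": 0, "ISG15": 7, "MALAT1": 55}
print(cell_to_sentence(cell))   # -> "MALAT1 GAPDH CD3D ISG15"
```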
- https://blog.picnic.nl/adding-eyes-to-picnics-automated-warehouses-part-2-b283dd7f7de6
- Adding Eyes to Picnic's Automated Warehouses Part 2
- Production multimodal LLM system processing ~1M images daily from 16 warehouse cameras
- Optimized FastAPI service on Kubernetes with fine-tuned vision model through LiteLLM gateway
- Trained on 15k+ labeled images for stock tracking in automated grocery warehouses
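A small sketch of what such a service might look like: a FastAPI endpoint forwarding a camera image to a vision model through a LiteLLM gateway. The model name, gateway URL, and prompt are placeholders, not Picnic's configuration.

```python
# Sketch of a FastAPI endpoint that sends a warehouse camera image to a
# fine-tuned vision model behind a LiteLLM gateway (placeholder settings).
import litellm
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class StockRequest(BaseModel):
    image_url: str
    tote_id: str


@app.post("/stock-check")
def stock_check(req: StockRequest) -> dict:
    response = litellm.completion(
        model="openai/warehouse-vision-ft",      # placeholder fine-tuned model
        api_base="http://litellm-gateway:4000",  # placeholder gateway address
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Count the items visible in this tote."},
                {"type": "image_url", "image_url": {"url": req.image_url}},
            ],
        }],
    )
    return {"tote_id": req.tote_id, "result": response.choices[0].message.content}
```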
- https://vercel.com/blog/v0-composite-model-family
- v0 Composite Model Family
- Composite architecture combining specialized RAG knowledge, reasoning from SOTA LLMs (Sonnet 3.7/4), and custom streaming post-processing
- Includes Quick Edit model for fast narrow-scope changes and AutoFix model that corrects errors mid-stream
- Models (sm/md/lg) priced at $0.50/$1.50/$7.50 per million tokens
- https://tinker-docs.thinkingmachines.ai/
- Tinker - Distributed LLM Fine-Tuning API
- Low-level training API that abstracts distributed LLM fine-tuning without hiding controls
- Write simple loops on CPU-only machines while GPU training is handled automatically
- Supports LoRA fine-tuning for Qwen, Llama series, and large mixture-of-experts models
- https://github.com/context-labs/awesome-open-workhorse-models
- Awesome Open Workhorse Models
- Curated list of reliable, production-ready open-source models for real-world applications
- https://fin.ai/research/david-vs-goliath-are-small-llms-any-good/
- David vs Goliath: Are Small LLMs Any Good?
- Fin.ai research on fine-tuned 14B models for narrow customer support tasks
- Fine-tuned small models match larger vendor LLMs on well-scoped tasks while being significantly cheaper per transaction
- Achieves 60%+ average resolution rate with customer support agents
- https://deepmind.google/models/gemma/gemmaverse/
- Gemmaverse - Community Gemma Models
- Ecosystem of community-created Gemma models and tools
- Includes multilingual models (Sarvam AI translation, GAIA for Brazilian Portuguese, SEA-LION for 11 Southeast Asian languages)
- Bulgarian-first BgGPT, and specialized task-specific variants
- Gemma 3 family offers multimodal understanding and local on-device inference
- https://venturebeat.com/ai/how-intuit-built-custom-financial-llms-that-cut-latency-50-while-boosting
- How Intuit Built Custom Financial LLMs
- Intuit's custom Financial LLMs achieve 50% latency reduction and 5% accuracy improvement compared to general-purpose LLMs on accounting workflows
- Models understand contextual meaning of financial terminology
- Power agentic AI in QuickBooks Online and Intuit Enterprise Suite
- Demonstrates domain specialization advantages over generalization
Papers from 2024
- https://arxiv.org/html/2410.18050v2
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
- https://arxiv.org/abs/2408.14717
- Proposes Table-Augmented Generation (TAG), a unified and general-purpose paradigm for answering natural language questions over databases
- https://github.com/TAG-Research/lotus
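A rough sketch of the TAG flow over SQLite: a query step pulls relevant rows from the database, then a generation step answers the question from them. Both the SQL synthesis and the LLM call are stubbed; this is not the LOTUS API.

```python
# Sketch of Table-Augmented Generation: (1) a query step retrieves relevant
# rows, (2) a generation step answers the natural-language question grounded
# in those rows. The SQL is hard-coded and the LLM call is a stub.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, rating INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [("espresso maker", 5, "Rich crema, easy cleanup"),
     ("espresso maker", 2, "Leaked after a week")],
)


def query_step(question: str) -> list[tuple]:
    # A real system would synthesize this SQL from the question with an LLM.
    sql = "SELECT rating, text FROM reviews WHERE product = 'espresso maker'"
    return conn.execute(sql).fetchall()


def generation_step(question: str, rows: list[tuple]) -> str:
    """Stub LLM call: answer the question grounded in the retrieved rows."""
    context = "; ".join(f"{rating} stars: {text}" for rating, text in rows)
    return f"Answer to '{question}' based on: {context}"


question = "Do buyers like the espresso maker?"
print(generation_step(question, query_step(question)))
```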
- https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
- GraphRAG introduction
- https://blog.cubed.run/the-insanity-of-relying-on-vector-embeddings-why-rag-fails-be73554490b2
- The Insanity of Relying on Vector Embeddings: Why RAG Fails
- https://www.databricks.com/blog/long-context-rag-capabilities-openai-o1-and-google-gemini
- The Long Context RAG Capabilities of OpenAI o1 and Google Gemini
- OpenAI o1 models show a consistent improvement over Anthropic and Google models on Databricks' long-context RAG benchmark up to 128k tokens
- Despite lower performance than the SOTA OpenAI and Anthropic models, Google Gemini 1.5 models have consistent RAG performance at extreme context lengths of up to 2 million tokens.
- Models fail on long context RAG in highly distinct ways