Papers
created: Sun, 12 Oct 2025 19:45:23 GMT, modified: Mon, 20 Oct 2025 10:12:43 GMT
Collection of AI papers.
- https://arxiv.org/abs/2510.04618
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Framework for self-improving LLMs through evolving contexts rather than fine-tuning
- Uses three agentic roles (Generator, Reflector, Curator) to accumulate, refine, and organize strategies through incremental delta updates
- Prevents context collapse while preserving detailed knowledge
- Achieves +10.6% improvement on agent benchmarks and +8.6% on finance tasks
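A minimal sketch of the Generator/Reflector/Curator loop and incremental delta updates described above; all class and function names are illustrative placeholders, not the paper's code.

```python
# Hypothetical sketch of an ACE-style loop: a Generator attempts tasks, a
# Reflector distills lessons, and a Curator merges them into the evolving
# context via small delta updates instead of rewriting the whole context.
from dataclasses import dataclass, field


@dataclass
class Context:
    strategies: dict[str, str] = field(default_factory=dict)  # id -> strategy text

    def apply_delta(self, delta: dict[str, str]) -> None:
        # Incremental update: merge refined strategies; avoids collapsing
        # the accumulated context into a short lossy summary.
        self.strategies.update(delta)

    def render(self) -> str:
        return "\n".join(f"- {s}" for s in self.strategies.values())


def generate(task: str, ctx: Context) -> str:
    """Generator: attempt the task using the current context (stubbed)."""
    return f"attempt at '{task}' using {len(ctx.strategies)} strategies"


def reflect(task: str, attempt: str) -> str:
    """Reflector: critique the attempt and extract a reusable lesson (stubbed)."""
    return f"lesson learned from '{task}'"


def curate(lesson: str, step: int) -> dict[str, str]:
    """Curator: turn the lesson into a compact, keyed delta."""
    return {f"strategy-{step}": lesson}


ctx = Context()
for step, task in enumerate(["book flight", "summarize report"]):
    attempt = generate(task, ctx)
    lesson = reflect(task, attempt)
    ctx.apply_delta(curate(lesson, step))
print(ctx.render())
```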
- https://arxiv.org/abs/2510.09580
- GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data
- Neurosymbolic model that distills high-quality knowledge graphs from unstructured text
- 80M-parameter GraphMERT achieves 69.8% FActScore on PubMed diabetes papers, significantly outperforming a 32B-parameter LLM baseline (40.2%)
- Creates factual, ontology-consistent KGs with provenance for verifiable reasoning
- https://arxiv.org/abs/2510.11701
- Demystifying Reinforcement Learning in Agentic Reasoning
- Comprehensive investigation of RL in agentic reasoning covering data, algorithms, and reasoning modes
- Real end-to-end tool-use trajectories yield stronger SFT initialization than stitched synthetic ones
- Exploration-friendly techniques (clip higher, reward shaping, policy entropy) improve training efficiency
- Deliberative strategy with fewer tool calls outperforms frequent calls or verbose self-reasoning
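The "clip higher" technique noted above asymmetrically widens the upper PPO-style clipping bound so that low-probability, exploratory tokens keep more gradient signal; a toy NumPy sketch with illustrative epsilon values (not the paper's exact settings).

```python
# Sketch of asymmetric ("clip higher") clipping: the upper bound uses a
# larger epsilon than the lower bound, so tokens whose probability the
# policy wants to raise are clipped less aggressively.
import numpy as np


def clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    # ratio = pi_new(a|s) / pi_old(a|s); advantage comes from rewards/shaping
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return np.minimum(ratio * advantage, clipped * advantage)


ratios = np.array([0.7, 1.0, 1.4])
advantages = np.array([1.0, 0.5, 1.0])
print(clipped_objective(ratios, advantages))
```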
Small Models and Fine-Tuning
The MoM architecture is expected to move past monolithic LLMs toward a constellation of SLMs (Small Language Models), TRMs (Tiny Recursive Models), and conventional ML models.
- https://arxiv.org/abs/2506.02153
- Small Language Models are the Future of Agentic AI
- Argues that SLMs are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems
- While LLMs excel at diverse tasks, agentic AI systems often need specialized models performing repetitive tasks with little variation
- SLMs are the optimal choice based on current capabilities, common agentic architectures, and deployment economics
- https://arxiv.org/abs/2508.06813
- Full-Stack Fine-Tuning for the Q Programming Language
- Open-source approach for adapting LLMs to Q programming language for quantitative finance
- Introduces Leetcode-style evaluation dataset for Q
- Trains suite of reasoning and non-reasoning models based on Qwen-2.5 series (1.5B to 32B parameters) using pretraining, SFT, and RL
- https://arxiv.org/abs/2409.06446
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data
- Novel approach to enhance LLM secure code generation by automatically synthesizing pairs of vulnerable and fixed codes for specific CWE types
- Uses oracle-guided data synthesis pipeline
- Two-step process for secure code generation, reducing security vulnerabilities in generated code
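A rough sketch of the kind of vulnerable/fixed pair an oracle-guided pipeline keeps for training; the oracle below is a stub standing in for a real security analyzer, not the paper's implementation.

```python
# Sketch of oracle-guided pair synthesis: an oracle (e.g. a static analyzer)
# labels candidate code as vulnerable or fixed for a given CWE, and only
# verified pairs become fine-tuning examples. The oracle here is a stub.
from dataclasses import dataclass


@dataclass
class SecurePair:
    cwe: str            # e.g. "CWE-79" (cross-site scripting)
    vulnerable: str     # code the oracle flags
    fixed: str          # repaired code the oracle accepts


def oracle_flags(code: str, cwe: str) -> bool:
    """Stub oracle: a real pipeline would run a security analyzer here."""
    return "UNSAFE" in code


candidate_vuln = "html = 'UNSAFE: ' + user_input"
candidate_fix = "html = escape(user_input)"

if oracle_flags(candidate_vuln, "CWE-79") and not oracle_flags(candidate_fix, "CWE-79"):
    example = SecurePair("CWE-79", candidate_vuln, candidate_fix)
    print("kept training pair for", example.cwe)
```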
- https://arxiv.org/abs/2509.25716
- DeepCodeSeek: Real-Time API Retrieval for Context-Aware Code Generation
- Real-time API retrieval system for context-aware code generation
- Uses retrieval-augmented generation, query enhancement, and fine-tuned embedding models
- Improves enterprise code completion with specialized reranking and RL-based optimization
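A generic retrieve-then-rerank sketch of the kind of pipeline described (query enhancement, embedding retrieval, reranking); the toy bag-of-words embeddings and overlap-based reranker are placeholders for the fine-tuned models.

```python
# Generic retrieve-then-rerank sketch for API retrieval: enhance the query,
# embed it, take the top-k candidates by cosine similarity, then rerank.
# The "embeddings" below are toy bag-of-words vectors, not a trained model.
import numpy as np

API_DOCS = [
    "requests.get(url) - send an HTTP GET request",
    "json.loads(s) - parse a JSON string",
    "pathlib.Path.read_text() - read a file as text",
]


def embed(text: str, vocab: list[str]) -> np.ndarray:
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    return sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    # Placeholder for a cross-encoder / fine-tuned reranker.
    return sorted(candidates,
                  key=lambda d: len(set(query.split()) & set(d.split())),
                  reverse=True)


enhanced_query = "parse json string response"   # query enhancement step (stubbed)
print(rerank(enhanced_query, retrieve(enhanced_query, API_DOCS)))
```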
- https://ninetyfive.gg/
- NinetyFive
- Fast code completion with custom models trained on your codebase
- Optimized inference engine provides 50ms median latency autocomplete suggestions tailored to project-specific patterns
- https://icml.cc/virtual/2025/46781
- Building Production Ready Agentic Systems
- ICML 2025 Expo Talk on Shopify's production agentic systems
- Covers architecture, LLM-based evaluation, GRPO training, dataset curation, tooling, MCP
- Post-training techniques (SFT and RL), prompting, structured generation through CFG, and agent evaluation
- https://inference.net/case-study/cal-ai
- Cal AI Case Study - Inference.net
- Cal AI reduced latency by 3x and improved reliability using Inference.net's specialized models
- Custom task-specific models deliver high accuracy at lower cost by removing unnecessary parameters
- Cut end-to-end latency by over 50%
- https://shopify.engineering/leveraging-multimodal-llms
- Leveraging Multimodal LLMs for Shopify's Global Catalogue
- Shopify's system makes 40M multimodal LLM inferences daily, processing product updates across multiple languages
- Fine-tuned open-source vision language models with selective field extraction
- Achieves 500ms median latency while reducing GPU usage by 40%
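A hedged sketch of selective field extraction: request only the fields a given product update needs and validate the model's JSON. The schema, prompt, and stubbed model call are illustrative, not Shopify's.

```python
# Sketch of selective field extraction: build a prompt that asks the VLM
# only for the fields relevant to this update, then keep just those fields
# from the returned JSON. call_vlm is a stub for the fine-tuned model.
import json

FIELD_INSTRUCTIONS = {
    "title": "a concise product title",
    "color": "the dominant product color",
    "category": "a product category from the catalogue taxonomy",
}


def build_prompt(requested_fields: list[str]) -> str:
    lines = [f'- "{f}": {FIELD_INSTRUCTIONS[f]}' for f in requested_fields]
    return "Extract only these fields as JSON:\n" + "\n".join(lines)


def call_vlm(prompt: str, image_url: str) -> str:
    """Stub: a real system would call the fine-tuned vision-language model."""
    return json.dumps({"color": "red", "category": "footwear"})


def extract(image_url: str, requested_fields: list[str]) -> dict:
    raw = call_vlm(build_prompt(requested_fields), image_url)
    parsed = json.loads(raw)
    return {k: v for k, v in parsed.items() if k in requested_fields}


print(extract("https://example.com/shoe.jpg", ["color", "category"]))
```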
- https://huggingface.co/vandijklab/C2S-Scale-Gemma-2-27B
- C2S-Scale-Gemma-2-27B - Single-Cell Biology LLM
- Gemma-2 27B fine-tuned for single-cell biology
- Trained on 800+ datasets (57M cells) from CellxGene and Human Cell Atlas
- Converts scRNA-seq data into "cell sentences" for biological analysis
- Discovered interferon-conditional amplifier in virtual screen, confirmed in wet-lab tests
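A minimal sketch of the "cell sentence" conversion (gene symbols ordered by descending expression, so a text LLM can consume them); the counts below are made up.

```python
# Convert an scRNA-seq expression vector into a "cell sentence": gene
# symbols ranked by descending expression count, zero-count genes dropped.
def cell_to_sentence(expression: dict[str, int], top_k: int = 5) -> str:
    expressed = {g: c for g, c in expression.items() if c > 0}
    ranked = sorted(expressed, key=expressed.get, reverse=True)
    return " ".join(ranked[:top_k])


cell = {"CD3D": 12, "GAPDH": 30, "IFI6": 0, "ISG15": 7, "MALAT1": 55}
print(cell_to_sentence(cell))   # -> "MALAT1 GAPDH CD3D ISG15"
```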
- https://blog.picnic.nl/adding-eyes-to-picnics-automated-warehouses-part-2-b283dd7f7de6
- Adding Eyes to Picnic's Automated Warehouses Part 2
- Production multimodal LLM system processing ~1M images daily from 16 warehouse cameras
- Optimized FastAPI service on Kubernetes with fine-tuned vision model through LiteLLM gateway
- Trained on 15k+ labeled images for stock tracking in automated grocery warehouses
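A small sketch of what such a service might look like: a FastAPI endpoint forwarding a camera image to a vision model through a LiteLLM gateway. The model name, gateway URL, and prompt are placeholders, not Picnic's configuration.

```python
# Sketch of a FastAPI endpoint that sends a warehouse camera image to a
# fine-tuned vision model behind a LiteLLM gateway (placeholder settings).
import litellm
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class StockRequest(BaseModel):
    image_url: str
    tote_id: str


@app.post("/stock-check")
def stock_check(req: StockRequest) -> dict:
    response = litellm.completion(
        model="openai/warehouse-vision-ft",      # placeholder fine-tuned model
        api_base="http://litellm-gateway:4000",  # placeholder gateway address
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Count the items visible in this tote."},
                {"type": "image_url", "image_url": {"url": req.image_url}},
            ],
        }],
    )
    return {"tote_id": req.tote_id, "result": response.choices[0].message.content}
```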
- https://vercel.com/blog/v0-composite-model-family
- v0 Composite Model Family
- Composite architecture combining specialized RAG knowledge, reasoning from SOTA LLMs (Sonnet 3.7/4), and custom streaming post-processing
- Includes Quick Edit model for fast narrow-scope changes and AutoFix model that corrects errors mid-stream
- Models (sm/md/lg) priced at $0.50/$1.50/$7.50 per million tokens
- https://tinker-docs.thinkingmachines.ai/
- Tinker - Distributed LLM Fine-Tuning API
- Low-level training API that abstracts distributed LLM fine-tuning without hiding controls
- Write simple loops on CPU-only machines while GPU training is handled automatically
- Supports LoRA fine-tuning for Qwen, Llama series, and large mixture-of-experts models
- https://github.com/context-labs/awesome-open-workhorse-models
- Awesome Open Workhorse Models
- Curated list of reliable, production-ready open-source models for real-world applications
- https://fin.ai/research/david-vs-goliath-are-small-llms-any-good/
- David vs Goliath: Are Small LLMs Any Good?
- Fin.ai research on fine-tuned 14B models for narrow customer support tasks
- Fine-tuned small models match larger vendor LLMs on well-scoped tasks while being significantly cheaper per transaction
- Achieves 60%+ average resolution rate with customer support agents
- https://deepmind.google/models/gemma/gemmaverse/
- Gemmaverse - Community Gemma Models
- Ecosystem of community-created Gemma models and tools
- Includes multilingual models (Sarvam AI translation, GAIA for Brazilian Portuguese, SEA-LION for 11 Southeast Asian languages)
- Bulgarian-first BgGPT, and specialized task-specific variants
- Gemma 3 family offers multimodal understanding and local on-device inference
- https://venturebeat.com/ai/how-intuit-built-custom-financial-llms-that-cut-latency-50-while-boosting
- How Intuit Built Custom Financial LLMs
- Intuit's custom Financial LLMs achieve 50% latency reduction and 5% accuracy improvement compared to general-purpose LLMs on accounting workflows
- Models understand contextual meaning of financial terminology
- Power agentic AI in QuickBooks Online and Intuit Enterprise Suite
- Demonstrates domain specialization advantages over generalization
Papers from 2024
- https://arxiv.org/html/2410.18050v2
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
- https://arxiv.org/abs/2408.14717
- Proposes Table-Augmented Generation (TAG), a unified and general-purpose paradigm for answering natural language questions over databases
- https://github.com/TAG-Research/lotus
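A rough sketch of the TAG flow over SQLite: a query step pulls relevant rows from the database, then a generation step answers the question from them. Both the SQL synthesis and the LLM call are stubbed; this is not the LOTUS API.

```python
# Sketch of Table-Augmented Generation: (1) a query step retrieves relevant
# rows, (2) a generation step answers the natural-language question grounded
# in those rows. The SQL is hard-coded and the LLM call is a stub.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, rating INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [("espresso maker", 5, "Rich crema, easy cleanup"),
     ("espresso maker", 2, "Leaked after a week")],
)


def query_step(question: str) -> list[tuple]:
    # A real system would synthesize this SQL from the question with an LLM.
    sql = "SELECT rating, text FROM reviews WHERE product = 'espresso maker'"
    return conn.execute(sql).fetchall()


def generation_step(question: str, rows: list[tuple]) -> str:
    """Stub LLM call: answer the question grounded in the retrieved rows."""
    context = "; ".join(f"{rating} stars: {text}" for rating, text in rows)
    return f"Answer to '{question}' based on: {context}"


question = "Do buyers like the espresso maker?"
print(generation_step(question, query_step(question)))
```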
- https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
- GraphRAG introduction
- https://blog.cubed.run/the-insanity-of-relying-on-vector-embeddings-why-rag-fails-be73554490b2
- The Insanity of Relying on Vector Embeddings: Why RAG Fails
- https://www.databricks.com/blog/long-context-rag-capabilities-openai-o1-and-google-gemini
- The Long Context RAG Capabilities of OpenAI o1 and Google Gemini
- OpenAI o1 models show a consistent improvement over Anthropic and Google models on Databricks' long-context RAG benchmark up to 128k tokens
- Despite lower performance than the SOTA OpenAI and Anthropic models, Google Gemini 1.5 models have consistent RAG performance at extreme context lengths of up to 2 million tokens.
- Models fail on long context RAG in highly distinct ways