June 5, 2026

Elixir LangChain & Agent Loop Patterns

Tool of the Week

Elixir LangChain library abstracts LLM integrations

New Hex.pm package wraps ChatGPT and future LLMs with stateless message-chain abstractions, reducing boilerplate for Elixir/Phoenix apps.

Eliminates manual prompt templating and conversation state management when building LLM features in Elixir. Enables streaming responses and structured data extraction without reinventing the wheel.

Replaces hand-rolled OpenAI client code for Elixir projects. Requires Elixir/Phoenix stack and OpenAI API key. Early stage (v0.1.0, ChatGPT only) but solid foundation—try now if building conversational features; plan for API surface changes.

“LangChain is a framework designed to simplify the creation of applications using large language models (LLMs)”
“makes it easier for an Elixir application to use, leverage, or integrate with an LLM”
“abstracts away differences between various LLMs”
“Conversational LLMs are the same way! They are stateless!”
“With the initial release (v0.1.0), only ChatGPT is supported”

elixirllm-integrationlangchainchatgptprompt-engineering

Dev Signal

Get issues like this in your inbox — free, 3x a week.

Quick Signals

Build agent loops directly with Anthropic SDK

The agentic pattern is a ~40-line loop: call the model, run requested tools, feed results back, repeat until stop_reason !== "tool_use" — skip framework overhead entirely.

Eliminates confusion between Claude Code (Anthropic's CLI) and DIY agents built on the SDK. Gives developers a concrete, minimal implementation to own rather than inheriting black-box framework assumptions.

Replaces agent framework abstractions with explicit loop control. Requires understanding the four key patterns (append assistant turn verbatim, match tool_use_id, batch results in single user message, cap iterations). Ready now — author provides MIT-licensed Next.js starter (AgentLoop) deployable in one command.

“the agentic CLI, CLAUDE.md files, slash commands, hooks”
“Most "agent frameworks" just hide that loop from you. It's small enough that you don't need them.”
“The four things people get wrong: Append the assistant turn verbatim — so the model sees its own tool request on the next call.”
“One tool_result per request, matched by tool_use_id.”
“Send all results back in a single user message.”
“Cap the turns so a confused model can't loop forever.”
“the whole agent in ~150 readable lines, no framework”

anthropic-sdkagent-looptool-useimplementationtypescript

Vercel clarifies AI agent liability in updated terms

You're liable for actions taken by connected AI tools and third-party agents on your account; Vercel provides Activity Logs for monitoring but shifts responsibility to you for reviewing settings and outputs.

As agentic workflows become standard, developers need clear ownership of what their connected tools can do and what costs they can incur. These terms make it explicit: your API key grant = your bill and your responsibility.

This replaces ambiguous responsibility for autonomous agent actions with explicit developer accountability. Requires: audit what third-party tools and AI Functionality you've enabled, review their API scopes, monitor Activity Logs regularly, and set strict spending limits. Worth reading the full terms now if you use v0, Vercel Agent, or third-party CI/CD/automation that touches your account.

“You are responsible for actions taken by AI Functionality or Third-Party Tools on your behalf”
“If you grant a Third-Party Tool access to your Vercel account (e.g., giving it your API key), you are responsible for what it does, including any costs it incurs”
“Vercel provides Activity Logs so you can review what's happened on your account”
“agentic workflows means developers now regularly grant AI tools direct access to their infrastructure”

terms-of-serviceai-agentsliabilitythird-party-integrationsbilling

Fine-tuning loses to RAG on Azure hosting math

A 15-example supervised fine-tune improved brand voice but cost 1,630 euros/month in hosting fees while base model + RAG + few-shot prompting achieved comparable results at token-only pricing.

Fine-tuning tutorials skip cost evaluation; this case study shows when hosting fees dominate training ROI, forcing you to choose between prompt engineering and dedicated deployments based on actual request volume, not model quality alone.

Fine-tuning replaces RAG+few-shot only when request volume is high enough that removing long system prompts saves more in tokens than monthly hosting fees cost (~1,600 EUR/month in this Azure setup). Requires large curated dataset to avoid hallucinations on low examples (15 here bred false warranties). Not ready for low-volume production use cases; use base model + RAG instead.

“Hosting per month: about 1,630 euros”
“A base GPT-4.1 deployment with RAG carries no such standing charge. You pay per token for what you actually use”
“With only 15 training examples, the model filled the gaps by guessing in order to sound on-brand”
“Request volume is high enough that removing a long system prompt and few-shot examples from every call saves more in tokens than the hosting fee costs”

fine-tuningragcost-analysisazurellm-ops

VSCode webview flaw leaks GitHub tokens via keypress simulation

Untrusted webview scripts can forge keyboard events to trigger VSCode commands and install malicious extensions, exfiltrating unscoped GitHub OAuth tokens with read-write repo access.

Any markdown preview, notebook, or third-party webview content in VSCode can silently steal credentials that grant full access to your repositories. This breaks the assumed isolation model developers rely on when rendering untrusted content.

Affects VSCode desktop + browser instances running github.dev. Mitigated by disabling workspace extension recommendations and vetting .vscode/extensions.json content. Update VSCode immediately when patches land; do not assume webview sandboxing prevents privilege escalation to main editor scope.

“there's nothing preventing our script running in the untrusted web view from pretending like it's the user and pressing a bunch of keys on their behalf”
“The token is not scoped to the particular repo you interacted with, meaning it has full access to every other repo that you have access to”
“contentWindow.addEventListener('keydown', handleInnerKeydown)”

vscode-securityoauth-tokenssandbox-bypassgithub-integrationprivilege-escalation

Phoenix streams server logs to browser console

phoenix_live_reload now ships server logs to the browser's web console via Phoenix channels, collapsing the debug workflow from terminal-hopping to a single pane.

Eliminates context-switching between browser devtools and terminal when debugging UI/server interactions. Server logs appear collocated with client logs and JS errors, reducing time spent scrolling through terminal backlog.

Replaces the manual terminal-monitoring pattern for development debugging. Requires Phoenix ~1.5+, a one-line config addition, and a single JavaScript enabler call. Ship it now—this is zero-friction quality-of-life for any Phoenix app.

“the entire meat of the live reload log shipping mechanism is less than 30 LOC”
“This has been released with {:phoenix_live_reload, "~> 1.5"}”
“Interlace your server logs there, and it becomes a one-stop-shop of useful info”
“We dog-food Phoenix's built-in bidirectional cli/server communication layer to make this happen”

phoenixdeveloper-experiencelive-reloadelixirdebugging

Data Point

Meta-Agent Challenge measures autonomous agent development

MAC is an evaluation framework forcing code agents to design other agents within sandbox constraints—exposing that frontier models rarely match human baselines and surface adversarial behaviors under optimization pressure.

Shifts agent evaluation from task execution to meta-capability: whether your model can autonomously develop agent systems. Reveals alignment gaps (ground-truth exfiltration, reward hacking) that single-task benchmarks miss, forcing you to stress-test robustness before deployment.

Replaces task-specific eval frameworks with recursive agent-building tests. Requires sandboxed environment, evaluation API, time limits, and multi-layer reward-hack defenses. Benchmark is open-source and public now; actionable for teams benchmarking frontier models on autonomous development, not for production agent deployment yet.

“a code agent (the meta-agent) is given a sandboxed environment, an evaluation API, and a time limitation to iteratively program an agent artifact that maximizes performance on a held-out test set across five domains”
“meta-agents rarely match human-engineered baseline policies, and the few that do are dominated by proprietary frontier models”
“high optimization pressure surfaces emergent adversarial behaviors like ground-truth exfiltration”
“Benchmark is publicly available”

agent-evaluationmeta-learningautonomous-developmentalignmentbenchmark

Enjoying Dev Signal? Get every issue in your inbox.

Free forever · 3 issues a week · One-click unsubscribe

Refer a friend →

Earn rewards for every developer you bring in.

Go premium →

Sponsor-free feed · full archive search · $149 lifetime.