Building Effective "Agents"
https://www.anthropic.com/research/building-effective-agents

There's also a cookbook with useful code examples: https://github.com/anthropics/anthropic-cookbook/tree/main/p...
Blogged about this here: https://simonwillison.net/2024/Dec/20/building-effective-age...
They also say that using LangChain and other frameworks is mostly unnecessary and does more harm than good. Instead they argue for using a few simple patterns, directly at the API level. Not dissimilar to the old-school Gang of Four software engineering patterns.
Really like this post as a guidance for how to actually build useful tools with LLMs. Keep it simple, stupid.
It would decide what circumstances call for double-checking facts for accuracy, which would hopefully catch hallucinations. It would write its own acceptance criteria for its answers, etc.
It's not clear to me how to train each of the sub-models required, or how big (or small!) they need to be, or what architecture works best. But I think that complex architectures are going to win out over the "just scale up with more data and more compute" approach.
The questions then become:
1. When can you (i.e. a person who wants to build systems with them) trust them to make decisions on their own?
2. What type of trusted environments are we talking about? (Sandboxing?)
So, that all requires more thought -- perhaps by some folks who hang out at this site. :)
I suspect that someone will come up with a "real-world" application at a non-tech-first enterprise company and let us know.
I work on CAAs and document my journey on my substack (https://jdsmerau.substack.com)
* A "network of agents" is a system of agents and tools
* That run and build up state (both "memory" and actual state via tool use)
* Which is then inspected when routing as a kind of "state machine".
* Routing should specify which agent (or agents, in parallel) to run next, via that state.
* Routing can also use other agents (routing agents) to figure out what to do next, instead of code.
We're codifying this with durable workflows in a prototypical library — AgentKit: https://github.com/inngest/agent-kit/ (docs: https://agentkit.inngest.com/overview).
It took less than a day to get a network of agents to correctly fix swebench-lite examples. It's super early, but very fun. One of the cool things is that this uses Inngest under the hood, so you get all of the classic durable execution/step function/tracing/o11y for free, but it's just regular code that you write.
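For readers who want to see the shape of this, here's a minimal Python sketch of the routing-as-state-machine idea — the names (`NetworkState`, `Agent`, `router`) are illustrative only, not AgentKit's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class NetworkState:
    memory: list[str] = field(default_factory=list)      # conversational "memory"
    facts: dict[str, str] = field(default_factory=dict)  # state built up via tool use
    done: bool = False

@dataclass
class Agent:
    name: str
    run: Callable[[NetworkState], NetworkState]  # each agent reads and returns state

def router(state: NetworkState, agents: dict[str, Agent]) -> Optional[Agent]:
    """Plain-code routing: inspect the state and pick the next agent (or stop)."""
    if state.done:
        return None
    if "plan" not in state.facts:
        return agents["planner"]
    return agents["editor"]

def run_network(agents: dict[str, Agent], state: NetworkState, max_steps: int = 10) -> NetworkState:
    for _ in range(max_steps):
        nxt = router(state, agents)
        if nxt is None:
            break
        state = nxt.run(state)
    return state
```

A routing agent could replace the hand-written `router` when the next step can't be decided from state alone.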
If runtime information is insufficient, we can use AI/ML models to fill it in. But deciding the next step could be done ahead of time, assuming complete information.
Most AI agent examples short-circuit these two steps. When faced with unstructured or insufficient information, the program asks the LLM/AI model to decide the next step. Instead, we could ask the LLM/AI model to structure or predict the necessary information and use pre-defined rules to drive the process.
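A rough sketch of that inversion in Python — the model only structures the information, and pre-defined rules decide the next step. `call_llm` is a placeholder for whatever model API you use, and the intents/rules are invented for illustration:

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for your model API; assumed to return a JSON string."""
    raise NotImplementedError

def extract_structure(ticket_text: str) -> dict:
    # The LLM is only asked to structure/predict the missing information...
    prompt = (
        "Extract JSON with keys 'intent' (refund|question|complaint) "
        f"and 'order_id' (or null) from:\n{ticket_text}"
    )
    return json.loads(call_llm(prompt))

def next_step(info: dict) -> str:
    # ...while pre-defined, deterministic rules drive the process.
    if info["intent"] == "refund" and info.get("order_id"):
        return "start_refund_workflow"
    if info["intent"] == "refund":
        return "ask_for_order_id"
    return "route_to_support_queue"
```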
This approach will translate most [1] "Agent" examples into "Workflow" examples. The quotes here are meant to imply Anthropic's definition of these terms.
[1] I said "most" because there might be continuous-world systems (such as real-world simulacra) that would require a very large number of rules, making it impractical to define each of them. I believe those systems are the exception, not the rule.
I think this is where durable execution shines. By ensuring every step in an async processing workflow is fault-tolerant and durable, even interruptions won't lose progress. For example, in a refund workflow, a durable system can resume exactly where it left off—no duplicate refunds, no lost state.
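A toy Python sketch of that step-level checkpointing — real durable-execution engines do this (and much more) for you; the file-based checkpoint and the fake refund steps below are purely illustrative:

```python
import json
import os

CHECKPOINT = "refund_run_123.json"  # a real engine would persist this in its own store

def load_checkpoint() -> dict:
    return json.load(open(CHECKPOINT)) if os.path.exists(CHECKPOINT) else {}

def step(name: str, fn, state: dict):
    """Run a step at most once: completed steps are replayed from the checkpoint."""
    if name in state:                 # finished before a crash: skip, no duplicate side effects
        return state[name]
    result = fn()
    state[name] = result
    with open(CHECKPOINT, "w") as f:  # checkpoint after every step
        json.dump(state, f)
    return result

def refund_workflow(order_id: str) -> None:
    state = load_checkpoint()
    order = step("fetch_order", lambda: {"id": order_id, "amount": 42.0}, state)
    step("issue_refund", lambda: f"refunded {order['amount']} for {order['id']}", state)
    step("notify_customer", lambda: f"emailed customer about order {order['id']}", state)
```

If the process dies between `issue_refund` and `notify_customer`, re-running the workflow replays the completed steps from the checkpoint and only executes the notification — no duplicate refund.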
Agents are Interfaces, Not Implementations
The current zeitgeist seems to think of agents as passthrough agents: e.g. a thin wrapper around a core that's almost 100% an LLM.
The most effective agents I've seen, and have built, are largely traditional software engineering with a sprinkling of LLM calls for "LLM-hard" problems. LLM-hard problems are those that can ONLY be solved by applying an LLM (creative writing, text synthesis, intelligent decision making). Leave all the problems that are amenable to decades of software engineering best practice to good old deterministic code.
I've been calling systems like this "Transitional Software Design." That is, they're mostly a traditional software application under the hood (deterministic, well-structured code, separation of concerns) with judicious use of LLMs where required.
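A small hypothetical example of that split — deterministic code for everything software is already good at, with one narrow LLM call for the genuinely LLM-hard part (`call_llm` stands in for your model API of choice):

```python
import re

def call_llm(prompt: str) -> str:
    """Placeholder for whichever model API you use."""
    raise NotImplementedError

def handle_ticket(raw_email: str) -> dict:
    # Deterministic, testable code: parsing, lookups, business rules.
    order_ids = re.findall(r"ORD-\d{6}", raw_email)
    is_vip = "@bigcustomer.com" in raw_email.lower()

    # The one "LLM-hard" problem here: synthesising free text.
    summary = call_llm(f"Summarise this support email in one sentence:\n{raw_email}")

    # Back to plain rules for the decision that has to be auditable.
    queue = "vip" if is_vip else ("orders" if order_ids else "general")
    return {"queue": queue, "order_ids": order_ids, "summary": summary}
```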
Ultimately, users care about what the agent does, not how it does it.
The biggest differentiator I've seen between agents that work and get adoption, and those that are eternally in a demo phase, is the cardinality of the state space the agent is operating in. Too many folks try to "boil the ocean" and implement a general-purpose capability: e.g. generating Python code to do something, or synthesizing SQL from natural language.
The projects I've seen that work really focus on reducing the state space of agent decision making down to the smallest possible set that delivers user value.
e.g. Rather than generating arbitrary SQL, work out a set of ~20 SQL templates that are hyper-specific to the business problem you're solving. Parameterize them with the options for select, filter, group by, order by, and the subset of aggregate operations that are relevant. Then let the agent choose the right template + parameters from a relatively small, finite set of options.
^^^ the delta in agent quality between "boiling the ocean" vs "agent's free choice over a small state space" is night and day. It lets you deploy early, deliver value, and start getting user feedback.
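A minimal sketch of the template approach (the templates, columns, and parameters are invented for illustration): the agent only ever chooses a template id and parameter values from a closed set, and deterministic code validates and renders the query:

```python
TEMPLATES = {
    "revenue_by_region": (
        "SELECT region, SUM(revenue) FROM sales "
        "WHERE order_date >= %(start)s GROUP BY region ORDER BY 2 {direction} LIMIT %(limit)s"
    ),
    "top_customers": (
        "SELECT customer_id, COUNT(*) FROM orders "
        "WHERE order_date >= %(start)s GROUP BY customer_id ORDER BY 2 DESC LIMIT %(limit)s"
    ),
}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}

def build_query(template_id: str, direction: str = "DESC", **params) -> tuple[str, dict]:
    """Validate the agent's choice against the finite option set, then render the SQL."""
    if template_id not in TEMPLATES:
        raise ValueError(f"unknown template {template_id!r}")
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unknown direction {direction!r}")
    sql = TEMPLATES[template_id].format(direction=direction)
    return sql, params  # parameters are bound by the DB driver, never string-interpolated
```

The agent's entire output reduces to something like `{"template_id": "top_customers", "direction": "DESC", "start": "2024-01-01", "limit": 10}`, which is easy to validate, log, and evaluate.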
Building Transitional Software Systems:
1. Deeply understand the domain and CUJs,
2. Segment out the system into "problems that traditional software is good at solving" and "LLM-hard problems",
3. For the LLM hard problems, work out the smallest possible state space of decision making,
4. Build the system, and get users using it,
5. Gradually expand the state space as feedback flows in from users.
Divide and conquer, me hearties.