LLMs, Agents, Multi-agents… and now, self-improving ones

It feels like we blinked, and suddenly we’re moving from highly capable Large Language Models (LLMs) that rely on prompting techniques such as Chain of Thought and on external tools, to exploring agent and multi-agent systems. Just when we’re getting a handle on this potential, tutorials and demos on self-improving agents arrive and push the envelope further.

Take, for instance, innovations like CrewAI, AutoGen Studio v2, Devin from Cognition, or LangChain’s Agents, to name just a few. They’re reshaping our approach to software development. Let’s dive deeper to get a clearer understanding.

Human and multi-agent team discussion


Devin from Cognition

Cognition Labs introduces Devin, which it presents as the first fully autonomous AI software engineer. According to Cognition, Devin can plan and execute complex engineering tasks, learn from experience, and fix its own mistakes. It comes equipped with standard developer tools within a sandboxed environment and can collaborate in real time, adjusting to feedback and participating in design choices.

Devin’s skills include learning to use new technologies, building and deploying applications, debugging, and fine-tuning AI models, among others. According to Cognition, it outperformed earlier systems on the SWE-bench benchmark, which measures how many real-world GitHub issues a system can resolve.

Introducing Devin, the first AI software engineer (Cognition AI)

SWE-Agent from Princeton NLP

SWE-agent is a tool that puts a language model (like GPT-4) to work on software engineering tasks, specifically resolving issues in real GitHub repositories. Here’s a brief summary of how it applies to software development:

  • Agent-Computer Interface (ACI): SWE-agent uses ACI to simplify interactions between the language model and the repository, allowing for efficient browsing, editing, and executing code files.
  • Issue Resolution: On the SWE-bench test set, SWE-agent resolves 12.29% of issues, which its authors reported as state-of-the-art performance at the time.
  • ACI Design Impact: The design of ACI significantly influences the effectiveness of the agent; a well-tuned ACI results in better performance compared to a baseline agent without it.
  • Setup and Usage: The repository’s README provides instructions for setting up SWE-agent using Docker and Miniconda, and details on how to use it to generate pull requests that attempt to fix GitHub issues.
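To make the ACI idea more tangible, here is a minimal sketch in Python. It is not SWE-agent’s actual implementation: the class name `MiniACI`, its `open`, `search`, and `edit` commands, and the in-memory "repository" are all assumptions made for illustration. The point is only that the agent gets a small set of commands with compact, predictable output instead of a raw shell.

```python
from dataclasses import dataclass, field


@dataclass
class MiniACI:
    """Hypothetical agent-computer interface over an in-memory 'repository'."""
    files: dict[str, list[str]] = field(default_factory=dict)
    window: int = 5  # how many lines the agent sees per view

    def open(self, path: str, line: int = 1) -> str:
        """Return a numbered window of the file starting at `line` (1-indexed)."""
        lines = self.files[path]
        start = max(line - 1, 0)
        chunk = lines[start:start + self.window]
        return "\n".join(f"{start + i + 1}: {text}" for i, text in enumerate(chunk))

    def search(self, term: str) -> list[tuple[str, int]]:
        """Return (path, line_number) pairs for every line containing `term`."""
        return [
            (path, number)
            for path, lines in sorted(self.files.items())
            for number, text in enumerate(lines, start=1)
            if term in text
        ]

    def edit(self, path: str, line: int, new_text: str) -> str:
        """Replace one line and return the surrounding window as feedback."""
        self.files[path][line - 1] = new_text
        return self.open(path, max(line - 2, 1))


# A toy repository with an obvious bug in `add`.
aci = MiniACI(files={"calc.py": ["def add(a, b):", "    return a - b"]})
hits = aci.search("return")                            # locate the suspicious line
feedback = aci.edit("calc.py", 2, "    return a + b")  # apply a fix, get a view back
```

In a real agent, a language model would choose which command to issue next based on the text each command returns; here the calls are hard-coded so the example runs on its own.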

Here is the GitHub repo to start testing it.


CrewAI

Picture this: assembling a dream team of AI experts, each bringing a unique superpower to the table. That’s the essence of CrewAI. It’s akin to having a group of brilliant people, each adept in their own domain, working together seamlessly to streamline the software development process.

CrewAI stands out with its innovative framework designed to coordinate role-playing, autonomous AI agents. By focusing on simplicity and a modular design, it breaks down the complex world of AI into manageable components like agents, tools, tasks, and processes. This approach not only demystifies AI but also makes it engaging and accessible.

It provides a robust platform for engineers, offering a developer-friendly framework, tools, and UI to locally build multi-agent automations. Whether you’re employing pre-built models or those from other providers, CrewAI fosters a community where developers can exchange resources, models, and support.

The strength of CrewAI lies in its ability to facilitate team collaboration, organizing multiple intelligent agents into a cohesive unit. This system excels in tasks requiring collaborative effort, enhancing decision-making, creativity, and problem-solving in ways traditional tools can’t match.
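The agents, tasks, and processes vocabulary can be sketched in a few lines of plain Python. This is a conceptual illustration, not CrewAI’s API: the `Agent` and `Task` classes and the `run_sequential` function below are hypothetical, and each agent’s `work` callable is a stub standing in for a model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    role: str
    # (task description, context from the previous task) -> output.
    # Stands in for an LLM call in this sketch.
    work: Callable[[str, str], str]


@dataclass
class Task:
    description: str
    agent: Agent


def run_sequential(tasks: list[Task]) -> list[str]:
    """Run tasks in order, passing each output to the next task as context."""
    outputs: list[str] = []
    context = ""
    for task in tasks:
        result = task.agent.work(task.description, context)
        outputs.append(f"[{task.agent.role}] {result}")
        context = result
    return outputs


# Stub behaviours so the example runs without any model.
analyst = Agent("analyst", lambda desc, ctx: f"requirements for: {desc}")
developer = Agent("developer", lambda desc, ctx: f"code based on ({ctx})")

log = run_sequential([
    Task("a login page", analyst),
    Task("implement it", developer),
])
```

The design choice worth noticing is the hand-off: the developer never sees the original request directly, only what the analyst produced, which is how a sequential process keeps each role focused on its own step.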

AutoGen Studio

AutoGen v2 is at the forefront of AI, especially in leveraging LLMs for complex, automated workflows. This platform makes it easier to orchestrate and optimize LLM workflows, enabling the creation of applications that are innovative, efficient, and impactful.

With AutoGen v2, you get customizable, conversable agents that integrate and communicate flawlessly, powered by advanced LLMs, human insights, or a blend of tools. This adaptability opens up a myriad of applications, from solving complex tasks to facilitating dynamic, conversation-driven interactions.

AutoGen Studio introduces a user-friendly interface to this powerful framework, simplifying rapid prototyping and management of multi-agent systems. Whether it’s configuring an LLM provider or crafting agents and skills, AutoGen Studio streamlines the process, unlocking new possibilities in AI development.

For those keen on innovation, AutoGen Studio v2 (GitHub) marks a significant advancement, inviting collaboration and continual growth in AI applications.

LangChain’s Agents

LangChain revolutionizes the integration of LLMs into applications through “agents.” These aren’t mere scripts but intelligent entities that decide their next move based on the provided context. This approach offers more flexibility and intuition in developing complex AI-driven applications compared to traditional methods.

LangChain offers tools and frameworks like LangGraph, enhancing agent loop control, state tracking, and human-in-the-loop responses. This flexibility allows developers to create agents that can autonomously draft content or require approval, keeping them in command.
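Loop control, state tracking, and human approval fit together roughly as in the sketch below. It deliberately uses plain Python rather than LangGraph’s API: `AgentState`, `run_agent`, and the `propose` and `approve` callables are hypothetical names, and both callables are stubs in place of a model and a real reviewer.

```python
from typing import Callable, TypedDict


class AgentState(TypedDict):
    draft: str
    steps: int
    approved: bool


def run_agent(
    propose: Callable[[AgentState], str],
    approve: Callable[[str], bool],
    max_steps: int = 3,
) -> AgentState:
    """Propose a draft, ask a human to approve it, stop on approval or step cap."""
    state: AgentState = {"draft": "", "steps": 0, "approved": False}
    while state["steps"] < max_steps and not state["approved"]:
        state["draft"] = propose(state)       # the agent acts on the tracked state
        state["steps"] += 1
        state["approved"] = approve(state["draft"])  # human-in-the-loop gate
    return state


# Stubs: the 'model' numbers its drafts, the 'human' accepts the second one.
final = run_agent(
    propose=lambda s: f"draft v{s['steps'] + 1}",
    approve=lambda draft: draft.endswith("v2"),
)
```

The step cap is what keeps the developer in command: even if approval never comes, the loop ends and returns the last state for inspection.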

Supporting various agent types and providing a comprehensive library of tools, LangChain enables developers to select the optimal cognitive architecture for their applications, ensuring precise outcomes.

For developers aiming to swiftly transition from prototype to production with reliable GenAI applications, LangChain and its tools, such as LangSmith, provide a solid foundation, offering traceability and explainability throughout the development process.

The Era of Self-Improving Agents

Combining the forces of CrewAI, AutoGen Studio v2, and LangChain’s Agents isn’t just about simplification — it’s about revolutionizing software development. This synergy promises unparalleled efficiency, creativity, and flexibility, expanding the horizons of AI in software creation.

And we’re just getting started. The progression to self-improving agents, as showcased in AutoGen Studio, opens a realm of new possibilities. These agents learn on the fly, share insights, and evolve, transforming daunting tasks into achievable, more efficient processes.

Unlike traditional agents, these self-improving marvels dynamically learn and adapt without needing direct human coding, enhancing their efficiency and scalability.

The next generation of agents fosters collaborative learning, echoing human team dynamics, and leading to more inventive problem-solving strategies.

Here’s a sneak peek from David Ondrej on his channel.

“Self-Improving Agents are the future, let’s build one”, from David Ondrej

Addressing Ethical and Societal Implications

However, with this powerful technology comes the need for responsible use. As self-improving agents become integral to various sectors, ethical considerations and efficiency checks are crucial. These agents have the potential to outpace their human counterparts in learning and development speed, raising questions about job displacement, security, and control. Their efficiency in task completion and problem-solving could lead to unprecedented productivity gains, but it also calls for new frameworks for quality control and accountability.

A few words from Andrew Ng…

At this point, it is worth watching Andrew Ng’s keynote for Sequoia Capital, “What’s next for agentic reasoning”. (If you haven’t seen his courses on deeplearning.ai, they are well worth your time, and so is Sequoia’s report on Generative AI.) In summary, the spotlight was on the transformative potential of agentic workflows within AI.

These innovative workflows mark a significant shift from traditional, linear approaches, offering a more iterative and dynamic process that mirrors human cognitive strategies. By adopting this method, AI models are capable of planning, drafting, revising, and self-reflecting, thus achieving markedly improved results.

Ng illustrated this with compelling examples, including a case study showing how GPT-3.5 wrapped in an agentic workflow can surpass a more advanced model like GPT-4 prompted in a single pass on specific tasks. This iterative enhancement, driven by the agent’s ability to critique and refine its own output, underlines the potential of agentic workflows to elevate AI’s problem-solving capabilities.
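The critique-and-refine cycle can be sketched as a small loop. This is an illustration under stated assumptions, not code from Ng’s talk: `reflect_and_refine`, `stub_generate`, and `stub_critique` are hypothetical names, and the two stubs stand in for model calls so the example runs without any API.

```python
from typing import Callable, Optional


def reflect_and_refine(
    task: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Draft, critique, revise until the critic has no feedback or rounds run out.

    Returns the final draft and the number of revisions made.
    """
    draft = generate(task, None, None)
    revisions = 0
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # the critic is satisfied
            break
        draft = generate(task, draft, feedback)
        revisions += 1
    return draft, revisions


def stub_generate(task, previous, feedback):
    # Stand-in for a model call: the first attempt is wrong, the revision is right.
    if feedback is None:
        return "def square(x): return x + x"
    return "def square(x): return x * x"


def stub_critique(draft):
    # Stand-in for a critic: run the draft against one test case.
    # exec is acceptable here only because the input is our own stub string.
    namespace = {}
    exec(draft, namespace)
    if namespace["square"](3) != 9:
        return "square(3) should be 9"
    return None


code, revisions = reflect_and_refine("write square(x)", stub_generate, stub_critique)
```

Here the first draft fails the critic’s check, the feedback is passed back to the generator, and the second draft passes, so the loop stops after one revision.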

What’s next for agentic reasoning?, Andrew Ng and Sequoia Capital

Furthermore, Ng’s discourse delved into the emergence of distinct design patterns in AI agents, which include reflection, planning, multi-agent collaboration, and the utilization of external tools. These patterns are crucial for the development of robust, efficient, and versatile AI systems, capable of executing complex tasks with greater autonomy.

Particularly noteworthy is the significance of fast token generation in agentic workflows. Because an agent iterates many times over its own output, Ng suggests that generating tokens quickly, even from a somewhat less capable model, can matter more than single-pass accuracy, since it enables quicker iterative cycles. This perspective sheds light on the current state of AI and points toward future innovations. As we stand on the brink of these technological leaps, Ng’s insights offer a compelling vision of AI’s capacity to expand its horizons, driving us closer to the realization of artificial general intelligence (AGI).

Industry implementation: Turing bots (Forrester)

What is clear is that the integration of AI agents within the Software Development Lifecycle (SDLC) is changing how we approach project management and execution. Diego Lo Giudice from Forrester, for example, published a series of articles some time ago about TuringBots, which make intensive use of agents across the different phases of development, whether that is generating code or other tasks such as requirements capture.

Companies like G-Research are leading the charge, showcasing remarkable productivity boosts by incorporating generative AI assistants (or TuringBots) into their development processes. This innovative approach not only streamlines workflows but also fosters a culture of continuous improvement and learning.

To demystify the complex workings of these agents for a broader audience, incorporating interactive demonstrations that showcase their capabilities in real-time can be highly effective. For instance, a visual simulation of CrewAI coordinating a multi-agent project or an interactive walkthrough of Devin navigating a coding challenge can provide tangible insights into their operation and advantages. Engaging with these technologies through hands-on experiences not only enhances understanding but also stimulates curiosity for the future of AI.

Next steps

Looking ahead, the integration of self-improving agents promises not only to redefine the landscape of software development but also to spark transformative changes across industries. For businesses eager to harness this potential, the path forward involves a strategic approach to adoption — beginning with pilot projects to assess compatibility and efficacy, followed by a phased integration that allows for continuous learning and adjustment. This roadmap ensures that organizations can leverage the benefits of self-improving agents while remaining agile in the face of evolving technological and ethical considerations.

In any case, we are getting closer and closer to a maxim that began to emerge some time ago about our new role as human-in-the-loop agent managers:

We will only be as good as the network of agents we can manage.