Thinking beyond LLMs

Darshan Ponikar

June 4, 2024

My thoughts on use cases for LLMs

We have seen LLMs’ responses improve dramatically: they can tackle PhD-level questions, do research for us, and crawl the internet to surface the best results out there.

I think the biggest breakthrough came when these AI models learned to understand human language with ease. Despite their limitations, the fact that LLMs can understand plain language, process it, and respond to it is fascinating.

I am late to the party, but as I write this, I don’t think LLMs alone are going to change the world. You can think of an LLM as the brain: it observes, analyzes, and understands things like an intelligent human, but it doesn’t have a body to act on them.

I was listening to a discussion from last year with former Google CEO Eric Schmidt, in which he explained the term “AI agent” very simply.

Text to Action, that’s it. That’s the simplest way to understand AI agents.

LLMs can understand plain language and can easily guide you through a task, but they cannot take any action by themselves. An AI agent is a wrapper around an LLM that lets it act, either by exposing APIs it can call or by giving it control over certain operations and tasks.
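To make “text to action” concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `fake_llm` stands in for a real model call and simply returns a JSON action, and `get_weather` and `send_email` are placeholder tools, not real APIs. The point is the division of labor: the LLM only decides what to do, and the wrapper is what actually does it.

```python
import json
from typing import Callable

# Hypothetical tools the wrapper exposes; a real agent would call real APIs here.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder result, not a real weather lookup

def send_email(to: str, body: str) -> str:
    return f"Email queued for {to}"  # placeholder, nothing is actually sent

TOOLS: dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "send_email": send_email,
}

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that answers with a JSON action."""
    if "weather" in prompt.lower():
        return json.dumps({"tool": "get_weather", "args": {"city": "Pune"}})
    return json.dumps({"tool": None, "answer": "I can only check the weather."})

def run_agent(user_text: str) -> str:
    # 1. The LLM turns plain language into a structured action.
    action = json.loads(fake_llm(user_text))
    # 2. The wrapper, not the LLM, executes the chosen tool.
    tool_name = action.get("tool")
    if tool_name is None:
        return action["answer"]
    return TOOLS[tool_name](**action["args"])

print(run_agent("What's the weather like today?"))  # Sunny in Pune
```

In a real agent, the model would pick the tool and its arguments itself, and the wrapper would validate them before executing anything.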

AI agents are indeed the next breakthrough after LLMs.

We started with AI wrappers, which most people call chatbots. We feed these chatbots prompts that tell them to play a certain role and help users.

But what’s missing is control. AI agents will have access to your software, operate it through plain text, and perform tasks on your behalf.

This is already happening with Cursor, Bolt, and other AI coding tools.

So basically, the LLM understands the request and decides which tools to use. That is another simple definition of an AI agent.

An AI agent’s complexity grows with your requirements. You also need a way to give it access to tools, with the level of control it needs to operate them.

Is there a possibility that there will be a generic AI agent?

I think AI agents are specific to particular tasks. For example, to operate a browser, an agent needs to understand websites, buttons, forms, and so on.

Do we need to train LLMs so they can operate these websites?

Maybe, since websites vary so much.

AI agents can automate most of today’s SaaS tools and eliminate the need for complicated dashboards.

A chatbot with a simple UI is enough to make things more accessible.

It can make API calls, update databases, and perform tasks without users having to navigate complicated UIs.
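As a rough illustration of replacing a dashboard action with chat, the sketch below updates a database row from a plain-language message. It assumes a hypothetical `customers` table in an in-memory SQLite database, and `fake_llm_extract` is a regex stand-in for an LLM that pulls the arguments out of the message.

```python
import re
import sqlite3
from typing import Optional

def setup_db() -> sqlite3.Connection:
    # In-memory table standing in for a real SaaS backend (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, plan TEXT)")
    conn.execute("INSERT INTO customers (id, plan) VALUES (42, 'free')")
    conn.commit()
    return conn

def update_plan(conn: sqlite3.Connection, customer_id: int, plan: str) -> str:
    # The tool the agent exposes. Parameterized SQL means the model only
    # supplies values; it never writes the query itself.
    conn.execute("UPDATE customers SET plan = ? WHERE id = ?", (plan, customer_id))
    conn.commit()
    return f"Customer {customer_id} moved to {plan}"

def fake_llm_extract(message: str) -> Optional[dict]:
    # Hypothetical stand-in for an LLM that extracts tool arguments from chat text.
    match = re.search(r"customer (\d+) to (\w+)", message, re.IGNORECASE)
    if match is None:
        return None
    return {"customer_id": int(match.group(1)), "plan": match.group(2).lower()}

def chat(conn: sqlite3.Connection, message: str) -> str:
    args = fake_llm_extract(message)
    if args is None:
        return "Sorry, I can only change customer plans."
    return update_plan(conn, **args)

conn = setup_db()
print(chat(conn, "Upgrade customer 42 to pro"))  # Customer 42 moved to pro
print(conn.execute("SELECT plan FROM customers WHERE id = 42").fetchone()[0])  # pro
```

Keeping the model's job limited to picking values, rather than writing queries, is one simple way to decide how much control an agent gets.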

So there’s an opportunity to simplify things with AI agents, whether in B2B SaaS or anywhere else. Look for a gap!

So, to answer whether there will be a generic AI agent: no, not for now, because each agent has to cover edge cases that can fall outside its scope.

A limited scope + an LLM + tools to perform actions = an AI agent
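This formula can be sketched as a small Python class. It is a hypothetical structure, not any framework’s API: `scope` becomes a system prompt, `llm` is any function that maps a (system, user) pair to an action, and `tools` is the only set of actions the agent may take. The stub model below only recognizes refund requests and ignores the system prompt, which a real model would follow.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    scope: str  # the narrow job this agent owns, sent as a system prompt
    llm: Callable[[str, str], dict]  # hypothetical: (system, user) -> action dict
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def handle(self, request: str) -> str:
        system = f"You only handle {self.scope}. Reply with a tool call."
        action = self.llm(system, request)
        tool = self.tools.get(action.get("tool"))
        if tool is None:
            # No matching tool means the request falls outside this agent's scope.
            return f"Out of scope for the {self.scope} agent."
        return tool(**action.get("args", {}))

def stub_llm(system: str, user: str) -> dict:
    # Hypothetical planner that only looks at the user text.
    if "refund" in user.lower():
        return {"tool": "issue_refund", "args": {"order_id": "A-100"}}
    return {"tool": None}

def issue_refund(order_id: str) -> str:
    return f"Refund issued for {order_id}"  # placeholder, no payment API is called

support = Agent(scope="refunds", llm=stub_llm, tools={"issue_refund": issue_refund})
print(support.handle("I want a refund"))     # Refund issued for A-100
print(support.handle("Change my password"))  # Out of scope for the refunds agent.
```

The tool set doubles as the scope boundary: anything the agent has no tool for is refused rather than improvised.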

There is much more to explore in AI agents themselves; I haven’t even scratched the surface. There are different types of agents, but this is just a basic mental model for thinking about them.

Beyond AI agents

If you look at the high-level picture, I think AI agents are just another wrapper around LLMs, one that has access to tools along with instructions. So AI is not truly part of the system; it’s just a backend you call and ask to perform tasks.

So what next?

Companies and startups are trying to bridge the gap between AI and existing software by building AI-native products from the start.

For example, Dia, the browser from The Browser Company, claims to be the first AI-native browser: it has access to multiple tabs, can crawl websites, and can come up with personalized responses.

Maybe it can also compare prices across two websites, for example, Flipkart and Amazon.

This is only possible if you treat AI as a first principle when building software.

More software might become AI-native (who knows); Microsoft and Apple are already working on this.

So in summary

LLMs, as a foundation, can orchestrate your existing systems if you give them access.
They can simplify things significantly with minimal UI and minimal input.

I will research this further and share more ideas and thoughts.