
What's next for AI agentic workflows ft. Andrew Ng of AI Fund


All of you know Andrew Ng as a famous computer science professor at Stanford. He was really early on in the development of neural networks with GPUs, is of course a creator of Coursera and popular courses like deeplearning.ai, and was also the founder and early lead of Google Brain. But one thing I've always wanted to ask you before I hand it over, Andrew, while you're on stage, is a question I think would be relevant to the whole audience: ten years ago, on problem set number two of CS229, you gave me a B. I looked it over, and I was wondering what you saw that I did incorrectly. So anyway, Andrew, thank you.

Thank you, Hansen. I'm looking forward to sharing with all of you what I'm seeing with AI agents, which I think is the exciting trend that everyone building in AI should pay attention to, and I'm also excited about all the other presentations today.

So, AI agents. Today, the way most of us use large language models is like this: with a non-agentic workflow, where you type a prompt and it generates an answer. That's a bit like asking a person to write an essay on a topic and saying, please sit down at the keyboard and type the essay from start to finish without ever using backspace. Despite how hard this is, LLMs do it remarkably well.

In contrast, here is what an agentic workflow may look like. Have an LLM write an essay outline. Do you need to do any web research? If so, do that. Then write the first draft. Then read your own first draft and think about what parts need revision. Then revise the draft, and so on. This workflow is much more iterative: you may have the LLM do some thinking, then revise the article, then do some more thinking, and iterate through this a number of times. What not many people appreciate is that this delivers remarkably better results. Working with these agentic workflows, I have actually been really surprised myself at how well they work.

I'll do one case study. My team analyzed some data using a coding benchmark called HumanEval, released by OpenAI a few years ago. It has coding problems like: given a non-empty list of integers, return the sum of all the elements at even positions. It turns out the answer is a code snippet like the one sketched below. Today, a lot of us use zero-shot prompting, meaning we tell the AI to write the code and run it on the first pass. Who codes like that? No human codes like that, just typing out the code and running it. Maybe you do; I can't do that. It turns out that if you use GPT-3.5 with zero-shot prompting, it gets it right 48% of the time.
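As a rough sketch of what such an answer looks like (the exact HumanEval problem statement isn't shown in the talk, so this assumes 0-based "even positions"):

```python
def sum_even_positions(lst):
    # Sum the elements at even positions (0-based indices 0, 2, 4, ...)
    # of a non-empty list of integers.
    return sum(lst[i] for i in range(0, len(lst), 2))
```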

GPT-4 does way better: 67%. But if you take an agentic workflow and wrap it around GPT-3.5, it actually does better than even GPT-4. And if you were to wrap this type of workflow around GPT-4, it also does very well. Notice that GPT-3.5 with an agentic workflow actually outperforms GPT-4, and I think this has significant consequences for how we all approach building applications.

So "agents" is a term tossed around a lot. There are a lot of consultant reports talking about agents, the future of AI, blah blah blah. I want to be a bit concrete and share with you the broad design patterns I'm seeing in agents. It's a very messy, chaotic space, with tons of research and tons of open source; there's a lot going on. But let me try to categorize a bit more concretely what's going on with agents.

Reflection is a tool that I think many of us should just use. It just works. Tool use, I think, is more widely appreciated, and it actually works pretty well. I think of these as pretty robust technologies: when I use them, I can almost always get them to work well. Planning and multi-agent collaboration, I think, are more emerging. When I use them, sometimes my mind is blown by how well they work, but at least at this moment in time I don't feel like I can always get them to work reliably. Let me walk through these four design patterns in the next few slides, and if some of you go back and ask your engineers to use them, I think you'll get a productivity boost quite quickly.

First, reflection. Here's an example. Let's say I ask a system, please write code for me for a given task. We have a coder agent, just an LLM that you prompt to write code, and it writes a function, something like def do_task. An example of self-reflection would be to then prompt the LLM with something like: here's code intended for a task. You give it back the exact same code it just generated, and then say, check the code carefully for correctness, style, and efficiency, and give constructive criticism. Just write a prompt like that. It turns out the same LLM that you prompted to write the code may be able to spot problems like a bug in line five and suggest how to fix it, blah blah blah.
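A minimal sketch of that self-reflection loop, assuming a hypothetical llm(prompt) helper that wraps whatever chat-completion API you use (the helper and the prompt wording are illustrative, not from the talk):

```python
def llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your chat model, return its text."""
    raise NotImplementedError  # wire this to the model of your choice

def write_code_with_reflection(task: str) -> str:
    # Coder pass: draft an initial solution.
    draft = llm(f"Write Python code for this task:\n{task}")
    # Reflection pass: hand the model back its own code.
    critique = llm(
        "Here's code intended for a task. Check it carefully for "
        "correctness, style, and efficiency, and give constructive "
        f"criticism:\n{draft}"
    )
    # Revision pass: produce a v2 informed by the critique.
    return llm(
        f"Task:\n{task}\n\nDraft:\n{draft}\n\nFeedback:\n{critique}\n\n"
        "Rewrite the code to address the feedback."
    )
```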
If you now take its own feedback, give it back, and reprompt it, the LLM may come up with a version two of the code that could well work better than the first version. That's not guaranteed, but it works often enough to be worth trying for a lot of applications. To foreshadow tool use: if you let it run unit tests and it fails one, you can ask it why it failed the unit test, have that conversation, and it may figure out that it should try changing something and come up with a v3. By the way, for those of you who want to learn more about these technologies (I'm very excited about them), for each of the four sections I have a little recommended-reading section at the bottom that hopefully gives more references.

And again, to foreshadow multi-agent systems: I've described this as a single coder agent that you prompt to have this conversation with itself. One natural evolution of the idea is, instead of a single coder agent, to have two agents, where one is a coder agent and the second is a critic agent. These can be the same base LLM prompted in different ways: you tell one, you're an expert coder, write code; you tell the other, you're an expert code reviewer, review this code. This type of workflow is actually pretty easy to implement (the first sketch below shows the shape of it), and it's a very general-purpose technique. For a lot of workflows, it will give you a significant boost in the performance of LLMs.

The second design pattern is tool use. Many of you will already have seen LLM-based systems using tools. On the left is a screenshot from Copilot; on the right is something I extracted from GPT-4. If you ask an LLM today, what's the best coffee maker, it can do a web search; for some problems it will generate code and run it (the second sketch below illustrates this pattern). It turns out there are a lot of different tools that many people are using for analysis, for gathering information, for taking action, and for personal productivity. A lot of the early work in tool use turned out to be in the computer vision community, because LLMs were blind to images, so the only option was to have the LLM generate a function call that could manipulate an image: generate an image, do object detection, or whatever. If you look at the literature, it's been interesting how much of the work in tool use seems to have originated from vision, because LLMs couldn't do anything with images before GPT-4 and LLaVA and so on. That's tool use, and it expands what an LLM can do.

And then planning. For those of you who have not yet played a lot with planning algorithms: many people talk about the ChatGPT moment, where you go, wow, I've never seen anything like this. If you haven't used planning algorithms, many of you will have a similar AI-agent wow moment: wow, I couldn't imagine an AI agent doing this. I've run live demos where something failed and the AI agent rerouted around the failures. I've actually had quite a few of those moments.
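The two-agent coder/critic variant is a small change to the reflection sketch above; it reuses the same hypothetical llm() helper, and the persona prompts are illustrative:

```python
CODER = "You are an expert coder. Write clean, correct code."
CRITIC = "You are an expert code reviewer. Point out bugs and weaknesses."

def coder_critic_loop(task: str, rounds: int = 2) -> str:
    # Both "agents" can be the same base LLM, just prompted differently.
    code = llm(f"{CODER}\nTask: {task}")
    for _ in range(rounds):
        review = llm(f"{CRITIC}\nReview this code:\n{code}")
        code = llm(
            f"{CODER}\nTask: {task}\nYour previous code:\n{code}\n"
            f"Reviewer feedback:\n{review}\nRevise the code."
        )
    return code
```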
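And a minimal sketch of the tool-use pattern, again with the hypothetical llm() helper; the JSON protocol and the web_search stub are my own illustration, not any particular product's API:

```python
import json

def web_search(query: str) -> str:
    """Illustrative tool stub; swap in a real search API."""
    raise NotImplementedError

TOOLS = {"web_search": web_search}

def answer_with_tools(question: str) -> str:
    # Let the model either answer directly or request a tool call,
    # using a tiny JSON protocol defined in the prompt.
    decision = json.loads(llm(
        'Reply with JSON: {"tool": "web_search", "query": "..."} '
        'to search the web, or {"answer": "..."} to answer directly.\n'
        f"Question: {question}"
    ))
    if "tool" in decision:
        # Run the requested tool and feed its output back to the model.
        result = TOOLS[decision["tool"]](decision["query"])
        return llm(f"Question: {question}\nSearch results:\n{result}\nAnswer:")
    return decision["answer"]
```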


Wow, you can't believe my AI system just did that autonomously. One example, which I adapted from the HuggingGPT paper: you say, please generate a new image where a girl is reading a book, and her pose is the same as the boy in the image example.jpeg, and then please describe the new image with your voice. Given a request like this, today we have AI agents that can decide: the first thing I need to do is determine the pose of the boy, so find the right model, maybe on Hugging Face, to extract the pose. Next, find a pose-to-image model to synthesize a picture of a girl following the instructions. Then use image-to-text, and finally use text-to-speech.

Today we actually have agents that, I don't want to say work reliably, they're kind of finicky and they don't always work, but when it works, it's actually pretty amazing. And with agentic loops, sometimes you can recover from earlier failures as well. I find myself already using research agents for some of my work: when I want a piece of research but don't feel like googling it myself and spending a long time on it, I send it to the research agent, come back a few minutes later, and see what it has come up with. It sometimes works, sometimes doesn't, right?
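A minimal sketch of that HuggingGPT-style planning pattern, using the same hypothetical llm() helper; the step names and model registry are illustrative stand-ins for real Hugging Face models, not the paper's actual interface:

```python
import json

# Illustrative registry of task-specific models an agent could route to.
MODEL_REGISTRY = {
    "pose-detection": lambda x: ...,   # e.g. extract the boy's pose
    "pose-to-image":  lambda x: ...,   # pose-conditioned image generation
    "image-to-text":  lambda x: ...,   # caption the synthesized image
    "text-to-speech": lambda x: ...,   # read the caption aloud
}

def plan_and_execute(request: str):
    # Step 1: ask the LLM to decompose the request into an ordered pipeline.
    plan = json.loads(llm(
        "Decompose this request into an ordered list of steps chosen from "
        f"{list(MODEL_REGISTRY)}. Reply with a JSON list of step names.\n"
        f"Request: {request}"
    ))
    # Step 2: execute the plan, piping each step's output into the next.
    result = request
    for step in plan:
        result = MODEL_REGISTRY[step](result)
    return result
```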


But that's already part of my personal workflow. The final design pattern is multi-agent collaboration. This is one of those funny things, but it works much better than you might think. On the left is a screenshot from a paper called ChatDev, which is completely open source. Many of you saw the flashy social media announcements of the Devin demo; ChatDev is open source and runs on my laptop. What ChatDev does is an example of a multi-agent system: you prompt one LLM to sometimes act like the CEO of a software engineering company, sometimes like a designer, sometimes like a product manager, sometimes like a tester. This flock of agents, which you build by prompting an LLM and telling it, you're now the CEO, you're now a software engineer, collaborates in an extended conversation, so that if you tell it, please develop a game, develop a Gomoku game, the agents will actually spend a few minutes writing code, testing it, and iterating, and then generate surprisingly complex programs. It doesn't always work; I've used it, and sometimes it doesn't work and sometimes it's amazing. But this technology is really getting better. A sketch of the role-prompting idea follows below.

And just one more design pattern: it turns out that multi-agent debate, where you have different agents, for example ChatGPT and Gemini, debate each other, also results in better performance. So having multiple simulated AI agents work together has been a powerful design pattern as well.

To summarize, these are the four patterns I've seen, and I think that if we were to use these patterns in our work, a lot of us could get a productivity boost quite quickly. I think agentic reasoning design patterns are going to be important. This is my one small slide: I expect the set of tasks AI can do to expand dramatically this year because of agentic workflows.

One thing that's actually difficult for people to get used to: when we prompt an LLM, we want a response right away. In fact, a decade ago, when I was having discussions at Google about what we called big-box search, where you type a long prompt, one of the reasons I failed to push successfully for that was that when you do a web search, you want a response back in half a second. That's just human nature; we like that instant feedback. But for a lot of agentic workflows, I think we'll need to learn to delegate a task to an AI agent and patiently wait minutes, maybe even hours, for a response. I've seen a lot of novice managers delegate something to someone and then check in five minutes later, and that's not productive. It will be difficult, but I think we need to do the same with some of our AI agents. (I heard some laughs there.)

And one other important trend: fast token generation matters, because with these agentic workflows we're iterating over and over, so the LLM is generating tokens for the LLM to read. Being able to generate tokens much faster than any human could read them is fantastic, and I think that generating more tokens really quickly from even a slightly lower-quality LLM might give good results compared to slower tokens from a better LLM, because it may let you go around this loop a lot more times. Maybe that's a little bit controversial, but it's kind of like the results I showed on the first slide, with GPT-3.5 plus an agentic architecture beating GPT-4.
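A minimal sketch of ChatDev-style role prompting with the same hypothetical llm() helper; the roles and the simple round-robin conversation are my own simplification, not how ChatDev is actually implemented:

```python
ROLES = {
    "CEO": "You are the CEO of a software company. Set the goal.",
    "Product Manager": "You are a product manager. Turn goals into specs.",
    "Engineer": "You are an expert software engineer. Write the code.",
    "Tester": "You are a tester. Find bugs and request fixes.",
}

def multi_agent_session(task: str, rounds: int = 2) -> str:
    # One base LLM, prompted into different roles, sharing one transcript.
    transcript = f"Task: {task}\n"
    for _ in range(rounds):
        for role, persona in ROLES.items():
            reply = llm(
                f"{persona}\n\nConversation so far:\n{transcript}\n"
                f"Respond as the {role}."
            )
            transcript += f"\n{role}: {reply}\n"
    return transcript
```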
I'm really looking forward to Claude 5 and Claude 4 and GPT-5 and Gemini 2.0 and all the other wonderful models many of you are building. Part of me feels that if you're looking forward to running your thing on GPT-5 zero-shot, you may be able to get closer to that level of performance on some applications than you might think by using agentic reasoning on an earlier model. I think this is an important trend. Honestly, the path to AGI feels like a journey rather than a destination, but I think this type of agentic workflow could help us take a small step forward on that very long journey. Thank you.

