This week OpenAI released ChatGPT, its prototype AI chatbot. On this episode of Intercom on Product, Intercom Co-Founder and Chief Strategy Officer Des Traynor sits down with our Director of Machine Learning, Fergal Reid, for reaction and analysis on the implications of ChatGPT and what the future holds.
Chapters
0:00 What is ChatGPT?
4:41 The internet reacts
6:10 Is GPT producing creative work?
7:45 GPT and Facts
9:28 Effect on people’s jobs
13:31 Hallucinations
21:26 How to improve language models
26:01 Is this bigger than Google?
29:30 Customer Support
42:06 Smart Replies
45:28 What other industries will be affected?
50:13 Outro
Transcript:
Hey, Fergal.
Hi Des, how’s it going? Thanks for having me back.
It’s good to have you back. We had you on the podcast only about five weeks ago to talk about what was happening with AI, and you’re back again because it’s been a busy five weeks. Well, a busy seven days, really. Seven days ago was Wednesday, the 30th of November, and I got an email with an invite to an open beta for a thing called ChatGPT. What happened?
What happened? It’s an interesting question. OpenAI released their most recent machine learning system, their AI system, and they released it very publicly. It was ChatGPT. It’s pretty similar to their current offering, GPT-3 and GPT-3.5, but it was packaged differently: you didn’t need to put a credit card into it. So I think everyone just saw that there has been a huge change in capability here recently, and it went viral. It went wild and everyone got really excited. Around the same time, they released their most recent GPT-3.5 model, text-davinci-003, which does a lot of the same things. It’s maybe slightly less good at saying, hey, I’m a large language model
and I can’t do that for you, but it’s similar in terms of capability.
Let’s do some quick definitions to ground everyone. OpenAI is obviously the institution doing a lot of work on AI and ML. You said GPT: what does that stand for?
I actually don’t remember. General-purpose transformer or something like that.
Does that name mean anything?
Yeah. Basically, I think the key piece is “transformer”. For a long time, people were trying to figure out the best way to train neural networks that deal with text and natural language processing tasks. For a long time there were LSTMs, which kind of combined the short-term structure of your text with the long-term structure of your sentence, and sequence models, and everyone was working on those. Then Google published a paper, “Attention Is All You Need”, a pretty revolutionary paper with a pretty big thesis: instead of these traditional sequence models, here’s a new way of doing it, a new model, which they call the transformer model or the transformer architecture. It’s kind of all about how, when
you’re looking at a specific word, the model will learn which other parts of the sentence you should also look at in conjunction with that word. You can learn things a little more efficiently than with sequence models, you can train it faster, and you can scale it further. So everyone started to use transformers for all sorts of sequence data. Then one thing I think OpenAI really contributed was the idea that you can take these transformer architectures and really push up the scale. You can add way more training data and way more compute to them. And perhaps very surprisingly, and I really think this is the key thing, as you push in more and more training data, they seem to exhibit qualitative changes in terms of what they can do. So it’s kind of like, hey, this seems to kind of understand it. Or I can say, make this happier, or make this sadder, which is a very abstract concept. Where did it learn that? We didn’t give it supervised learning where we code in a definition of sadness or happiness. It just started to learn these abstract concepts and abstractions from masses of training data. I think basically, you
know, OpenAI and some others have just been pushing that scaling piece more and more. There are other things as well: with GPT-3.5 they train a little differently to try to align it more. But basically the big thing here is that with lots of scale, lots of training data, and actually kind of simple models, you can do remarkable things, things that 20 years ago people would have said a computer will never do. “It’ll never be able to write me a sonnet.” And now it’s like, what sort of sonnet would you like? And you can say, make the sonnet unhappier. So it’s a remarkable time, because for a lot of things that we thought were the domain only of human intelligence, it turns out you just need tons of training data and a big model.
So what has happened since last Wednesday is that Twitter, and then maybe seven days later the general internet and the media, caught on to this. I’ve seen all sorts of frankly outstanding uses, outstanding in the sense that I just could not have imagined this was possible. I saw, “Write me instructions for copying a DVD in the style of a Taylor Swift song where she’s angry because she broke up with her boyfriend,” or something like that. And it actually has a go at it.
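As a side note on the transformer description earlier in the conversation (for each word, the model learns which other words in the sentence to look at), here is a minimal sketch of scaled dot-product attention, the core operation from “Attention Is All You Need”. It is an illustration under stated assumptions, not how any OpenAI model is actually implemented: the two-dimensional word vectors are made up, and the same vectors are reused as queries, keys, and values, whereas a real transformer learns separate projections for each and stacks many attention heads and layers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    For each query (one per word), score every key, normalise the scores
    with softmax, and return the weighted average of the value vectors.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy vectors for a three-word sentence (not real embeddings).
words = ["the", "dog", "talks"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.9]]

# Simplification: the same vectors act as queries, keys, and values.
outputs, weights = attention(vectors, vectors, vectors)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

Each printed row shows how strongly one word attends to every word in the sentence. With these invented vectors, “dog” and “talks” point in similar directions, so they attend to each other more than to “the”.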
Absolutely.
And then I’ve seen others, like, “How do you install Intercom on iOS?” and it gets that relatively correct too, and everything in between. The other crazy thing I’ve seen is that for any of these things you can double back and say, “No, give me that in the style of a 1940s gangster,” or “Now say it in German,” or have it translate the German into Spanish but also add in more anger or whatever. It does all these things immediately, with pretty much zero seconds of delay, and in all cases you can kind of see what it’s going for. One personal example I use: when you’re trying to tell your child a story before bedtime, you can run out of angles. There are only so many different ways that, say, three dragons could go into a forest and get lost. GPT-3 is actually great for giving me 10 more stories. What I’ve noticed is that for the longest time, the story of AI, even just years ago, was people saying it’s great for specific stuff but there’s no way it can tackle
creativity. Is it fair to say it feels like we’re actually in the inverse world here?
Yeah. There’s this thing people are talking about, which is that when people talked about AI, they said the first things it’s going to do are those rote manual tasks, and then humans are going to have all this time to go and do these highly creative things. We’ll go into the forest and write beautiful poetry all the time. And then it’s like, oh wow, those manual tasks require really hard vision and processing problems to be solved. But creativity, where there’s no wrong answer and there’s no penalty for getting it wrong... Okay, the poem isn’t quite perfect, but it’s okay. Or the image rendered by DALL·E 2 or Midjourney or whatever might not be exactly what you had in mind, but it’s still a beautiful image, and you can choose one from ten.
And you can see what it’s going for as well. I think that’s one thing people don’t realize: it’s giving you back what was probably in your head, because you’re going to see it anyway. When I say, “Give me instructions to open a bank account in the style of a Rage Against the Machine song,” I see, like,
“Yeah, we’re gonna fight to open the account and we’re going to rage all night,” or whatever, and I’m like, oh yeah, I can see what it’s doing. But I’m not even applying an accuracy scale there. I’m just like, ah, you had a go, and you’re giving it credit for that.
I think that’s probably true. To what extent are we good at judging near misses in terms of non-factual information? Maybe we’re just not that good at it. Maybe we don’t care deeply about it. And we’re going to have to get into this issue of factualness. But even when you ask it a factual question, say a customer support question... I asked one recently about two-factor authentication: “How do you reset your Intercom two-factor authentication?” And the answer I got, I was like, wow, that’s a great answer. Then I look at it and, hang on, that’s not how you reset your 2FA. And it’s a beautiful URL, it’s got the reference to our help center article, and that’s been made up too. So even for things that are factual... People talk about humans and human brains, and there’s this model of System 1 and System 2: we have this intuitive part that’s really good at
recognizing patterns, and then we have the logical, analytical reasoning part that’s slower and more precise. This thing seems like it’s very good at that intuitive piece, and it’s very good at fooling our intuitive piece. So when you look at it at a glance, it looks correct, and until you really apply your slower, systematic reasoning, it can be hard to see the problem. I think that intuitive piece, and this is speculating, is probably what we rely on more to judge creative endeavors, art, pictures, and sonnets, at least initially. So it’s very good at generating things that are plausible at first glance, but then when you actually take time to think about it, you can see the problem.
And I think being plausible at first glance is really important, because most people, ourselves included, are having their minds blown by the idea of “plausible at first glance”. You’re giving it a lot of credit for that, despite the fact that it often might not have a lot of real-world applicability. You’re never going to hang that painting in a museum, you’re never going to actually read that, whatever it is, or
you’re never going to win an award for that novel or whatever. I see a lot of folks, say content marketers, saying things like, “This is going to change my job forever.” And I’m like, yes, but maybe not in the way you think. If you think your job is going to be simply typing in prompts and hitting tab, it’s possible that your job might not exist. Similarly, I see managers on Twitter saying, “Oh, this will make performance review season so much easier.” In all these cases, I’m like, there’s something wrong with that.
There’s something wrong with the performance review.
Exactly. You’re all saying the quiet bit out loud here. If your job actually involves you writing a lot of spurious BS that could be...
Why are you doing it in the first place? What are you doing?
Exactly. And I get that in the case of, say, content marketing, there might be reasons why you just need to rank for certain words or whatever. But don’t mistake that for the actual craft of writing.
I mean, it’s possible this is a good thing. It’s
possible that there are jobs, or things, where the person doing it feels it has no value. So it’s like, this performance review is just work-work, so I’m going to palm that off to GPT. Then after a while everyone kind of realizes that’s what’s happening, and the person on the other side says, well, I’m going to hand it off to GPT to analyze it. Maybe then we can have an honest conversation about, hey, what’s the kernel that’s actually really valuable, and how do we eliminate the work-work?
Why are we doing all this other performative...
Yeah. I mean, it’s possible that the really big contribution this tech makes to humanity is an honest conversation about the amount of work-work that we can eliminate. That could be great. That could be massively transformative.
If we talk about actual applications, something that’s on my mind is that, at least in my experience of it, and even given what you said about the 2FA use case, you can’t deploy it directly today in a lot of areas where there’s a definitive right answer, especially if the risk of giving the wrong answer is pretty high. You don’t want this thing consuming medical
records and spitting out diagnoses, because I can guarantee you the diagnosis will be really well written and really believable-sounding to a layperson, and possibly have a low probability of accuracy. Now, we don’t know the probability of accuracy, but it’ll vary based on the inputs, I’m sure.
It would certainly scare me a lot if someone came to me and said, “Hey, Fergal, we want your team to start using this for medical diagnosis.” That would be extremely scary.
But there are other use cases, maybe less grave but equally inaccurate, where you use it to reach a conclusion in a legal case. I’m sure, again, it would sound good and it would wrap it in all the right boilerplate language, but it would still ultimately not really know a thing. I see the same thing: I’ve asked it, “Give me ideas on how to build a modern email client to compete and win in the productivity space,” and it reads really fine, but it’s only when you scratch and sniff that you realize there’s actually nothing here. It’s just nice-sounding word after nice-sounding word
without particularly sharp opinions. That makes me wonder, what are the ways we could make this more applicable?
Before we get into that, I think there are two things it’s helpful to tease out here. One is that this tech absolutely has problems with what a lot of folks call hallucinations, where if it doesn’t know something, it just makes it up. That’s pernicious, and there are a lot of domains where a one percent probability of hallucination is a deal breaker. We would all love it if that probability was zero. But at the same time, the accuracy has gone up compared with where the state of the art was a year ago, or three years ago. It’s absolutely better at giving you the right answer a lot of the time too. It’s dramatically better at, quote unquote, understanding. I now struggle to say, “Oh, it’s just doing pattern recognition, it doesn’t understand anything,” or at least I struggle to say that without asking, “But what do you mean by understanding?”
So we’re definitely on a trajectory here where, while it will still make things up, and that’s a big problem, it’s getting better and better at giving you the right answer when it has the right answer. What does that curve look like? It’s difficult to unpack at the moment, but we’re getting dramatically better models that are much better at doing the right thing, while they still also sometimes do the catastrophically wrong thing. We should pay attention to both of those things. Yes, this is very difficult to deploy in a lot of production settings at the moment, at least nakedly, without some cladding or some affordances around it. But it’s also getting much better, if you ask it something that’s really well covered on Wikipedia, say. The ultimate example of this is computer programming. You can ask it an out-of-sample programming challenge, one it hasn’t seen. If you ask it to generate a whole module or a whole system, it kind of struggles; there’s sort of a breaking point. But if you ask it to write a function, even if it’s a made-up,
out-of-sample one, it might give you totally the wrong answer, but the chances of it giving you something useful have gone way up.
I think you were saying before that it basically passes the first stage of our programming interviews, some sort of array-based question. It just nails it.
Exactly. As far as I know, and I haven’t searched thoroughly, we have a problem-solving programming challenge for engineers coming into Intercom. I had to sit it myself a few years ago. We try very hard to make sure it’s not available on the internet, that it’s not on HackerRank or something, and if it is, we try to iterate and change it. We’re not always fully up to speed on that, so I can’t guarantee it isn’t out there. But this thing generated a solution that just nailed it, and that is a “senior engineer at the whiteboard for half an hour” sort of problem. It just gets it in one shot, one go.
Zero seconds.
Zero seconds. That’s very impressive. And like half the rest of the world, I’ve been playing with ChatGPT or GPT-3.5 and have given it lots of other programming competition
questions or programming questions which I’m pretty sure are out of sample, and it does a very good job. The world knows this: people are using Copilot. People on my team use Copilot today. That is a qualitative change in accuracy. You’ve got to check your code, you’ve got to make sure it’s not wrong, but that’s very interesting and very exciting.
Very exciting as well is the idea that it’s got at least rudimentary introspection capabilities. If it writes a bug, you can say, “Hey, there’s a bug, can you fix it?” and sometimes it gives you a beautiful explanation of it. And all these models are trained to do is token prediction: predict the next few words. At least traditionally; I guess it’s changed a little in the last year, but the bulk of the training is just to predict the next token, predict the next word. There is something amazing happening here, which is that just by doing that at scale, you get to some level of understanding. I don’t want that to get lost in the wider discussion about hallucination,
which is real, and which people maybe didn’t pay enough attention to last week. There’s this metaphor out there on the internet, I don’t remember who came up with it, about a talking dog. Someone tells you they want you to go meet their new talking dog, and you’re like, “Dogs can’t talk.” You get to the dog, the dog has a conversation with you, and everyone starts talking about how the dog’s grammar isn’t very good. That’s very important, but don’t lose sight of the fact that the dog is talking. The hallucinations thing for me is like that. This feels like a big change, maybe not one we can put in production yet, and who knows where it’ll be in a year, two years, three years.
Yeah, the hallucination thing for me doesn’t render it useless at all. All it does is maybe change the manner in which you use it. Let me put it this way. Let’s be pessimistic and say that, given a five-paragraph description of a patient, it can give you a 70%-accurate diagnosis immediately. And in most of those diagnosis questions, there’s some quick test that can then
verify whether or not that’s true, as in, “Sounds like you have X; here’s the quick test for X.” Turns out I was right, or turns out I was wrong: that’s still a massive productivity step change. If we assume the thing is still flawed but try to take the benefit of the 70% accuracy, there are possibly still things you can do that will be massively valuable.
I have two thoughts on that. The first is that someone would need to study that, because it is possible that this thing is net negative: that the new system, with the human in the loop, the doctor and the artificial intelligence, has a higher probability of a catastrophic error, because the tired, overworked doctor sometimes doesn’t do their diligence when there’s an appealing but incorrect system in front of them.
This is like the self-driving car thing, right?
Yeah. You’ve got to be ready to take over at any point, but there may be areas in that regime where the system as a whole, with the human, is actually worse than just people.
People can over-trust. What do they call it? Not desensitization... normalization of deviance. People study this in the context of nuclear reactor disasters and so on. What went wrong? “Oh, well, we got used to this shortcut,” and the shortcut wasn’t always valid. So that’s one thing I would say. But the counterpoint weighed against that is that when we’re thinking about medical things, some portion of the world doesn’t have access to a doctor. Maybe there’s some future thing here. It’s the classic thing of looking up medical things on the internet: what if you can’t afford a doctor, or you’re in a country that doesn’t have one? So I don’t know where to draw that boundary. That’s a hard boundary to draw. Eventually, on this trajectory, this stuff will probably get good enough that the system as a whole does outperform whatever people currently have.
The other thing you were saying... let’s just go back to the programming example for a second, because we’ll actually have a real version of this
conversation when we talk about customer support a little later, where it’s like, here’s an example where error tolerance is acceptable under certain conditions. You were saying that when it generates code, you can say, “Hey, that’s buggy.” Another example I saw that was popular on Twitter for a while was, “Talk me through your thinking line by line.” It’s almost like you’re telling it how to think about things, or you’re giving it new information and then forcing it to reconsider its opinion. What’s happening there?
I think there’s something fascinating happening there, and we’ve got to talk right at the cutting edge here. This is speculating, and I’m a spectator; I’m not doing this work. But Google published a paper pretty recently about how large language models can self-improve, and I think there’s something fascinating there that’s worth unpacking. The first thing is that maybe about a year ago, people discovered that while these models would get things wrong a lot, you could prompt them with, and the classic one is, “Let’s think step by step.” So you had a model, and you could ask it
a simple math question, like, “Alice and Bob have got three chocolate bars and they give three away to Eve; how many do they have left?” or something like that. These things struggle with basic maths, so it would often get things like that wrong. But you could say something like, “Let’s think step by step,” and that kind of forced it to output its reasoning step by step along the way, and accuracy rates went up when you did that. Which maybe kind of makes sense: it’s trained to complete text, and so, step by step, each step is designed... It’s almost like you’re not multiplying out the probability of failure. If each step is 90% correct, then five steps is all of a sudden only about 50% correct, perhaps. Possibly something like that. It’s always difficult to speculate on exactly what’s going on in the internals of these.
But there was a very interesting paper recently which said, hey, we know that we can improve the accuracy by saying “let’s think step by step”, and we can use that to get better outputs than just having it intuitively, instantly give the
answer. You can use that to build a new training data set and retrain the model to improve its accuracy. That, for me, is fascinating, because it means these things can self-improve, at least to some degree.
I saw a demo recently; someone at OpenAI, I think, did a demo at a Microsoft event where he showed Copilot or one of those models, maybe code-davinci or something, I don’t know, they didn’t specify. But they showed it doing something like this with a Python prompt. They gave it a natural language problem, a bit like our Intercom programming problem, and asked the system to synthesize code, but then put the code into a Python prompt. When it got it wrong, the system saw that: it tried to execute the code, saw it was wrong, and then took another go and another go until it got it right. And could you combine that with the self-improvement thing? I think this is a very interesting world, where language models and NLP are starting to look a bit more like the AlphaGo world, with self-play and self-improvement. So I think it’s very exciting
times, and it’s very hard to say what the limits are here. There are a lot of things that, for a long time, people in linguistics or something like that would have said AI will never be able to answer, like these Winograd schemas. “The tractor went down the road and turned into a field. Please explain what happened in that field.” Computers were bad at that historically. Or “The magic tractor went down the road and turned into a field”: a slight modifier like that changes the meaning. It’s getting really good at that in some domains, and you can ask it basic semantic questions or ask it to speculate. Up until about two or three years ago, whenever I saw a new machine learning system, it always looked magical and amazing at the start, and whenever you really got into it, underneath the hood, you were like, “Oh, it’s just logistic regression.” Once I understood it, it was much less impressive; it was doing something extremely basic. I’m struggling to do that here. Maybe that’s just because it’s so hard to understand
the complexity of the model, but these things feel like qualitatively different capabilities than we’ve had before.
Before we get into support, which we’ll deep dive on: I’ve seen comments saying this is as big a moment for the internet as Google. I’ve also seen what I would call the cold-water take, which is, “Don’t be fooled; generating random song lyrics is a gimmick at best.” There’s obviously a spectrum of appetite depending on whether or not you’re a techno-positivist or whatever. What’s your take on the Google thing? Is this (a) potentially as big as Google, (b) a threat to Google, and (c) any thoughts on how Google might react?
So, super speculatively here, entering into total futurism: I’m very bullish on AI and machine learning, whatever we want to call it. I feel that the change in capability that we’ve seen over the last year, and certainly if you extrapolate forward another year or two, I personally feel that’s as big as
the internet. That’s the potential. We’re going to have to figure out how to productize these things; there’s going to have to be a ton of work done on how you constrain them to answer from a knowledge base and so on. But I just think that the sum total of new capability that we’ve gotten, and that we are likely to get, feels to me as big as the internet. I might be wrong, but that’s where I would bet. That’s the order of magnitude.
So it’s bigger than Google?
I think so. Not just what came out last year, but the total progress here. We’re seeing dramatically better capabilities at reasoning, elementary reasoning, reasoning that can be wrong but is sometimes quite compelling. I would not have believed it if you had told me five years ago about its success on programming challenges. I think most people wouldn’t have believed that 10 years ago. So I think there’s something big here. There’s a lot of productivity that can be unlocked, and it’s very hard to say where that’s going to stop. Also, I think there are feedback loops here. I feel this is like a Sputnik moment. With ChatGPT, you can say,
“Hey, the tech isn’t that much better,” or “It’s getting overblown.” But don’t underestimate the power of low friction, of being able to go in and play with something. Everyone can do that. I think it’s like a Sputnik moment, where people look at this and go, “Wow, something’s arriving here.”
Sorry, what’s the reference here? Sputnik?
Oh my God, this was back in the fifties, I don’t remember exactly when. The Russians put this satellite in space that orbited the Earth and broadcast radio signals, so people all around the world could suddenly tune in their radio and get the signal coming from Sputnik. The narrative that’s generally told is that people in the West suddenly woke up and said, “Wow, there’s a capability change here that we weren’t aware of,” and supposedly this caused the space race.
So I kind of feel that maybe the reaction is still playing out, but I just see so many people who were not really paying attention to this who are suddenly excited about it. Maybe that hype will die down. We’re in the middle of it, so it’s difficult to predict. But if this isn’t it,
then something else will be soon, I think.

That’s all well and good. What about customer support? Obviously Intercom is a customer support platform, and the potential that ChatGPT or GPT-3.5 or any of these technologies can make support better, or faster, or cheaper, or more successful, or more end-to-end is something that we’re always all over. I know you’ve been thinking about this from a support point of view. Earlier we talked about how there are environments where an incorrect answer is very, very bad, and there are environments where it’s actually quite tolerable. We have 25,000 customers. Some, like banks, probably can’t afford one. Other people would happily afford one, because it means they can support all their customers faster or whatever. How do you think about this technology as it applies to support, based on what you’ve been seeing and looking at, without giving away anything super sensitive?

Yeah. So we obviously try to pay a lot of attention to the developments in this space, and we were looking at
GPT-3 pretty early, and our initial thoughts were that the accuracy is not quite there yet, and the hallucination problem is a big problem. To just nakedly say, hey, it’s consumed the Intercom help center, let’s ask it questions about resetting my two-factor authentication: it just failed.

And we’ve been looking at the GPT-3.5 family of models, and some other models as well, recently. We have Resolution Bot, and we have it in production. It’s not using language models that are as large; they’re maybe medium language models, embeddings and so on. And it gets very good accuracy at the sort of thing it does. It’s designed so that the customer has to go and create a specific answer. We made a conscious design decision very early on that it would never say anything that hasn’t been explicitly curated by the team. And I think that works well for a lot of businesses, because it might deliver the wrong answer sometimes, and we try to carefully control that, but it’s always going to deliver you a relevant answer, or an answer
that’s not going to mislead you.

Or specifically, I think the way in which it gets it wrong is that it might give you a wrong "correct" answer, if you know what I mean. The thing it gives you will be something that somebody in your company has said is a correct, cohesive piece of text. It just might not be the right one for the question.

Right. And we always encourage our customers to write the answer in such a way as, "Oh, to reset your account, do the following things," so that if it is delivered wrongly, at least the end user is not disoriented.

Yeah, they’re not going to go and do it for no reason. They can go, "Oh, it’s a stupid bot, it gave me the wrong answer," as opposed to, "I am misled, and I’m now going to waste a bunch of time doing something."

So initially, with GPT-3, we were like, oh, it’s really cool, but it’s difficult to see the end-to-end usage of this. And I think that’s being borne out, which is that it’s been a couple of years, and I’m not aware of anyone who has deployed GPT-3 in a total end-to-end way to answer the question.

End-to-end meaning there’s no agent in the mix, right?

Right, so no
agent in the mix, and it’s not wrapped in something that carefully constrains it.

Because the risk there is that there’ll be almost a kind of dark matter, an unknown unknown. If someone goes to your business and asks a question that you didn’t see, because GPT dealt with it and gave the wrong answer, and the customer goes off and does the wrong thing, no one actually knows what’s happened there, if you know what I mean, except for the bot. And the bot doesn’t even know it’s wrong, because it doesn’t know if it’s bluffing or not. So you end up in a potentially dangerous world.

Right, exactly. And so we’ve quite carefully designed Resolution Bot to avoid getting into any of those situations. We calibrate it. We check that when it says something helped the customer, it did help the customer, and we have ways of checking that between explicit and implicit customer feedback. But it’s conservatively designed. At some point the technology, these sorts of open-domain question-answering things, or something you could build on top of GPT-3.5, will get good enough that for a certain portion of our customers that equation changes, where it’s like,
hey, I’m not answering medically critical things, and the inaccuracy rate has fallen. It was 90 percent accurate, now it’s 99 percent accurate, now it’s 99.9. How commonly it gives you the wrong answer will eventually fall below the critical threshold where it’s like, hey, just being able to take this out of the box is worth it. I don’t have to go and curate these answers. So that will probably come. When will that come? Is it here today? Has it come in the last few weeks with DaVinci-003 and ChatGPT? That’s obviously something we’ve been assessing, like a lot of other people.

And it’s certainly a work in progress, because you always have to go and play with the prompts. When you interface with ChatGPT or GPT-3, you ask it a question. We could take an end user’s question and then wrap it in something that says, "Hey, you’re a very conservative customer support agent. If you don’t know something, or you’re not completely sure, you always say, I don’t know. You reason step by step. You’re super conservative." Maybe we can wrap it to get the benefit of the deeper natural language
understanding which these models have, and the deeper ability to synthesize and rewrite text, which can be beautiful and really nice. Maybe we can get those benefits and then constrain the hallucinations and the errors enough.

Is that another version of, like, "walk me through this line by line"?

Yeah, yeah.

That whole field, is that what people call prompt engineering?

Prompt engineering. We’re joking that the machine learning team at Intercom is going to become the prompt engineering team, and we’re kind of joking about that as we play with it.

At some point that laughter stops, I’m sure.

Yeah. But there are people out there who have really sweated the prompts and gotten really good at prompt engineering. So it’s a real thing. And it makes it difficult to say, "Oh, this new tech is definitely not good enough," because what will the best prompts be like in six months? But that said, we don’t think it’s here yet. With all the prompt engineering we’ve done on DaVinci in the last week or so, you can get it to be more conservative, but not enough. The
probability of giving a wrong answer and just totally making stuff up is too high, at least to use it for end users in a kind of naked way.

What about, like, we talked earlier about the doctor augmentation question. Is there a version of it where you can do it from the agent augmentation side?

Well, at Intercom we’ve been thinking about this area very deeply and for an extended period. In the last few months we have had internal discussions about the future of the customer support inbox and about generative models, models that generate stuff as opposed to just classifying things. And we believe that their time is coming for support augmentation. I think that seeing ChatGPT explode recently, and all the excitement about it, is evidence of that. These things are getting good, and there are a lot of things you can do in the inbox, or in a context like the inbox, to constrain them and sand off the rougher edges of these things.

So an example might be to curate the responses that it’s allowed to give, and then use the
generative model to predict what should happen, but then only actually allow it to suggest things to the teammate, like a macro or a conversation response, and hopefully provide a beautiful interface for the teammate to make it easy for them to do that. Alternatively, have it go and search in your knowledge base. There are techniques you can use to try to apply it to that and constrain it to that. And then maybe show, "This is the answer that this thing wrote from your knowledge base," and side by side with that, "Here is the original source article," so that the customer support rep can look at them side by side and see if it adds up.

So there’s an angle where the AI explains its epistemological basis for how it concluded this. And in that world, if you’re a support rep, you don’t even need to know if it’s actually right. You just need to know if the logic stacks up. Obviously it’d be better if you knew it was right as well. But if it says, "Hey, I read the how-to-reset-2FA article, linked here, and from it I suggest this is how you reset 2FA,"
you’re probably like, "That’s the right article to read."

You might need to go further. The problem is that when they get it wrong, they’re so good at seeming right.

But you’re not going to invent the idea that there was an article.

Yeah, totally. And so you might need to go beyond that. You might need to have the untrusted part of the interface, which is maybe the composer, where it pre-fills something in your composer, and there’s also a trusted part of the interface beside that, maybe just above it or something like that, that shows the original source article and the surrounding paragraph, so you can look at both. And the thesis is that that kind of quick check is better. We obviously study customer support flow very carefully and very closely, and we absolutely have some support agents where it’s like, okay, they got the question, and they have to go and find an article themselves. Some expert ones know it; they’re instantly there and know exactly where to go. Maybe they’ve got a macro that does it. But then maybe someone is newer in the company and still being trained, or maybe it’s only part of their job, and they’ve got to
find the article themselves, and then they’ve got to read it, and they’ve got to check the answer, and then they’ve got to copy-paste it, and then they’ve got to reformat it. So maybe there’s a multiple productivity boost, and we can make someone twice as efficient or something.

Yeah. But on automating that whole flow, all that agent behavior would also inform the system, right? If you put it live and agents are forever going "wrong, right, wrong, right," all that feeds back in and then it gets better. Or if they’re rewriting the answer to be more accurate, I assume we can learn from that, and then very quickly the system converges on all the right answers.

Yeah. So we certainly could build a system that does all of those things. GPT-3.5 won’t natively do it. If you decide to build on it as a building block, and look, there’s even an assessment of whether that’s the right system to build on. Its capability is very good, but it’s not the only generative model in town. But whatever we build on, and we’re getting really into the roadmap here, potentially we would build a learning loop. Most of our tech at
the moment, where we do that, we absolutely gather feedback. There are some parts of Resolution Bot, like predictive answers, where it predicts things to end users, where it actually does use the user saying "that helped" or "wait for the team" as training signal and feedback. Potentially we’d end up building that. And there are a lot of trade-offs. It’s very easy to say, "Oh, we want a system that will learn in production," but then it’s like, okay, who has to maintain that? Who has to debug that? Sometimes it’s easier to get it to a stable state and then lock it. So it depends. That’s something for the team here. We do metrics and analytics whenever we upgrade. We’re getting into the details of our models and how we check the accuracy and calibrate them and so on.

What about this: I know currently our inbox has this feature where, based on what you’ve said before, if I jump into the inbox, before I’ve said anything to start a conversation, it’ll say, "Hey, I’m Des, co-founder of Intercom, thrilled to be chatting with you," or whatever my most common thing is. That’s automatically
pre-written for me.

Yeah, smart replies.

Yes. That is just the mini version, in some sense, of what we’re describing here, because we were really just going for salutations, and maybe endings, and maybe handoffs. The common boilerplate of a support conversation should be there for you, and that alone is a productivity boost. But the idea that we just get one degree sharper and say, somewhere in the middle of all that boilerplate, here’s the meat of the answer, is where what you’re talking about is going, right?

Yeah, totally. And again, there are two separate things. There’s the change in the world with the increased capability of GPT-3.5, and then there’s the stuff that we’re working on as we grind away on this problem and try to really deliver things that will make things better for our customers. I think the capabilities have really improved, and we’re still figuring out: can we use this? Is there a shortcut to where we want to go via this? And maybe we can use these capabilities even as building blocks. There are loads of ways of potentially using
them as building blocks. But in terms of the direction we were going in already anyway, there are a lot of things agents do, like greetings, where it’s very obvious. We don’t ever want to annoy people. We don’t ever want to have an agent read through a bunch of text and then be like, "Why did I do that?" They go blind to it, and it reduces their trust in the system. It just slows them down. We want to help them out.

So with smart replies, we started out with greetings. It was just such an obvious thing to do. We can very easily tell when you’re probably going to want a greeting: you’re coming into a new conversation, and no one has said anything to the end user before. It’s very obvious, and it was a low-hanging piece of fruit. People really liked the user interface. You just press tab, boom, it’s done. It’s easy, it’s low friction. Now, we only get to make a single suggestion there, and sometimes it’s just hard for the system to tell. Maybe there are three different things. At the moment, with this macro flow, people use macros a lot, and they’ve got to choose which of the macros. So we’re
looking at stuff like: should we be suggesting those macros to people proactively? Maybe we don’t want to be pre-filling the composer; maybe we want to just show some macro suggestions that are contextual. There are a lot of flows that are repetitive. We’ve been working on things like flow finding, trying to understand the common steps people go through.

But I guess the big message is that we do believe that this sort of generative tech needs to be shaped. It needs to be made good so that it’s not annoying, so that it’s not giving you wrong things and misleading you, and certainly never pushing more work or more stress on you than you would have without it. We do believe that its time is coming, and we’re trying to figure out the best ways to really make people more efficient, and to leverage it and put it in a production setting that actually works for people.

Two sort of final questions. One is: we’re talking about support. What other industries do you think will see value out of this in the early days? It feels like support is a target-rich environment for this type of tech, but are there others?

Well, I mean,
obviously I’m bullish on support. We’re bullish on support. There are just so many things that are rote. It’s like, the agent pretty early on recognizes that this is a problem of the following sort, say, it’s a "reset my account." Something like a sort of autocomplete-powered experience. There’s so much structure in that area. So there’s a combination of real customer problem structure meeting technology that’s very good at dealing with natural language and reshaping it.

We can see a button you can press to make what’s in the composer more formal, right? Or a button to make it more apologetic, right? We can see things like that, and other things as well. We think it’s a very exciting area at the moment. I don’t want to go into everything totally speculatively, but even before this recent thing, the machine learning team was kind of all in on this area. So we’re big believers in support.

Outside support, anything where there’s
structure in the task, and where there is a human approver who’s able to discern whether an answer is right or wrong.

This is going to seem like weird intuition, but in computer science, or in cryptography, we pay attention to certain types of problems where it’s easy to verify that an answer is correct, but hard to go and find that answer.

Problems like P versus NP?

Yeah. Complexity classes, all that sort of stuff, which is very easy to get confused about, so I’m not going to try to do that on the fly. But people are interested in problems like that. Now, I can’t help but think there’s a similar intuition here. If you have a challenge where it’s pretty easy for a human to verify whether an answer is correct or not, but it’s laborious for them to go and look that up and fish that out. Or it may be, in the limit case, don’t know versus don’t care, where the human doesn’t care whether the answer is correct, because there is no such thing as correct. It’s like, "Write me a poem about this."

Okay, reformulating slightly: so that class of problem is either where validating the answer is very cheap,
creating the answer is very expensive, or there is no valid answer and you just need the presence of one.

Yeah. And that’s the technology as it is right now. The answer might be different in six months or in a year. And I would look at the programming thing. It could be that in a year the answer is instead something more like: any time a computer can check whether the answer is correct or not. Or it could be that any time the domain is sufficiently simple, the machine learning system will, highly likely, give you the right answer. So it’s an evolving thing, and I think it’s hard to set limits at the moment.

But other industries, other domains: computer programming, right? An expert person sitting down at the terminal has got to review the code anyway, and they’re able to do that. And okay, there can be a subtle bug somewhere in your code. Sometimes it’s easier to write the code yourself than to identify a subtle bug. But a lot of the time, if you look at the workflow of a computer programmer, it’s like, "Ah, I know how to do this, but I don’t remember exactly how to use this library. I’m going to Google for it. I’m going to
go to Stack Overflow. I’m going to copy-paste answer number three from Stack Overflow, put that back in, change it a bit." Maybe you’re not supposed to copy-paste directly from Stack Overflow, for copyright. I’m sure probably nobody does that. But somebody might. But the idea is that when you see answer number three on Stack Overflow, you’ll be like, "Oh yeah, that’s right, that’s what I want." There’s a whole workflow like that that occupies a lot of programmer time. And now Copilot comes along and does an end run around that, and then also will reformat the code to fit in with your existing variable names and so on. That’s extremely powerful. And we’ve started to talk about, what is Copilot for customer support? We have prototypes, and there’s a lot you can play with. Maybe you don’t answer the full question; you just give it the two- or three-word answer, and then it writes it out, and then you modify it, and you’re like, "Make that more formal. Make that longer. Make that shorter." It feels like there’s a lot we can do here.

And what are we shipping in January?

[Laughter]

We’re going to have to censor this
part of the call.

[Music]

This has been great. We’ll check in, I guess, in two more weeks, when the whole world’s changed again.

Yeah, exactly.

But if not, it could be a few months. Thanks very much.

By the time this is up on the web, I’m sure it’ll be out of date and look foolish, but that’s the nature of this thing.

Absolutely. That’s why we’re working on it.

That’s why it’s exciting. Thanks for having me.
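
A minimal sketch of the prompt-wrapping idea from the customer support discussion above: take an end user’s question, wrap it in a conservative instruction, and only ever show an answer the team has already curated. This is an illustration in Python, not Intercom’s implementation. The names `wrap_question` and `pick_curated_answer`, the prompt wording, and the numbering scheme are all assumptions, and the call to an actual language model is deliberately left out.

```python
from typing import Optional, Sequence

# Hypothetical instruction text, paraphrasing the "very conservative
# customer support agent" wrapper described in the conversation.
CONSERVATIVE_PREAMBLE = (
    "You are a very conservative customer support agent. "
    "If you do not know something, or you are not completely sure, "
    "you always say that you do not know."
)


def wrap_question(question: str, curated_answers: Sequence[str]) -> str:
    """Build a prompt asking the model to choose one curated answer by number."""
    numbered = "\n".join(
        f"{i}. {answer}" for i, answer in enumerate(curated_answers, start=1)
    )
    return (
        f"{CONSERVATIVE_PREAMBLE}\n\n"
        f"Approved answers:\n{numbered}\n\n"
        f"Customer question: {question.strip()}\n"
        "Reply with only the number of the best approved answer, "
        "or 0 if you do not know."
    )


def pick_curated_answer(
    model_reply: str, curated_answers: Sequence[str]
) -> Optional[str]:
    """Map the model's reply back to a curated answer.

    Free text, 0, or an out-of-range number all return None, so the model
    can never put words of its own in front of the end user.
    """
    text = model_reply.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    index = int(text)
    if 1 <= index <= len(curated_answers):
        return curated_answers[index - 1]
    return None
```

A caller could treat a `None` result as "hand off to a human teammate," which keeps the conservative behavior even when the model ignores the instructions and replies with free text.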