Exploring the future of technology, philosophy, and society.

Mark Saroufim (Machine Learning - The Big Stagnation)

Download full episode here

In this episode of the Judgment Call Podcast Mark Saroufim and I talk about:

  • Has the AI community become intellectually lazy, and is there a stagnation in progress?
  • Why is AI so complicated, and who are the gatekeepers in AI research right now?
  • Should we expect Artificial General Intelligence soon? How predictable is the progress in AI?
  • Why is there so little innovation in creating 'cheaper products'?
  • The surprising libertarian roots of Lebanon.
  • Why the Internet is so great at connecting like-minded people.
  • The future of education - revisited.
  • and much more!

Mark Saroufim currently works at Graphcore and has previously worked for Microsoft and NASA; he studied at UC San Diego and UC Berkeley. He is a prolific writer on Medium; his latest article is Machine Learning: The Great Stagnation.

You may reach Mark via his website.

 

Welcome to the Judgment Call Podcast, a podcast where I bring together some of the most curious minds on the planet. Risk takers, adventurers, travelers, investors, entrepreneurs, and simply mind marketers. To find all episodes of this show, simply go to Spotify, iTunes, or YouTube, or go to our website, judgmentcallpodcast.com. If you like this show, please consider leaving a review on iTunes or subscribing to us on YouTube.

This episode of the Judgment Call Podcast is sponsored by Mighty Travels Premium. Full disclosure, this is my business. What we do at Mighty Travels Premium is find the airfare deals that you really want. Thousands of subscribers have saved up to 95% on their airfare. Those include $150 roundtrip tickets to Hawaii from many cities in the US, or $600 lie-flat tickets in business class from the US to Asia, or $100 business class lie-flat tickets from Africa roundtrip all the way to Asia. In case you didn't know, about half the world is open for business again and accepts travelers. Most of those countries are in South America, Africa, and Eastern Europe. To try out Mighty Travels Premium, go to mightytravels.com/MTP, or if that's too many letters for you, simply go to MTP4U.com to sign up for your 30 day free trial.

Today here with me is Mark Saroufim. Mark runs his own AI company called Yuri AI, where he helps developers train AI for their video games. Before that, Mark worked at Microsoft and NASA, and he studied at Berkeley and at UC San Diego. And Mark is not just super smart and knows everything about AI, he is also a prolific writer, I would say. I actually discovered him through his article on medium.com, where he pokes fun at the current state of AI and the way it works. Welcome to the Judgment Call Podcast. Mark, how are you?

Thank you, Torsten. Thank you so much. I should mention I do right now work at a great company called Graphcore, where we build our own AI chips to accelerate machine learning. So the startup life is behind me. I think I'll definitely get back to it, but right now I'm taking a break, I would say.

Lucky you, you escaped during the pandemic. It might be a good time to escape, man. It's probably easier to hang out in a company, especially if you have the option to work seamlessly remote. What I want to do is, and I'm not sure how many people in the audience have seen the article, it was ranked really well on Hacker News, which is a news aggregator, like a Reddit for hackers, though I would say more for developers than hackers, right? It's not on the dark web, it's a site that aggregates certain news, and it was ranked pretty well. That's also how I found you. Let me just read a little bit from it, and then hopefully you can explain what you actually wanted to express there, if people don't understand it already. It basically makes the claim, from what I understood, that there's a big stagnation in AI research, and seemingly the serendipitous way of finding knowledge on Twitter is much better than what is going on in the AI community, which was a surprise to me, kind of, as an outsider. And what you wrote is, and I'm just quoting you now: "The academics sacrifice material opportunity costs in exchange for intellectual freedom. Society admires risk takers, for it is only by their heroic self sacrifice that society actually moves forward. Unfortunately, most of the admiration and prestige we have towards academics in AI are from a bygone time."
"Economists were the first to figure out how to maintain the prestige of academia while taking on no monetary or intellectual risk. They show up on CNBC Finance and talk about corrections and irrational fear and exuberance, and regardless of how correct their predictions were, their media personalities grew. Feedback loops from the YouTube recommendation algorithm. Machine learning researchers are now engaging in risk free, high income, high prestige work. Machine learning PhD students are the new investment banking analysts. Both seek optionality in their career choices, but differ in superficial ways, like preferring meditation over parties, and marijuana over Adderall, alcohol, and cocaine." The machine learning PhD, and that's how you describe it, is now just an extended interview for working in FAANG, the acronym for Facebook, Apple, Amazon, Netflix, and Google.

So those are really strong and funny words, right? I really admire your sense of sarcasm. You have a little bit of the insider in the community; I would say I'm more of an outsider, but I really observe a lot of what's going on in the AI community. What led you to write this article? What led you to put it in such strong words?

Yeah, those are two interesting questions. I had written a lot before, specifically over the last year. I started writing a lot online because I was having a lot of trouble, when I would make some change to my Yuri product, which was my own company, telling people about it. Like, well, I did that, you know, now what? And so that's sort of what got me down the path of writing a lot online. It was really a way for me to potentially meet customers, that was initially the main reasoning. And it ended up being the best marketing channel I could have ever thought of, because the ebook that I wrote at the time, Robot Overlord Manual, was being read by like 30,000 people monthly, whereas my startup homepage, which I worked on for about two years, would maybe get 20 monthly visitors or something. And so for me, it was definitely many orders of magnitude more impactful, because then I could get a better sense of who's doing what, what their problems are, who some good cofounders I could get might be. So it solved many problems I had simultaneously.

As far as this article goes, one, I wanted to try something different. I sort of knew initially that I wanted something to sound more like myself. People that know me, like my friends, know this is relatively how I talk usually, whereas my previous content was much more informative; really it was, here's this idea explained in the simplest way possible, with the simplest code, with the simplest figures. I would say the vast majority of my prior writing was like that, whereas here I expressly wanted to have something that would trigger people's emotions a bit more, because of a sort of longstanding, almost meme-like feeling that we have in the AI community nowadays. I first experienced that, I would say, when I joined Microsoft. At the time, I remember it definitely seemed to me that there was a lot of centralization of ideas. As in, you look at what the best researchers at Google are doing, then you try to take that work, reverse engineer a few more details that may not exist in the paper, improve it, make it faster, etc.
And I think this problem got markedly worse when large language models became popular. So, large language models are actually a really great technique. People shouldn't misunderstand me, I'm very impressed by large language models; I think that's one thing a lot of people misunderstood about my article. However, what they've essentially done is made it so you're almost a consumer of someone else's intellectual property. So you're a machine learning engineer, you're making a lot of money, and you think you're very smart too. Your friends and your family are like, oh wow, you're working in AI, that's what all the really smart people do. But essentially you're doing something along the lines of just inputting some sort of textual data, and then getting out some sort of prediction for a downstream task. Yes, there is a lot of domain expertise to make these things work. But I generally felt that it almost became a race, where a very frequent conversation I had with a lot of my colleagues would go: oh, have you heard of this kind of transformer? And you'd be like, yeah, I read such and such, and it's different in such and such a way. And then the next day, a new one comes out. The feeling reminded me of the early days of front end web dev frameworks, I guess before React was dominant. It felt like there was a new JavaScript framework coming out every day. So if you were new to the field and wanted to learn more about it, it was just a giant pain, because you just weren't sure if you were committing to the wrong tool. I would say machine learning is not that bad. But I definitely felt that we got to a point where people weren't doing first principles thinking. They weren't trusting their own opinions for what to do. They weren't running their own experiments. They were trusting some central authorities, that they think are really intelligent, to tell them: this is the right way to do machine learning. And then they just do it, no questions asked. And I found that to be a big, big sign of stagnation.

Yeah, I keep talking about the stagnation in productivity growth. I bring this topic up a lot and I get all kinds of reactions. You know, that's Peter Thiel's thesis. I get all kinds of reactions, from "this is just a first world problem" to "it's not real" to "we really need to address it". I actually feel there is something nefarious going on with the big stagnation in terms of productivity growth outside of semiconductors. But I feel like most people I talk to are not really worried about it. I'm kind of surprised how complacent they are. And one thing, and that's why I find the way you presented it in that article really funny, is that AI is one of those things everyone immediately comes to and says, well, AI is where we see all this progress, so AI is definitely not stagnating at all, how can you even say that? Because from the outside, it seems like, and I had David Orban on, who is part of the Peter Thiel mentorship program, he says, you know, in AI there's two things that happen. One is we're obviously making progress in it. But the other thing that we are seeing is that the rate of progress actually goes up. So instead of doubling every 18 months, which was Ray Kurzweil's idea.
He says, you know, with most AIs we were accustomed to a new generation somewhere between every six and eight months, and now it's more like every two months in terms of progress. And we see this with GPT-3, which surprised everyone by being that good. I think it also surprised the creators a lot. They were not expecting this; you run that model, and at the end you use it, and you're like, oops, that's actually way more interesting than GPT-2, the precursor. So how can you say there's a stagnation in AI? And is that really a first world problem, because you're talking now in terms of a few months or weeks? How do you mean that?

Yes. So I think it comes down to how we're defining progress here. If we're defining progress as the size of models that we can run over time, then the field is growing exponentially quickly, right? Because models are increasing exponentially in size. But the end level performance on the tasks on which those models are evaluated is not increasing exponentially. In fact, there are extremely diminishing returns with model size, simply because, just by the nature of it, getting from 50% to 90% accuracy is a big deal, getting from 90% to 95% is really hard, and getting from 99% to 99.9% is maybe harder than all of those things combined. So you definitely get diminishing returns with model size.

However, I also poked fun at this theory. So there's a lesson that Richard Sutton, I think, was the first person to really make clear, and he calls it the bitter lesson. And I expressed it in a meme format, where I showed, essentially, a first year student saying, okay, I'm going to use Q-learning to run some sort of reinforcement learning agent. Then in the middle there's the guy really thinking hard about some crazy theoretical ideas and putting constraints on the entropy, something very complex sounding. And then on the right hand side there's the intelligent researcher who says, well, actually the best techniques are the ones that are simplest to scale, like Q-learning. So my point here is that reality has sort of told us that complex methods in machine learning don't really work; simple methods scaled work better. And so now you're like, okay, great, so it turns out that that's what does best, and also that we have really good ways of scaling. However, the returns we're getting from this scaling are diminishing, and we're sort of hitting a wall now when it just comes to, let's say, throwing twice the compute at a model. GPT-4, I'm sure, is going to be impressive, but it's not going to be that much more impressive than GPT-3. And similarly, if GPT-5 follows a similar trend, its jump is going to be less impressive than GPT-3's was relative to GPT-2. So this is my intent with "stagnation"; obviously it's a relative term. When I say stagnation in machine learning, I actually mean it in a very narrow sense: I mean stagnation within core machine learning, as in techniques to run machine learning models. However, where I haven't seen a stagnation at all, and actually have seen a boom of ideas, has been the stuff around the language and infrastructure space.
So, companies like Hugging Face, or fast.ai, or Hasktorch. Or on the infrastructure side, making it easier to create reinforcement learning environments, like with Unity ML-Agents. And then also where I work, Graphcore, which is, well, you know, we're hitting a wall with Moore's law, so it behooves us to start thinking about various architectures that could actually make running machine learning models faster. And taking this to the extreme, there are also topics I didn't discuss in my article, from stuff like AlphaFold, or biotech in general, to stuff like more efficient energy distribution. So definitely we're not stagnating. However, I guess one of the real reasons I wrote this article was for people that feel like they already have a financial safety net and have been doing relatively incremental machine learning work for a while now. I don't think they should just be okay with saying, okay, stuff is stagnating, whoops, what are we going to do? If they just were to go out of their immediate comfort zone, there are tons and tons of fields they could be exposed to where things are accelerating in a much more interesting way on the application side.

Yeah, one thing that immediately comes to mind when you say that, I'm just taking some notes here, is that you touched a bunch of points, and I think some of them are really important. One thing that immediately came to mind for me is, you know, artificial intelligence is just so damn complicated. I consider myself relatively smart and curious about lots of things. I've read a lot of philosophy, and it took me a while to understand it, but I got there a couple of years later. And I'm pretty skilled in a bunch of web development languages, and I play with Keras a lot, and I play with some Python to run some AI myself. It's still really complicated. I mean, you would laugh at it when you see it, but I feel it's really challenging just to get an understanding of what you actually do by changing things. There's so many variables, right? First there's the variables, then there's all these different algorithms you can run, and then you have to validate them. Sometimes you don't even know, is this result better or not? It's not that easy to tell. So there's a lot of steps involved. And when I read some research papers, and I actually read a bunch on AI because I find it really, really fascinating, I find it extremely hard to make sense of them. And I know other people will do much better than me, I'm not doubting this. But maybe we've reached that level of what people can comprehend on a practical level, right? So you mentioned there's a lot of ideas that come off Twitter, where people just put things into practice and try them out. But there isn't much that's happening in academia anymore, for some reason, that's really useful. Maybe it's just too complicated and nobody can comprehend it.

So, to me at least, there's two topics you touched on here. One was, as a beginner, how to grasp what the core fundamental ideas of machine learning are, and what's important to understand about them.
And then I think another thing you touched on was the empirical nature of the field, and how this makes pure research challenging. So I think I'll answer both of those.

As far as the core ideas go, obviously there's tons of them. I mean, I've written an ebook about it, and I have tons of videos about them. But I think the core idea, the one where you get the most mileage for your effort, is understanding the idea of doing supervised learning with gradient descent. As far as I'm concerned, this is sort of 80% of what most people need to understand. So what is this idea? The way I like to explain it is: think of having some sort of table, and you're trying to predict one specific column of this table based on what the other columns are, right? And the way you do this is you pretend you're not looking at that column. Then you make a guess for what your model weights should be, you look at how far you are from the actual label, which you purposely didn't look at, and then you update via gradient descent. The code for this is about 30 lines; there's a sketch of the idea below. And I think this is the main idea that even bootcamps sort of get wrong. They explain all sorts of stuff, and even in graduate school people explain all sorts of ideas, and they definitely all have their important role. But if I were to say what the key idea is that underlies so much of this stuff, whether it's playing Dota or running GPT-3, this is actually the main idea. In fact, in GPT-3, even though there are no humans labeling the text, what you're essentially doing is pretending you're not looking at some words and using the surrounding words to guess them. So you're essentially creating a supervised learning problem out of something that's not inherently supervised. This paradigm shows up over and over again.

The other aspect you mentioned was the empirical nature. And I think here what's really important is that people say, well, you sort of need to try things out. And people say this almost as if it's a criticism. But what I find funny about this comment is that it's true for almost any scientific field. Let's say engines or electricity: they were discovered by tinkering, before we had things like Maxwell's equations. This is true really for any engineering field; armchair philosophy generally doesn't help as much as people think. I think it helps once things mature more, once you want to unite things under a common framework, to understand a set of problems by just understanding one. So I think that's definitely important. But this idea of empiricism being the way forward in machine learning actually explains why most of the interesting research is being done by gigantic labs with huge research budgets. The reasoning there is, well, let's say you yourself can run experiments more quickly, you can run twice as many as other people, that means you're going to learn roughly twice as fast. Now let's say those experiments also require high server costs. Well, that means it's gated. That means other people can't learn and you can.
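To make the "30 lines of code" claim above concrete, here is a minimal sketch of supervised learning with gradient descent in the tabular setting Mark describes. It's a toy linear model trained with plain NumPy; the data, learning rate, and column layout are all made-up assumptions for illustration, not anything from the episode.

```python
# A toy example of supervised learning with gradient descent:
# predict the last column of a table from the other columns
# using a linear model. All numbers here are made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # the columns we look at
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # the column we predict

w = np.zeros(3)   # initial guess for the model weights
lr = 0.1          # learning rate

for step in range(200):
    pred = X @ w                   # guess while "not looking" at y
    error = pred - y               # how far we are from the actual label
    grad = X.T @ error / len(y)    # gradient of the mean squared error
    w -= lr * grad                 # the gradient descent update

print(w)  # should land close to [2.0, -1.0, 0.5]
```

The same loop, with a much bigger model and masked-out words standing in for the labels, is essentially the self-supervised setup Mark describes GPT-3 using.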
Now let's say your peers are doing the same while they're learning, and they're telling you what they're learning, and some of this information doesn't get published. You just end up getting all sorts of feedback loops, which means that people that do publish experiments at scale, like in these big labs, can continue to do so for a long time. And it's not entirely obvious how academia can beat this kind of research at its own game. However, if you look at other efforts, let's say Hugging Face or fast.ai, all of these efforts were started by a couple of people. They didn't require millions of dollars to get started. So I think the reasoning of "I can't do AI because I don't have millions of dollars", to me, feels almost like a failure of imagination. Yes, you can't beat research labs at their own game, but you can certainly do other stuff that's tangential to ML and equally important, which is why I even made the thesis that I believe Hugging Face is actually the single most important machine learning company in the world today, even though it's a tiny fraction of the size of Google or Facebook.

I think that's what a lot of people don't know enough about, me included: the amount of dollars you need to train these models. So you can write a model, and you say, that's what I do, I have a couple of machines, a small cluster where I run those, and it takes a couple of hours, a couple of days. But GPT-3, I don't know how the cost was distributed, but apparently it had a price tag of a couple of million dollars just to come up with the model. And then once you have the model, it's kind of like the Google index, you can just use it for queries, and that's almost free, so to speak. But they're trying to finance this by charging for access to GPT-3 now, which they shouldn't, I think, because it should be free, kind of, but that's their business model. For OpenAI, that sounds a little weird: OpenAI is not open when it charges for something they've created. Obviously, they can do whatever they like, I just find it a little odd, a little ironic. Give me an idea, for a lot of the models that we see that really help us, of what's required in terms of money, or in terms of server costs, to make them useful or stand out.

So honestly, I think the vast majority of useful stuff can be done on commodity computers. A single gaming PC with a 2070 NVIDIA GPU will get you pretty far. You can do very interesting stuff. I think the difference, though, and this is especially true any time you have explicitly labeled data, is that labels are expensive to get, because a human needs to go in and label them. Even if you assume certain minimum wage laws, as soon as you start dealing with hundreds of thousands or millions of examples, it's very expensive to do this, and it's very difficult to get the labels right, because a lot of labels are very tricky. Let's say you asked me, as a labeler, to annotate the grammatical role of each word in a sentence, like the part of speech tag: I would have a lot of trouble doing that, because it's actually genuinely difficult. It's not entirely trivial unless I spend time training myself to do it. So that's one. The other aspect: I think you're specifically thinking about these large models that seem to consume the internet, stuff like GPT. So on one hand, yes, they are expensive to train.
One of the things people misunderstand is that you can't make adequate comparisons to something like the Large Hadron Collider. And the reason is that I can't download a pre-trained Large Hadron Collider. I can't take a fraction of the cost the European Union spent on it and use that to do my own particle physics experiments. But I can take a pre-trained language model and fine-tune it on some task that I have. And I can do this on a commodity machine, like a personal desktop or maybe a newer laptop, and be fine. Or even then, just allocate a simple server on AWS with GPUs or IPUs or whatever, and you can actually get pretty far; there's a small sketch of what that fine-tuning looks like below. So I think this idea of "I can't do AI unless I have millions", I think that's a misconception. It is definitely true for models that are semi-supervised, as in models that just consume data from the internet. It's certainly true there, but the cost can be re-leveraged, which means society is really paying a one-time cost. And that said, when you think about companies like Microsoft, AWS, or Google, they're not dumb, right? They're not spending millions of dollars just to have cool PR reports. They're spending it because they believe it has material value to them. Whether it does or not, I think, remains to be seen. I think we need to see more applications built on top of things like language models. Right now there's a lot of excitement because people are trying them out, but I'm talking about the world where this is routine, where the same way you learn, I don't know, what a pointer is, you would learn how to integrate a language model in your app. Once we get to that point, then the thesis of whether language models will change development forever can be tested. I think it'll definitely help, but I'm also not entirely sure whether it's the silver bullet people expect it to be. That said, not much of it is wasteful.

Okay. I listened to a podcast with one of the co-creators of GPT. And what he felt is the real differentiator, he said, is, you know, we have this money from the foundation, so we can run a couple-million-dollar model, that's not the limit. But he said the big problem in their position is that GPT-3 obviously has a ton of flaws, we referred to that earlier. It's relatively easy to get to a 90% correct analysis with AI these days, but it's really difficult to close the gap to 100%. No human is at 100% either, but we feel like we're closer to 100% in most problems, not every problem, but most problems, and especially with life experience we get really close to it. Because we have skin in the game, so to speak, right? If we make the wrong call, we get fired. An AI doesn't have that problem. And what he was saying, and I think you just referred to this, is what the real advantage of the internet giants is: they have all this user data. So if the AI goes wrong, a user will tell them. When you see something on Google that doesn't belong there, you click on it, or you do it on YouTube, you flag, oh, this video is spam, or it's porn, or whatever you don't want to see in that moment. So these little corrections help the AI.
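As an aside on the fine-tuning point above: here is a minimal sketch of what fine-tuning a pre-trained language model on a commodity machine can look like, using the Hugging Face transformers library. The choice of distilbert-base-uncased as a small model, the IMDB sentiment dataset as the downstream task, and the 2,000-example subset are all illustrative assumptions, not anything from the episode.

```python
# A minimal sketch of fine-tuning a pre-trained language model on a
# small labeled dataset using Hugging Face transformers + datasets.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# Small pre-trained model that fits comfortably on a single consumer GPU.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# IMDB sentiment classification as an example downstream task.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

# A small subset keeps the run cheap; scale up as hardware allows.
train = (dataset["train"].shuffle(seed=42).select(range(2000))
         .map(tokenize, batched=True))

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train).train()
```

The pre-training cost was paid once by someone else; this step only pays for the comparatively tiny fine-tuning run, which is the asymmetry Mark contrasts with the Large Hadron Collider.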
We feel like this is not a big deal, because they're relatively small data points. But properly used user feedback changes the game so much for AI that he actually said, if we can incorporate all this user feedback on where GPT-3 errs, and I was telling him, you know, you need to let it out into the world, because the more user feedback you get, the better. He said, if we are able to do this and we get enough people to use it, it could look like AGI. So GPT-5, he said, has a really good chance of looking like AGI, because it incorporates everyone's knowledge as humans and adds it to this statistical analysis. And he says it will still not be 100%, but it will be so close that for most humans it will be indistinguishable from AGI. Do you think that's true, or is that not going to happen with AGI?

I guess I'm cautiously optimistic. I'm optimistic because I want to see more progress in the field. But I find it very difficult to make any sort of predictions, just because people like to make predictions about what machine learning is going to be like in five years, and I often respond that I can't predict what the field is going to be like in a week or a month. And I think that's also a function of things changing a lot. Definitely when transformers hit the mainstream, they took everyone's attention by storm. But let's remember, these techniques are still only a couple of years old. And before that, people were very excited about other stuff; let's say bidirectional RNNs were all the rage, and before that it was CNNs. And even in the CNN phase, I think one thing people forget is that people were trying to apply CNNs to everything, whether it's graphs or images or sequences. And now we see the reverse, people applying transformers to sequences, images, graphs. And it's sort of like, well, it's a framework that works. And I think part of it is that we've gotten to this point where, because you talked about being fired, I think an interesting point there is that no one will get fired for doing GPT-4, even if it's very costly, right? So I think we're at a point where the incentives seem off to me. If you're going to say, look, I want to fund a team to do low cost research, well, obviously you're going to get less funding, so there's less incentive for you to say something like that. So basically, research leads' incentives are misaligned with trying to make research cheaper, at least at larger companies. And I think the only people that are going to really push affordable research are people that do it out of necessity.

So, one example that really stuck with me was early in my career, when I interned at Berkeley. We were working on a semi-autonomous sub, and we wanted to make the sub cost about $300, and you could control the thing with your phone. And at the time, maybe now this is obvious, but this was about 10 years ago now, there wasn't really a good commercial alternative that cost less than $5,000. And I remember telling my advisor at the time, you know, why are we doing this? It already exists. Why don't we just buy this $5k thing?
Because we're obviously spending time developing this. And his answer was, look, actually making stuff cheaper is a huge innovation. And I think a great example of this is space travel. Why don't we travel to space more? Well, because it's really expensive. Well, why is it expensive? Well, because we can't reuse rockets. Okay, great, let's build reusable rockets. So I think you often get interesting answers if you ask the hard questions of how to make things cheaper.

In the case of large language models, I think there are a few directions that seem promising. Some are basically what I think Hugging Face and a lot of other people do, called weight pruning and movement pruning, where you basically look at weights that don't change a lot across a run, and then you remove them; a rough sketch of the pruning idea appears below. But then also at Graphcore, there's a lot of effort to think, well, actually, maybe it's the GPU that's the bottleneck here, not the techniques themselves. And so you get into this interesting trade-off, because typically a lot of algorithms won out not because they're necessarily the best, but because they're the best on existing hardware. I think the person who makes this case the best is a researcher at Google called Sara Hooker. She has a paper called The Hardware Lottery, where she goes over this point in a lot of detail with lots of historical examples. I just thought it was a fantastic paper, and I'd recommend anyone in machine learning that has concerns about stagnation to check it out, because it makes a lot of points.

You mentioned that nobody really worries about making things cheaper anymore, or fewer people worry about that. So I call this the Chinese approach. A lot of people say that if you go to business school in China, they first tell you: find a market that's big enough, where a useful product is already in that market. Then replicate it on your own, make it cheaper, and then start innovating, make it slightly better. And that's been working in manufacturing for many countries. China isn't the first to discover this, right? Korea did the same, Japan did the same, the US actually did the same back in the early 20th century, using knowledge mostly from Britain. That's exactly natural behavior. I feel like there's two souls in my mind. Like you, I wasn't born in the US. I was born in Germany, which has a slightly different approach; they've really excelled at manufacturing. But the thing is, there are two possible kinds of innovation. One is to make something extremely cheap and then roll it out to the world, kind of what Google did, right? A Google query is something you could have asked a librarian, and you would have gotten an answer, but it would take days and be very expensive, so nobody undertook it. With Google, it takes a few milliseconds and you're good to go. So it made something cheaper and accessible, and that's a real innovation. But a lot of people are only looking at the high end. So we're looking at Tesla, we're looking at extreme luxury products, the best possible thing, I call it the iPhone Max Pro Double S, right? Which is at the completely high end, and people marvel about this. And that's what seems to attract a lot of youngsters especially.
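A quick aside on the pruning direction Mark mentioned above: movement pruning ranks weights by how they change during fine-tuning, which takes more machinery to show, so here is its simpler magnitude-based cousin as a sketch, using PyTorch's built-in pruning utilities. The layer size and the 30% sparsity level are arbitrary choices for illustration.

```python
# A minimal sketch of weight pruning: zero out the weights that matter
# least, here ranked by magnitude. (Movement pruning would instead rank
# by how the weights moved during training.)
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(512, 512)

# Zero out the 30% of weights with the smallest absolute value.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Pruning is applied through a mask; bake it in permanently.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.1%}")  # ~30.0%
```

The payoff is that a sparser model needs less memory and, on hardware that exploits sparsity, less compute, which is exactly the "make it cheaper" direction being discussed.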
When I look at my children, they have no idea what economic benefits come from making things cheaper and rolling them out to the world, democratizing them, which was a big part of innovation in the last 100 years or so. It seems what they're really interested in is the Gucci watch. The innovation is making this thing really shiny and great, and there might be some technology in it that's really great, but they're really not interested in what the price tag is. It seems to be, oh, the price tag, that's not my problem, that's someone else's problem. Do you think this is generally an issue in Silicon Valley, and not just in AI right now? Because that's what I feel. I feel like this is a general issue in Silicon Valley, that they've forgotten about the cheap innovations, which seem to have zero appeal to people in the valley these days.

So yeah, I think there's an effect you're referring to, in that by making things cheaper, you end up getting more of those things, and then the market changes. This is called Wright's law. So the idea is, even if you think, well, if self driving cars can be really cheap and efficient, that means they can become cheaper than humans, which means more people will take taxis. That's the reasoning there. Similarly with machine learning: if machine learning models become cheaper, it means more people can train them, it means more people are learning, and then the feedback loops are shared by everyone, instead of only by the people that can afford these models. There's a small illustration of Wright's law below.

Now, I don't think people are doing this on purpose, right? It's not like people are saying, I don't want to do cheap models, I want to do expensive things. However, and maybe we can talk a bit more about the current publishing ecosystem, we have this attitude called SOTA chasing. SOTA stands for state of the art. And what this does is it basically encourages you to have models that perform the best ever, like, this model is the best ever on this data set. And if that's not the case, then it's actually very, very challenging to get your paper published, unless it's extremely innovative in some other way. That said, most papers that I see frequently shared and see buzz around are rarely the latter. Usually they're the former: look, here's a new shiny thing that beats the state of the art. And one thing I poked fun at in the article is that there's a very, very reliable way to get state of the art, and it's an algorithm called Graduate Student Descent. The way the algorithm works is: you look at some paper that gets state of the art, you take the code, you make some random changes to it, and the first random change that makes the results better, you publish. And what this does is, now you're adding a part on top of something else, which means the model is even more complex and even more difficult to train. But it works reliably. And I would suspect that if people trained more on data sets that weren't common in benchmarks, let's say, instead of using, I don't know, hospital records, using, I don't know, D&D conversations, just a data set the techniques have not been exposed to at all, I would suspect that most existing techniques would show absolutely terrible performance, and that we're overfitting on data sets. And I think a big part of that is the reinforcement learning community.
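To put a number on the Wright's law point above: the law says unit cost falls by a roughly constant percentage every time cumulative production doubles. Here is a tiny sketch, assuming a hypothetical 20% cost drop per doubling; the actual rate varies by industry.

```python
# Wright's law: cost of the n-th unit = first_unit_cost * n**b,
# where b = log2(1 - learning_rate). A 20% drop per doubling of
# cumulative production is an assumed, illustrative rate.
import math

def unit_cost(first_unit_cost, n, learning_rate=0.20):
    b = math.log2(1 - learning_rate)  # progress exponent, ~ -0.32 here
    return first_unit_cost * n ** b

for n in [1, 2, 4, 1000]:
    print(n, round(unit_cost(100.0, n), 1))
# prints: 1 100.0 / 2 80.0 / 4 64.0 / 1000 10.8
```

By the thousandth unit, the cost has fallen by roughly 90%, which is the mechanism behind "cheaper models mean more people can train them".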
One of the most famous benchmarks is Atari games. And Atari games are cool, some of them are hard, but we have much more interesting games, right? We have games like Dota, like chess; you have thousands of games on Steam that have various interesting strategic intricacies that test very different things. Some games are stock market simulators, some games are survival simulators. So you can actually test out whatever skill you want in a simulator, as opposed to arguing with people online about what you think the right way to do machine learning is. And so I think, instead of people arguing over state of the art, over what the true way of doing machine learning is, if people spent more time creating interesting data sets, and I think games are one of the best ways to create a data set, we would have a lot fewer flame wars and a lot more progress in the area in general.

Yeah, I realized that defining the right data set and then fitting it with the right AI model is the real crux of the problem, right? A couple of episodes ago I was talking to Mike Finaldstein about the life science field, where there's a lot of need for AI. Basically, it could replace your doctor, or could help your doctor, right, to get all the basic problems out of the way, so the doctor really focuses only on that last little human interaction, that last 1%. But the data sets are crappy, and it really depends on the field, but if you want to make a wider impact and say we have predictive data for certain diseases, based on certain test results or based on certain DNA results, there are limited amounts of data available, and a lot of people run better algorithms on them, because, as you say, you want to beat the last algorithm, and you overfit for that particular data set. If you would take a data set from a different locale, say from a different hospital, you might get very different results. So it's a real problem, and I'm saying this as a generalist, but finding the right data set and massaging it in a way that gives the best results is a real problem. I think a lot of people don't pay enough attention to this, because not everyone has the luxury of Google, which has this enormous number of people clicking on their search results, giving them a lot of good, reliable, seemingly similar user input as a data set. A lot of data sets out there in real life aren't like that, and I think this is kind of how the human mind developed, right? We have very imperfect data sets and still have to make pretty decent decisions. And with AI, we haven't gotten there yet, right? A lot of models, seemingly, and I think GPT-3 might be the exception to this, feel like they are so dependent on the data set and have a very narrow focus on the solution. And there's probably lots of researchers who work on this, but I think this is one of the biggest limitations right now for AI, right? It doesn't really scale if it doesn't get access to better data, which is outside of the AI. Developers say, okay, this is outside of my scope, because I'm not responsible for the data set, but for the outsider, it's the same problem.

So the scale aspect is interesting, because if you look at, let's say, just entry level job postings in data, most of them will ask you for experience in TensorFlow and PyTorch and stuff like that, but often you get hired at these companies, and actually what they wanted was a data engineer.
And so you end up in this really bizarre situation where people that have that skill set have more trouble finding a data engineering job than people that are actually talented data engineers. And I think this could also be partly explained by the hiring process: if you're hiring for your team and you're not drafting the hiring requirements yourself, I think this can very easily happen. You're sort of delegating recruiting to some foreign entity that for some reason is supposed to be better than you, which I also find very bizarre.

But also, to your point on feedback loops, what's interesting is, if it were possible for Microsoft, for example, to make algorithmic innovations in search to become better than Google, they would have done it already. And the hard truth is that the algorithms sort of don't matter as much as the signal of people clicking. And so there are feedback loops: you clicking means the product gets better, it gets better for you, it gets better for the company, which means it gets even better. And if you're not getting that signal, you're just stagnating, or maybe getting incrementally better. So I think this idea of feedback loops creates almost natural monopolies as far as algorithms go, and it's just very difficult to unseat them or think of any way around it. I don't think right now, for example, it's possible for some really talented, smart PhD student to sit in their bedroom and come up with a better search algorithm than Google. That's just not going to happen. And I think a big part of it isn't necessarily the infrastructure, it's really the historical data and clicks. That's the main asset, and the moat that's unforgeable and very difficult to get.

Yeah, no, that's a real issue, and I think it really hampers innovation. We haven't really looked into this deeply, to be honest, but sometimes it's an accidental monopoly. And everyone wants to be the monopoly, let's put it this way. It's like Peter Thiel says: don't go into a business where you can't be the monopoly. It makes a lot of sense, but we expected these monopolies to be broken up by the next wave of technology, right? You make the point that we can't replace the user signal for Google search. But, you know, YouTube comes around and replaces search with videos, and it's completely different. Well, they bought YouTube, so that didn't happen. Or Facebook has a social graph instead of just a search graph, so maybe that should change the game. It kind of did, but it kind of didn't, right? It didn't replace Google, at least not at a major scale. It became a side product, so to speak, and they could have bought Facebook, and then that would be part of Google. The enormity of these monopolies, I think that's surprising. I kind of blame the Fed a little bit, because I feel all these companies have access to unlimited amounts of money at 0% interest rates, while startups pay 15%. So, you know, what do you expect? The startups will not go anywhere. You can't escape that loop. I mean, this money is not necessarily meant for Google, right? It's meant for all these ailing zombie companies that are out there that supposedly need that 0% interest. Probably true. We should get rid of those companies and fix that monopoly the same way. I think there's always a bunch of startups who really have something that could take off. It hasn't happened, and I think this is strange.
It should have happened. Google should be a much smaller player than it is right now, in terms of total market share, let's put it this way, not in the total revenue that they do. And I noticed, when I read through a couple of your other articles, that you seem to embrace the libertarian mindset. What was your inspiration? Is it Nassim Taleb? Where did you initially see that this was something for you?

Yeah, so that's interesting. Definitely, I think when I was younger, I identified as a socialist, because I generally felt like I wanted to help people. I wanted people to have a better social safety net, people to be happier. And yeah, while I do identify with parts of libertarianism, there are big chunks of it that I think are very idiotic, what I call naive libertarianism. For example, naive libertarianism is insisting that, I don't know, masks are a bad idea, or that there should be no military or no borders. It's like, no: even if you were to go read Hayek, he does advocate for a strong pandemic response, a strong military, a strong border. So I think people just take a philosophy in a very dogmatic way and then go on misunderstanding it. But the way I got exposed to the philosophy, at least, a big part of it was Taleb, and he's just generally a huge inspiration for me, for my writing and a lot of my thoughts. But it really comes from growing up in Lebanon.

So, Lebanon is a very, very small country, and for the past 30 years, the government has failed at providing very basic goods and services to the country. For example, we don't have 24-hour electricity. We don't have just one military owning all military power. The borders are closed. The government was borrowing money from banks, and as a result, people lost their deposits last year completely. And so this was, to me, an example of, okay, look, this is a government that centralized a lot of its power. They run electricity, they own the airport, they own the port, so pretty much all the important services in the country are owned by the government, and they were allowed to fail for 30 years with no consequence, because they could continue being bailed out by people's deposits. And so for me, the idea is: I don't mind unlimited upside, but what I don't forgive is not allowing people to fail. Let's say in my case: I tried a startup, it didn't work out, I went and got a job. I didn't sit and blame society and say, well, it's because of the economy, or it's because I didn't have millions of dollars. It's my own fault. I learned from it, I deal with it, I move on. Whereas with a lot of government institutions, they can continually be allowed to fail, and then they can also basically brainwash people in the news, telling them why stealing from them is good for them. Now, obviously, the US is not Lebanon. The US is a much fairer country. Obviously, it has lots of issues, some that can be fixed, and maybe others that are more difficult to fix.
But I think for me, looking at this difference, looking at a corrupt government and witnessing firsthand what it did to me and my parents, made me lose all hope in the idea that a big government is eventually going to control something and then make it better for everyone.

I think this is obviously great insight. I grew up with communism and, you know, I had that experience too. It's a little more nefarious than just failing; it actually puts you in prison for saying the wrong thing on YouTube, so to speak. There was no YouTube around when I grew up, obviously. And Lebanon seems to create a lot of these personalities. So there's a huge Lebanese diaspora in Africa, and they seem to prosper in circumstances where nobody else prospers. Let's say there's a bunch of Germans who tried to open a restaurant, it all failed. And there's a bunch of French people, they all left, and it's all gone, going nowhere. But there's always a Lebanese place, or like 15 Lebanese places. And I'm not saying this is just restaurants; there's all kinds of business in Kenya, for instance, which has a lot of freedom but also has not been an easy place to do business. There's an Indian diaspora, but also a huge Lebanese diaspora who pulled this off, and that goes for pretty much any African country. So there is something to the Lebanese mind, I think, and Nassim gives us that financial and economics knowledge. I think it's undervalued: doing well in really harsh circumstances when, as you said, you can't rely on anyone. You can't rely on the government. There might be people in your country who hate you. There is really no way to just keep blaming other people. You can do it, but it's not going to get you anywhere; you're just going to get shot in the streets. And I really loved Beirut. I think people have weird perceptions of Beirut, and then they go there and they're like, oh man, this place is awesome. It's something that the world can learn from. And I think maybe this is a little bit of a forgotten story, right? The story of Lebanon, because it went through several wars and it looked really difficult in the 80s. Maybe there's more to this philosophy; I don't know if philosophers have written it down. There's Nassim, he's kind of a philosopher, so to speak, too. Does it have a century's worth of history, or is this libertarian streak a new thing?

Yeah, so thank you, those are really, really nice questions. I would say there's one very common trait among Lebanese people, and people say it as if it's a bad thing. It's, let's say you just joined a bank and you're a bank teller, and you're thinking the next day, okay, great, when am I going to be CEO? And I think people essentially make this comment to say, well, okay, a lot of Lebanese people don't want to put in the work. And at least in Lebanon, I think the culture is not as hardworking as what I've seen, for example, in the US, at least within companies. But this attitude of thinking big, of thinking I'd rather be middle class and my own boss, as opposed to being rich and being someone else's employee, I would say this is a fairly common trait in the population at large. And I think a lot of the examples you gave sort of explain it there.
Like, let's say, in my case, for example: I'm just very grateful to be able to work in the US, because in Lebanon, if I wanted, let's say, to pursue neuroscience research, I had a faculty member laugh me out of the room because my grades weren't good. Whereas in the US, a professor was like, yeah, sure, you're interested? Come meet me next week, and we'll talk more about this. So I think the US rewards people with initiative in general. And for me, that's why I feel very grateful and always try to pursue things when I can.

As a historical point: about two years ago, I was interested in making a small video game about the history of Lebanon, the war and so on. And it got me down a rabbit hole researching Lebanese history, and specifically Phoenician history. And one thing is, the Phoenician civilization lasted a long time, about 2,000 years, whereas Lebanon as it exists has existed for about 40 years, or 60; I forget now, I'm embarrassed. But essentially, it's a much newer society and a much newer system. The older system worked very well for a few reasons. The Phoenicians realized very early on: we're very small, so we can have no military or political influence on our neighbors. That sort of tempers your ambitions, right? So the next step is to think, okay, great, how can I be friends with all my neighbors? As in, let them come, let them come smoke, let them come buy boats, let them come buy nice silk. We'll have temples for all of our neighbors. We'll speak all of our neighbors' languages, because you want them to have a good time. We'll use all of our neighbors' currencies. And I think that's something that was definitely lost with a lot of the existing sectarianism in Lebanon, where now it's more like, well, I'm a sect, and I identify with this foreign entity. Whereas in the past, it was more about realizing, look, we're actually a tiny country, we can't have all that much impact even if we'd like to; the most pragmatic thing for us to be is traders. In fact, one hilarious thing about the Phoenicians is that they didn't really write much poetry. All of their writing was ledgers; they had lots of receipts, and that's most of what we have from the culture at this point. And I think this is a culture of pragmatism, right? Maybe poetry is great, and it moves you intellectually, but generally, I think that's more of a concern of richer nations. If you're still an underdog on your way up, being a pragmatist is generally the way to go.

Well, I have this theory, and I don't know if it's right yet, but I have this theory that entrepreneurs and philosophers are kind of the same kind of person, right? You have this lens on the world. And I extend this almost to risk takers, but I see this a lot with these two personality types. The philosopher's tools are the words, his or her tools are the words. For entrepreneurs, it's creating a business, and then creating that lens of reality and building it up. So I feel they very much go together. And as you say, from Lebanon, we know more about the entrepreneurs. We don't know so much about the philosophers.
And certainly there is a wealth question there; maybe that's what it is. Because as a philosopher, it's much harder to get the distribution to work. It's much easier to create your own world, right? Just start from scratch and say, okay, we're going to redo the whole thing, like Karl Marx said, right? This is all wrong, let's just focus on one thing. That's relatively easy. But then getting the distribution, getting enough people convinced who speak your language, who have the same experience, that's hard. And Lebanon now is split along religious lines, and then there's a bunch of languages, right? So it's really difficult to get that scale going. I think maybe that role moved more, I don't know, to the Turks, right? Maybe the Ottoman Empire was big enough to do that.

So I think this is a great segue into writing, because what I noticed is that a lot of my thoughts were considered fairly wacky by my friends early on. Something that I've been very bullish on for a long time has been homeschooling, way before the pandemic. I was telling my friends, look, I think school is a giant waste of time. You spend 20 years in higher education; it's the longest initiation rite in history. It's eight hours a day. Really, its main function is babysitting. You can't really pursue what you want. There is no such thing as the basics of a field. Really, my interests haven't changed since I was eight. I like computers, and I like video games. I still do. Obviously, stuff has gotten refined and stuff has been added, but the core foundation really is still there. And I felt I wasn't really allowed to pursue it, until I decided to leave my Microsoft job, which for a lot of my friends was considered an insane decision. Because when I left, I didn't have a clear sense of: look, this is what I'm going to do, this is going to be my business model. I was just like, well, I'm pretty sure I can figure this out. I have two years of runway; I think I'll figure something out in this time. And while I definitely don't think I was as successful as an entrepreneur as I would have liked, one thing that really surprised me was the amount of impact I was able to have on people with the stuff that I wrote. And the reason was, I would say a lot of these same points to my friends, and they would just think I'm insane. They're like, what are you talking about? This doesn't make sense. But when I would write them down, they would agree with me. And it made me realize: yes, part of it is I could be a better and more compelling speaker. But the other part is, well, if there's something that's so different, you sort of need to sit down and think about it. And I think the internet ends up being this natural attractor for peers. In the past, I think a lot of people thought, oh, I want to talk about work stuff, let me talk to my colleagues. And that's great; if you do have colleagues like that, you should really cherish them. But the other thing people neglect to mention is that your colleagues aren't your friends; your colleagues aren't therapy, right? And so I think the internet has basically been the best, almost like a dating service for nerds, that I've ever witnessed in my life.
And it's shocking to me that I only really learned how to use the internet when I was about 28. Before that, the primary purpose for me was maybe some networking on LinkedIn, but it wasn't this channel where I could have an idea, clean it up, and then attract the kind of people who would enjoy it and give me the best possible feedback. Honestly, it totally changed my life, countless benefits.

Oh, I fully agree with you. I actually started a blog about entrepreneurship in 2002, when the word blog was just not in anybody's mind. It got a lot of attention because so few people were doing it, so I got a lot of exposure. It was the first thing I ever wrote in English; before that I only spoke German, so it was terrible. When I look back, I'm like, oh my gosh, what did I do? It reads like German, just in English words. But still, I got a ton of personal contacts from people I was exposed to through the writing, and I went to the US a couple of times and met a lot of them in person. I was like, oh my gosh, these are professors at the university in Chicago. There were practitioners, economics professors; the book Freakonomics was out at the time and I met the author. And I was invited by a lot of entrepreneurs in Silicon Valley and on the East Coast. It's like the best thing I've ever done, right? Because people liked what I was writing and wanted to know more about my viewpoint on the world. And that was it; it wasn't necessarily, we're going to do business together and create a new startup. That happened too, but I was really thankful for it. Eventually that opportunity went away, the blog didn't work anymore, and then I did my own startups and was really focused on that. But it is something the internet really enables.

And in the sense of education, as you said a moment ago, I'm fully with you. Education should be completely different by now, because we know these institutions were made for a different age, for the industrial age. The idea that we reach peak common sense by the age of 18 and decline from then on is complete nonsense. Most people need until at least 35 or 40 to even get a grasp of reality, me included. I had no clue. I read a lot of Hegel and Kant when I was 19, but these things didn't mean anything to me. I could have written you an essay where I parroted back what they said, but what does that actually mean, and do I use it in my own life? I had no idea; I didn't use any of it. You need a much longer time frame to mature, fortunately or unfortunately. Even if you're like Nietzsche, even if you have one of these huge IQs, it will take a while to reflect back, make sense of it, and validate what these people have written. Maybe it's all nonsense, right? We don't know that.

And I always feel there should be a complete redo of school and education, something I talked about with Charles, where we came to this idea, and I don't know if you've thought about the specifics or mechanics of how this could happen. It's kind of a TikTok-YouTube style of education, where you have TikTok as the recommendation mechanism, not the content that's on it right now, which is evil, but the idea that it serendipitously gives you content you weren't exposed to before, without you having shown any specific interest.
Then you have YouTube for the longer version, where a lecture goes deeper or walks through a certain problem. And the third place for me would be, as you said, not a homeschool but a neighborhood school, where you can go and work on similar tasks and share those moments, somewhat coordinated by someone who plays the role of, I wouldn't say teacher, but herder of the crowd. So there are maybe 30 kids, all at different levels, maybe some of them working together, but right in your neighborhood, maybe on your city block even in large cities. And then you can do this in any time zone, anywhere on the planet. It doesn't have to be government run, though it could be. You could move to Africa for a couple of years, or to Bali, and stay in the same system, so to speak; everyone in the world is in it, and the kids in Nigeria learn more or less the same stuff, maybe in a different language, as we learn back home.
