A book club for developers.
BookBytes is a fortnightly (or biweekly) book club for developers. Each episode the hosts discuss part of a book they've been reading. And they also chat with authors about their books. The books are about development, design, ethics, history, and soft skills. Sometimes there are tangents (also known as footnotes).
phi = phi = (1 + Math.sqrt(5))/2 fib = n => Math.pow(n, phi)/Math.sqrt(5)
(Intro music: Electro swing)
Hello, and welcome to BookBytes, a book club podcast for developers. We’re continuing our summer of an Imposter’s syndrome by talking about “The Imposter’s Handbook” by Rob Conery which is a CS primer for self-taught programmers. So today we’re going to go over chapters 8 and 9: Advanced Algorithms and Compilation. I’m Adam Garrett-Harris.
I’m Safia Abdalla.
I’m Jen Luker.
And I’m Jason Staten.
All right, so what about Advanced Algorithms?
I think he kind of mentioned here ke’s kind of using the term “advanced” loosely. It’s kind of just like, algorithms part two, but I don’t know, some of these are kinda complicated.
Yeah, I think their argument was that you are more likely to use some algorithms than others. Like, no one’s really going to make you do a bubble sort, but you’re going to end up using some of these others more often.
Yeah, I like that description where you had mentioned that these are more likely the things that you’re going to get paid to do versus going and writing your own bubble sort is probably not going to keep your employer happy if that’s what you’re doing.
(laughing) Yeah, so it’s like these are solving real problems and they’re not necessarily going to be built into the language or the library you’re using.
So, let’s talk about it! (laughs)
Hey! There you go!
Yeah, so it brings up dynamic programming at the start. Like, what was your takeaway, this is kind of open to whoever, about what dynamic programming is?
Nothing. Absolutely nothing.
Yeah, I had never heard of it before.
I was going to say, it was one of those things that brought horrific flashbacks to my undergraduate computer science courses, because I did learn about dynamic programming in an algorithms course, and I think the big thing was that the way that I used it in my course was like, very theoretical and in hindsight, now that I’ve been in the industry for a while, I think the like, mathematical basis of dynamic programming is a lot less important to grasp than the approach that it lays out for problem solving. Obviously when you’re in college you’re like, given problem sets, people expect you to have a proof and a solution for something, and I was getting really caught up in that, and one of the things that this chapter kind of helped remind me of is that it’s not really about the details but about the general approach that dynamic programming kind of embodies. So, yeah, I liked it.
What is the general approach?
Yeah, so I would say if I could break it out into a simple phrase, dynamic programming is about taking a problem and breaking it into a smaller, more manageable problem and then building on top of that. So, for example, let’s say that you’re asked to like, sort through a list of 10 things. Well, first you would start off by solving the problem of sorting a list of just 9 things, because that’s less complicated to solve than sorting 10 things, and you can kind of break it down further and further like we talked about in, I think it was the last chapter actually, where we were talking about mergesort and how you kind of break it down into the smallest possible piece and then you remember what those individual smaller solutions were and build up to the bigger problem. I hope I’m explaining this well. So, in my mind, that’s what dynamic programming is really about, is when you’re given a huge task, find a task that’s smaller, solve that, and see if you can use the solution to that to solve your bigger problem.
Sounds like the divide-and-conquer approach that we talked about last time.
Yeah, a little bit. I think one of the key elements to dynamic programming, well I’m not totally certain of this, is that recursion is a big part of it which I think, depending on how you implement divide-and-conquer you might be using recursion, but I think the idea with dynamic programming is that like, the function you use to solve for n is the same one you would use to solve for n-1 and for n-2, and you can kind of nest those calls and store the solution, like memoize it, and then build it back up. Wow, this is so weird when you have to like, explain the way you perceive things out, using words and stuff.
Communication’s hard, huh?
Yeah, and I like how he says that memoization is like fancy word for caching-
I always think it's funny, like when you say “memoize” it sounds like you have a speech impediment and you’re trying to say “memorize.”
(laughs) Yeah, it’s kind of a funny word.
(laughs) I had never thought of that.
(laughs) “I’m just gonna memoize this.”
Yeah, I wonder… Yeah, it is a funny word.
I’m so glad that I’m not the only one that picked up on that.
So yeah, I guess that’s... My take on it is just there’s like, way more formal steps about it, like if I am sorting a list of n items, how might I go about sorting a list of n+1 items? Like, what would I need to do to make that jump to like, solve for the next case? And that’s where the recursion comes in. Sorry, I’m getting into the details again, it’s so hard to break away from that, 'cause like, I think the details are definitely important to grasp of dynamic programming, but I think the more relevant thing,and the thing that I think is more applicable in the day-to-day life of a software engineer is just, if you have a problem, see if you can try and break it up into a smaller problem and then build back up from it.
Yeah, that makes sense.
So, breaking something down into a smaller problem and then using the solution to those pieces to create the total solution sounds nothing like a dynamic thing to me, which you know, it sounds much more like recursion or something. So, I loved the story of the origin of the word “dynamic” or the words “dynamic programming.” What did you think of that story?
Yeah, I thought you were about to tell it, why don’t you go ahead and tell the story?
Oh, right. Okay, so the story is essentially, without reading it out, is that a man named Richard Bellman was trying to do mathematics research in the ‘50s, except for the fact that there’s this guy named Wilson as the secretary of defense and he, as the book says, “actually had a pathological fear and hatred of the word, research. I'm not using the term lightly; I'm using it precisely.” So in order to avoid the anger and frustration of the Rand Corporation and the Secretary of Defense, he came up with dynamic programming as essentially a translation for mathematical research. Which kind of turns around says it means absolutely nothing, and it’s been been assigned this algorithmic thing because that’s kind of what it resulted in, was discovering some of these algorithms that we’re going to talk about in a minute. But I just thought that was an awesome story, that the definition actually has nothing to do with the name because the name was technically to hide the concept of mathematical research.
Yeah, it’s just all political.
It kinda reminds me-
Of a term that we run into a lot now. So there’s this scary idea of, think of having a server that’s exposed on the internet, that’s scary! But if you-
Call it something fluffy and nice, like a cloud-
Then everybody’s on board.
Oh man, yeah.
Like, you know those are still servers, right?
The cloud is just someone else’s computer.
Right? Like, it’s just not your own.
Like, you really just want to get rid of the server? Just go serverless.
Ugh. And then serverless isn’t even serverless.
No, it’s not.
But yeah, I think-
It’s, I think, so that kind of, oh my gosh, I’m going to go on a tangent again. Stop.
Let’s hear it!
But no, I thought it was just interesting, this conversation that we’re having about the words that we use to describe things in our field and where they come from and how it kind of circles back to the original thesis of this book that oftentimes, one of the most confusing things about the industry is the jargon and this conversation we just had had like, a certain level of self awareness, and you know, even the story that Jen pointed out in which Bellman shared the origin of the word dynamic programming, there’s like, a certain level of awareness that the ways in which we describe things do have a sense of ambiguity and like, lack of precision to them. So, yeah, I just thought that was an interesting discussion we just had and kind of interesting historical context to it.
Our ability to explain things poorly started in the ‘50s.
Or, you know...
The dawn of time.
Yeah, so the first type of problem it tries to solve is generating a Fibonacci series, which I’m sure we’ve all done.
Have we all done it?
I don’t know. Yeah. That’s a good question. Have we?
Yeah, and I am, I’m sure you did Jason, right?
Yep. Yeah, I went and did it inside of Rust and-
Yeah. Of course you did.
Well that’s my whole goal for this, right? And-
I found that Rust is, I mean not just Rust, but computers are really fast, so-
By doing it, I don’t have an example of the slow way of doing it with recursion and building up a new stack frame every time. I just kind of skipped straight ahead and made an iterator sort of pattern of it where it becomes a greedy algorithm which we can touch on in just a minute, and because of that it becomes so fast that I had to go and start finding like the 150th Fibonacci in order to start even having Rust benchmark show something that was not zero nanoseconds for it to run.
And the number size is as big as I can get it inside of an unsigned 128 bit integer, so it’s really big. I don’t know how many digits it is. I’ll have to count that sometime, but it’s really big and I am impressed with computers, that they’re super fast. So when you all implemented it, like did you start with like the classic recursion style? Like is that what you were first exposed to?
I actually started doing it for a scrum counter thing, but someone else immediately wrote it much better than me but it was just so that we could estimate stories which meant I only needed to go to up 144. So, yes. I kind of did it the slow way and then optimized as I went.
I don’t remember the first time I did it, but I did it in an interview and it was actually on paper instead of a whiteboard strangely, but I did it. I just did it with a for loop I believe, I iterated. And then, you know, I did that pretty quickly and then they said, “Okay, now do it with recursion.” And I remember in the interview really struggling with recursion because I just hadn’t quite wrapped my head around it and I hadn’t done it in a long time, and I felt like I totally failed that interview but I actually ended up getting that job. So…
Nice, maybe you spoke your mind well enough, or it like, gave people the understanding of how you were working through the problem, because-
Hopefully that’s what they were looking for and not for you to solve Fibonacci as fast as possible.
Yeah, yeah. I was definitely speaking out loud and got pretty close and they were like, “Okay, yeah. I see where you’re going with this.” And one of the problems I had with it I think was I was trying to output the entire series instead of just the nth number.
I think that that gets into the greedy topic as far as I understand. So the book talks about greedy algorithms and it describes them as an algorithm that does what’s best at the moment and so if you don’t need, say the entire list of all Fibonaccis but if you just need the nth Fibonacci then just calculate that and don’t worry about the rest of the list of them, and that can benefit you by, I guess first space-wise. Like, you don’t have to go and hold all those numbers somewhere. So for calculating n Fibonacci you don’t need to consume n numbers of space but instead you only have to consume a single cell. So, yeah. A space complexity of 1 which is much better.
Oh, and I feel like who should explain for the listener who’s not familiar. A Fibonacci series is a series of numbers where each number is the sum of the previous two numbers. And one thing I learned from this book is I always thought it started with zero and one, but you can start with any two numbers.
You can, but traditionally it starts with zero.
Yeah, zero and one.
And one. What’s also interesting is if you take any two numbers besides zero and one and one, I think it’s actually after you get to 3, it’s like, 3/5 or 8/5 it’s actually divides out into the golden number.
Oh right, it starts, if you, what is it? If you divide the number by the previous number it starts approaching ….
The golden number?
Yeah there’s a way that you can use that that you can generate Fibonaccis crazy fast. Like it, up to a certain point, so if you instead of actually going and doing the addition, if you just do like a single multiply against it you can get it up to a really large number before you start to get inaccurate, and I know that-
That was like a Project Euler thing ‘cause you had to calculate some like, huge Fibonacci or something like, I think Fibonacci(1,000) or something. And so in order to do that you have to be fast. So...I will dig for that. See if I can find something on it, ‘cause I remember I was kind of blown away that you could just multiply up to a certain extent.
Yeah, take 55 and multiply by 1.61 and it’s essentially 84.
Hmm. Okay so I thought this section was pretty interesting talking about heuristics and it mentions the traveling salesman problem and how it’s NP hard and one way, one heuristic to solve it is nearest neighbor, and so that was kind of what I was mentioning when we were talking about heuristics last time is nearest neighbors. Just pick the nearest city and it’s going to be pretty good most of the time.
But that’s also example of a greedy algorithm.
Prior to recording this podcast I was planning on doing a planned tangent so this one’s not spontaneous.
But just to kind of elaborate a little bit more on the greedy algorithm section and specifically the heuristics portion that Adam just covered.
(laughs) Thanks, Adam!
(laughs) You’re welcome?
Adam’s just setting me up. He warmed up the stage now I gotta get up there and deliver and hopefully I do with this tidbit. So I was just you know, researching ‘cause this section of the book lists a few greedy algorithms that I was like, you know, can I go out and pick a few more to share with people so you all have a better sense of what they are? \n\nAnd so I thought a little bit about some of like the different things that I had done at university courses that were solvable using greedy algorithms and one of the ones that I found that I thought would be a good segue from some of the things covered in this chapter is around job scheduling, and we’ll probably touch on this a little bit later, but your CPU on your computer can technically only execute one piece of code at a time. Let’s assume that it’s like a single core CPU from 1991 or something. So it can only execute one piece of code at a time and the reality is that you as a programmer might be wanting to run code from different programs at the same time. So maybe you have your browser open and maybe you have like your text editor open, maybe you’ve got your spotify open and your CPU has to, very quickly, schedule those processes in such a fashion that you don’t notice the fact that it’s switching between which code from which process it’s running. \n\nAnd one of the ways that it does this uses greedy algorithms and by the way, the example I’m presenting now is specifically for like, scheduling processes on your CPU, but really this can apply to anything that’s like, “I have a list of things I need to do, like a job, and I know it’s going to start at this time and it’s going to end at this time. So I know how long it will take and I want to figure out what is the sequence in which I should do these jobs?” And greedy algorithms come into play here and essentially there’s a couple of different techniques for doing it. \n\nYou might do a greedy algorithm where the local optimization that you make is something like, you always pick the job that’s going to start first. So it’s just sorted… In the end you get all of your jobs sorted by, in ascending order by when they started. You might also pick when the job ends. So you pick all of the jobs that will end sooner earlier in the process. Or you might optimize by greedily choosing always the job that's going to be the shortest to complete. And one of the things about greedy algorithms is that although they are generally optimal, they’re not always optimal. So in some cases you might be in a situation where you’re scheduling like six different jobs and you can only do two of them at the same time and there’s some conflicts involved so you can’t do like, job A and job C at the same time. And if you’re picking a heuristic in a greedy implementation like, I’m gonna do the ones that start earliest first, you might actually end up in a situation where you're not completely allocating your job board as well as you should or things are taking longer than they need to to finish because the dependencies aren’t mapped out right. \n\nSo yeah, that was my little tangent, this like, general problem or area, if you’re interested in learning more about it, is called interval scheduling and you can see how it comes, its applications in like, scheduling jobs in your CPU or just kind like, more general software type applications where you know, you want to build an app where people can schedule something or like, organize a gantt chart or something like that. So yeah, that was my planned tangent of other information to share (laughs). That’s it. Fin. End of short notes.
This should be a segment.
Pull me off the stage now.
I’ve seen a scheduling problem like that before in an interview where they had like, you’re an actor and you have different movies that you can be in and each movie pays you a certain amount but they have a start date and an end date so you can't be in any movies that are overlapping during filming and you have to pick which set of movies will get you the most money.
Oh, that’s a really good problem. So yeah, if you’re listening and you’re somebody who’s out there doing job interviews or just solving interesting problems, good to know that if you get that kind of problem statement a greedy algorithm is a good first approach to take.
That’s really cool.
They’re all around us!
Just have to open your eyes.
There’s greed everywhere.
Okay. So the chapter concludes with two different ways to find the shortest path in a graph. There’s the Bellman-Ford and the Dijkstra's Algorithm.
Who’s going to explain those?
I could talk a little bit about them.
I went and implemented Bellman-Ford inside of Rust and I did not actually get around to doing Dijkstra’s, just timewise didn’t get to it, but I still want to do a comparison of them. But a basic explanation of the Bellman-Ford is that you pick some source node in a graph. So say you have a graph of seven nodes and they have edges that connect to each other that are directed, so like, a can go to b, but not necessarily back to a. And so yeah, it’s a directed graph. It doesn’t have to have, it can have cycles in it, that’s okay.
But it’s weighted.
Yes, so the edges are weighted. So kind of think traveling salesman problem or something where like distances or whatever you want to consider to be a weight is, but yeah.
Yes and the objective of it is given a certain source node, finding out what the shortest path to any other node within that graph. And the way that Bellman-Ford goes about doing that is you start at that source node and then you go and find the lengths for whatever other nodes that it’s attached to and you say like, “This is the length for those things”. You go and you memoize those values or you cache those values in some sort of look up and from there you progress onto yet another node and it doesn’t particularly matter the order. The book actually questions whether or not the order matters and it doesn’t, but you progress onto another node and from there you measure the distances between that as well as back from the original. \n\nSo like if you start at node a and it’s connected to b and b is connected to c, then you go and you take b’s current length that exists in your lookup table and add it to b’s weight to c and you wind up going and traversing all of the nodes, I guess it winds up being Big-O(n squared) is really the worst case or like, that’s the algorithmic complexity of it. And by doing that, as you traverse the tree, eventually you wind up in a case where you haven’t actually updated your memoization table and at the point you know that your table is what’s known as relaxed or it’s stable. And at that point you know that you have found all of the shortest paths from a given source node for that graph.
Yeah. I like this idea of relaxation-
Because you start off and assume the distance is just infinite, and then as soon as you find one you’re like, “Oh, well that’s better.” And any time you find a shorter path you start to relax, and it doesn’t really sound like relaxing to me, it sounds more like tightening, like you’re getting closer and closer to the real answer.
Yeah, I don’t know why the term is relaxing.
I think it’s more like it’s settling in.
Yeah, I guess if you’re relaxing on like a memory foam bed eventually the bed stops collapsing underneath of you, right? Like, once you’ve relaxed enough?
If not, there’s a problem.
Right, you gotta get yourself a new mattress. So yeah, that is the Bellman-Ford and I did go and look through his algorithm and there were a couple of things in there that I think could potentially be improved. So if you are on page 205 where it is actually iterating over the table, one of the things that he actually does in this table is he looks up all of the edges for a given, I guess I’ve been calling them nodes, so I’ll stick with that, but he calls them vertices I guess, or a vertex, but for a given vertex he goes and finds all of the edges for that, and that isn’t actually necessary I found when I was building out the algorithm. Rather like, if you just do it against all edges it continues to work even without doing that check so it’s kind of an unnecessarily traversal of the tree, or iteration of the tree.
So yeah, I was kind of surprised when looking at that but you don’t actually have to do it node by node. It seems weird but it actually works out just fine. And in all the examples I looked at between like Wikipedia’s example implementation, they actually don’t care which one you’re iterating from. The fact is that you’re traversing over all the nodes enough that it does just stabilize, or the numbers stay consistent. I was kind of mind-blown.
Something that he did say when he wrote these algorithms is that he wanted to write them for clarity and not for efficiency. So he assumed that people would find better ways of doing it and that you’d find things within this that wouldn’t necessarily be required but that it was easier for him to understand it this way. So he was hoping that by writing it that way it would be easier for everyone else to understand as well.
I will have to talk to Rob about that and see his thoughts. Following up with Bellman-Ford, another graph traversal, it’s got the same objective as going and finding the shortest path between a source node and any given other node.
Basically every other node that is reachable.
Yeah. So it finds the shortest path between some starting node and every other node. It gives you a lookup of that out of it. And Dijkstra happens to be a complexity of Big-O(log n) so n being the number of nodes or vertices on that graph which generally it can go and beat out Bellman-Ford. And the way that it does that is by kind of improving its heuristics or taking a different set of heuristics by always going and following that lowest weight vertices from the given node that you’re on and then once you have done that then you know that you have travelled that shortest path. The book describes it pretty well with some pictures to kind of-
Yeah. Let me see if I can explain it a little differently. So-
For each iteration you visit the vertice that has the smallest value that is unvisited.
Yeah, and you, so you have to keep track of which ones are visited and you only visit each one once.
And kind of like how it comes up with that is it realizes if you’re going between points a and z and along the way you visited b, if you found the shortest path between a and z and includes b, then… Wow, I just lost my train of thought.
It basically travels to the shortest node and then travels to every single one if its nodes to say, “Okay, well if we start at this beginning and we go to this next node and it allows you to go to this next node, this is how much that next node’s going to have.” And then it does it for every other path it could possibly start with from that point. And then it goes backtracks once it knows it can’t go any further and starts again and goes, “Okay, so that was the lowest node, let’s go to the higher node and see where it can go and all the paths in which it can go to.” And it allows it to go through every single node once as opposed to having to visit them all, over and over and over again to see if every possible iteration is going to be faster than the previous computed iteration.