A book club for developers.
BookBytes is a fortnightly (or biweekly) book club for developers. Each episode the hosts discuss part of a book they've been reading. And they also chat with authors about their books. The books are about development, design, ethics, history, and soft skills. Sometimes there are tangents (also known as footnotes).
(Intro music: Electro Swing)
Hello, and welcome to BookBytes, a book club podcast for developers, and this week we’re talking about 2 more chapters of “The Imposter’s Handbook: A CS Primer for Self-Taught Programmers” by Rob Conery; and this week we’re going to go over, we’re going to finish up the chapter on data structures and then talk about simple algorithms. I’m Adam Garrett-Harris.
I’m Jen Luker.
I’m Safia Abd-
I’m Jason State- Oh!
I’m Safia Abdalla.
I’m Jason Staten.
All right, do we want to do another “Book News”? ‘Cause I’ve got one.
Let’s hear it!
Nice. I’ll have to check it out.
All right, so we need to finish data structures real quick.
What is a hash table?
I’m like, flipping pages, where is the hash table page?
Oh! You mean the one that I have marked as “Start”?
So, hash tables are a data structure that goes and indexes the values within it by going and hashing those values into a computed value that is consistent for all values put into it, and because everything is based upon its hash, going and adding to, and looking up items from a hash table are usually a Big-O(1), they’re pretty consistent so long as your hashing algorithm can produce unique results for the values that are passed into it.
Yeah. So, I think it’s pretty interesting what happens when there’s a collision. One option is you just like, move to the next available spot, but that could get really bad, or you could just use a link to list if there’s more than one value there.
Yeah, that’s definitely a challenging problem to deal with and hopefully you find a decent hashing algorithm so you don’t have to bump into that very much, ‘cause I’m sure perf goes way down as you wind up having lots and lots of collisions.
Yeah, I remember in college we had to try to write a hashing algorithm that didn’t have very many collisions and it was extremely hard, but I guess generally you’re not going to write your own.
Yeah. A lot of times libraries that you use will have, or sorry, some, whatever language you’re using will have a hash table built into it, especially if you’re using something like Ruby where hash tables are like, the core data structure that you use everywhere; and so, they’ve actually had times in the past where they’ve made pretty dramatic improvements to it and it just speeds up Ruby across the board because they’ve made hashing faster.
Cool. Well, unless anyone has something else about hash tables I think we’re going to skip over dictionaries and talk about trees.
What’s a tree, Jen?
Trees are a type of graph, essentially… It’s pretty much what you’re used to seeing regarding data. They have like, the bubble at the top and then they have the bubbles that are divided from down below. A tree kind of like a family history tree where you have you and then it breaks out and you have your parents, and then you have, each of them has their parents, you’ve got great-grandparents, all the way down. So, that’s technically a binary tree, but trees don’t necessarily have to have just two values on each side, they could actually have a really large set of limbs as you follow that tree.
Yeah, a family tree is kind of interesting, right? Because you’ve got, I guess it’s kind of upside down. Or you would have to start with yourself at the top of the tree-
And then your parents below you, and then their parents below each of them. Okay. I was thinking of it the other way around, like if your parents are above you that’s… That’s definitely a graph, but that would be a, I guess it’s also a tree.
It’s technically a heap (laughs), I guess.
Yeah, based on age?
So, I guess, we should… You said it’s a type of graph?
Yeah, that’s actually a note that’s mentioned later on in this chapter towards the end when they actually do cover graphs, and they say that all of these other data structures that we’ve covered throughout the chapter are all, also, forms of graphs. It’s kind of mind-blowing.
Yeah, so I think the key difference between a tree and a graph is that a tree is a graph that is acyclical, so it has no cycles, so there’s never a loop.
Yeah, it’s a hierarchical structure, which is linear.
I always found them funny that they called them trees, primarily because when you’re used to looking at trees you’re starting from the trunk and it goes up into the sky, but almost all definitions of digital trees are the trunk is in the air and then everything goes back down, kind of like roots instead of branches and leaves. Alas-
Yeah, I think of it as a, it’s kind of like a Christmas tree where you start at the star.
Same, same here.
That’s a good way to put it, but I’d never made that connection.
And then you’re going from ornament to ornament on the way down.
Well that’s deep.
I hadn’t thought about it as ornaments, but I did definitely have the Christmas tree vision in my mind. You just draw a little straight trunk underneath of it and put some present there and you’re good.
Or anything that you’re trying to transpile.
Yeah, I guess also with TypeScript, or linters.
Mm-hmm (affirmative). Speaking of which, I just released a linter last week.
You released a linter?
Mm-hmm (affirmative)- So, you know how we have eslint-plugin-JSX-a11y?
You didn’t know? Ah!
So, there- You know, eslint, though, right?
Well, eslint-plugin-JSX-a11y actually tests for accessibility while you’re coding. So, you can lint for accessibility requirements.
Yes, well it turns out that when we’re talking about React Native it kind of lands right in that space in between the web, and DOM, and React; and the ability to test apps in the native world. So, there’s accessibility tests for the web through like, axe-core or something to that effect, and there’s accessibility testing through iOS and Android, but there’s really nothing for React Native. So, Formidable and a great guy named Alex Saunders wrote JSX-plugin-react-native-a11y, and I released it as part of my chain react talk last week.
Oh, that’s awesome. Congrats!
That’s good, I’m sure that’s going to help a lot of people, and it’s great.
Nice work, Jen.
So, another type of tree is a trie. I think it’s pretty funny because it comes from the word “retrieval” so it technically should be pronounced “tree” as well, but then it would just be confusing, but you know it’s spelled, T-R-I-E.
You think it’s not confusing already?
Yeah, it’s pronounced “try”, it’s not spelled that way. Anyway, and it’s a tree, but it’s a tree that can help you... spell out words?
Isn’t that cool?
It’s an, it’s like a word completion.
Yes, so it can be used for when you’re typing into a search and you need to suggest what could come next. So, as you go down the tree it kind of narrows down what possible words you’re typing.
Also, it’s way cool that it shares parts of entries that are within that trie as well. So-
For example, the word “impart” and “imposters” both share I-M-P- and so, you don’t actually wind up duplicating those nodes, but instead they are shared for each of those entries, so tries are also space efficient as well.
Oh yeah. I think another thing this picture in the book doesn’t show, is where a word ends. So, for example, “imposters” could actually end at “imposter”, so I think one thing you can do is at the end of the word also add a node that’s just like some sort of indication, like a null node or something. Some sort of indication that that can be the end of a word.
Hmm, yeah. I can see the need for that if you had a word that is completely within another word.
Yeah, like you can’t assume that “impost” is a word, but “imposter” is.
Yeah, but I’m not entirely sure that you need to define the end of the word because the word itself ends. So, “imposters” may fall completely within that word, but you’re going to end up with “imposter” and “imposters” under the “imposter” parent. So-
Yeah, but it’s just hard to tell, is “Imposte”? Is “impost”? Is “impos”? “Impo”?
But as you go down that list, it’s going to limit based on what’s available.
Yeah, but if someone typed in i-m-p-, you wouldn’t want to suggest “impo” as a possible word, unless it’s really a word!
But no, there wouldn’t be anything under “impo”.
But how do you know you can suggest “imposter” and not “imposters”?
Because “imposter” exists and “impo” doesn’t.
But how do you know it exists?
Because it’s part, one of the children.
It, I mean, “impo” exists, i-m-p-o-...
I think you probably could do some sort of like, suffix check. So, like “imposter” and “imposters” are two words and they both have the same like, “Imposte” root, or I guess “impost” depending how you look at it, but like, what is the root of the word? “Impo”, like there’s no valid root for it, so it might be that or maybe like, each path on the tree is ranked, so, you know, it’s more likely that from “o” you want to go out to another node than terminate at “o”.
I mean, if you’re looking at that picture, what you’re looking at is that you have “imp” and then you have it break out into “impart” and it separates into the “o” but it completes it with “osters”. So, you’d have “osters” or it would go all the way down to “oster” and then you’d have a side node that would have an “s”. So, it would become a sibling to the “r”. You end up dividing it, but because of the fact that “impo” isn’t a child under the “o” letter there, it’s not one of the options.
Okay, so I’m looking at, actually a different book I used when reading this book. It was another book I used as a reference, and it’s called “Cracking the Coding Interview”, and so, when it goes over tries-
Add a null node?
It has little, it has little nodes, yeah. It says there’s null nodes to indicate complete words and it makes them on the graph with an asterisk. So, for instance it’s got “man” and “many” and so after “n” you have a “y” and an asterisk, and then after “y” you have an asterisk. So, yeah, I feel like that’s just something this book kind of glossed over, because it’s just not going into that much detail.
This discussion definitely makes me want to go and implement it, just to see what approach I would have to take to make sure I’m showing only entries that are there, and not every partial entry. So, I could take that on for you, Jen, to see what I can find from that side of things, as a little challenge to myself.
Sure. It’d be interesting. I mean, I’ve used auto-completions, I’ve written auto-completions, but…
Yeah, and I may not have entirely understood what you were trying to explain with just audio only.
Yeah. It’s just anything that came off of “o” would have its own, it would be like all the children that came off of “o”, but because there’s none that just end at “o”, I guess, but that would be part of your definition is that there’s no asterisk all by itself, as a child, saying “impo” is a possible word.
Yeah. All right! Should we move onto algorithms?
So, actually, we’ve got a sponsor first.
Yeah. (laughs) So, this week’s sponsor is V School.
V School was established in 2013 and is Utah’s highest ranked coding boot camp, and it’s the first of its kind in Utah. You can choose to learn full-time or part-time in the evenings, and you’ll immerse yourself learning new skills. So, even before classes start you’ll get a career counselor to help you find a job when you graduate, and they take care of everything so you can focus on learning, including free housing if you need it, and the housing’s just a few blocks from school. It’s located in beautiful Salt Lake City, Utah. Or you can learn from the comfort of your own home with V School’s 100% virtual classroom. You’ll get a super transcript, which is kind of like a portfolio, transcript, and letter of recommendation all in one, and it will help a potential employer quickly understand your strengths and abilities, and what you’ve accomplished.
That sounds pretty cool.
If you’re interesting in V School, head on over to https://vschool.io/ and it will help support the show, and definitely tell them that you heard about it from BookBytes. Thanks to V School for supporting the show.
So, simple algorithms. So, Jason and I were talking about this and our algorithm for going through these chapters was O(n), well, actually (mn) where m is the number of hosts and n is the length of the chapter, and so, we wanted to speed that up by each of us just picking our favorite algorithm, and so that should make it O(1).
I, uh, definitely agree with that. So-
(laughs) The nerd part or the O(1) part? ‘Cause-
Oh, I’m not-
That was way cooler.
I’m not gonna back track from either.
I’ve definitely been geeking out with this book, I’ve really enjoyed it so far.
Yeah, it’s been great.
So, speaking of geeking out on things then, I actually will jump in line and I will talk about my sort that I wanted to do. So, first, kind of going back to the last chapter related to a data structure that we didn’t spend a lot of time on, but it is a tree, and it’s called a heap; and a heap is a special kind of tree where a parent either has a higher or lower, they call it priority, but you can think of it as a value, than its sibling nodes, but those sibling nodes themselves don’t have to have any specific order to them. So, for example, if you were to have a heap of 3 nodes, with the highest priority at the top, say you had 3, 5, and 7 as your values within it, then 7 could be at the top, and then 5 could be on the left side of that, and 3 could be on the side. The order of 3 and 5 doesn’t particularly matter.
And the value in that is you can insert and move things around within a tree pretty quickly, because as we talked before, a tree winds up becoming a divide-and-conquer type strategy when you’re interacting with it where you only need to figure out which level something belongs in rather than having to figure out the exact position of something within a tree when you’re inserting, or removing, and rearranging in a tree.
So, first a heap has the value or use case of being useful for a prioritized queue, and a prioritized queue would be something you would want to use in the case of, say you have a RabbitMQ type worker process where you have a queue of things that you’re putting a whole bunch of tasks into and sometimes tasks will have higher priorities than others. By using a heap you can insert into them very quickly with a, it is a Log(n) type insertion time to put something into that, and then actually pulling off of the top of one winds up being a Big-O(1) operation to go and remove the item because it’s exact where it’s at, and then a log(n) type operation to go and rebalance the tree again.
But heaps also have the benefit of being useful for sorting as well, and the way that you can do that is by converting your list structure, oftentimes it’s an array, and reorganizing the items into a complete heap structure, and a complete heap is the one that happens to have the greatest or lowest priority at the root node, or the first node on it, and then your second level of the tree and third level of the tree are just denoted purely by the indexes within that array.
So, kind of a cool thing about it is a heap can be represented by an array-type structure and you don’t actually have to have individual node-type structures that point to other structures. So, it makes it really memory efficient because it’s just a contiguous set of memory, and because of that, going and doing a heapsort winds up being an (n log n) type Big-O, and I just thought it was pretty awesome. I went and actually implemented it today in Rust and it is, by far, the fastest algorithm that I went ahead and implemented. I did bubble sort insertion, merge selection, and heap, and heap, I mean because of its (n log n)-type speed, it outdid everything that I wrote, but the built in sorts are still definitely faster in terms of actual runtime. Their Big-Os are still the same as heaps being (n log n), but heap is definitely on top, or it’s top of the heap of my source that I wrote.
So, normally a heap, like you said, a min heap or a max heap would be represented with a tree, and it looks like a tree, like, when I see a picture of it in the book here.
And you said you can actually do it as an array-
Where you just specify what level of the tree each item would be at, if it were in a tree, and that’s great because its contiguous piece of memory. Do you do that for a sort because you already know how many items are going to be in the heap? And like, normally if you were just building the tree, ad hoc, you would know how many items are in there?
So, it’s not so much about knowing the size but the benefit could come from… First, if you are able to back it with an array, a lot of times arrays are the type of thing that you want to be sorting. I mean, I generally don’t find myself sorting other types of data structures too often, and so by using a heapsort on it and just backing it by the original data source, then you actually don’t have to go and allocate other memory in order to do the sort, whereas some other sorts actually require you to go and copy the original data source in order to rearrange it.
So, it’s in place and doesn’t require additional memory allocation which is good for like, low memory type devices and stuff. And the way that you can go and specify what level something is at is by looking at its index within the backing array. So, for example, index(0) is the root node, and there’s only one node at the root node level, and then indexes 1 and 2 are nodes that are at the second level, and then indexes, I guess 3 through… Oh, it got me multiplying it. (laughs) But uh…
Yeah, so like, everything under there, you have 4 nodes under there ‘cause every layer is 2 to the whatever level that it’s at, and it’s just known by the index that it’s at. So you’re not even keeping track of “this is the level that thing’s at” other than relying on the existing index that it is in the array.
And there is a really cool blog post about it, about “Heapify All The Things” and it is written by Vaidehi Joshi on Medium, and I will go and share a link that you can check out there, because it is an illustrated guide to doing a heapsort and I found it as an excellent resource to go along with the book.
Awesome, yeah. It seems like an illustrated guide would go great with the book. All right, Safia, you want to tell us about merge sort?
Sure. So, I decided to choose merge sort because it was related to a sorting algorithm that I brought up in one of my tangents during the last podcast episode which was Timsort.
It’s not a tangent, it’s a side note.
It’s a side note, of- Oh! That’s right, we’re being punny here, perfect.
One of my “side notes” during the last episode. So, merge sort, although it’s used in Timsort which we mentioned last episode, is actually a really old sorting algorithm. It was invented in the 1940s by somebody who was mentioned in the book a little earlier, Von Neumann who was tinkering a lot with computers in the ‘40s, as many of us do now I guess, and it was really interesting about merge sort was it applied the principle of divide-and-conquer, and this was also something that the book touched on in the last chapter. I think the, you know, in general, divide-and-conquer for solving any kind of problem tends to be like, a pretty common tool across computing, and merge sort was like, one of the first applications of that. It’s been around for 80 years and it’s pretty reliable.
So, with all of that background aside, what does merge sort actually do? Merge sort, essentially, works by dividing the lists that you want to sort into smaller and smaller sublists and sorting on those. So, the idea is that instead of taking the time to sort a list of 20 items, you sort the 10 lists of 2 items contained within that list and then you build back up. And this actually ends up being a really efficient technique. We mentioned in the Big O chapter, we learned about how, you know, generally when you have these sort of like, divide-and-conquer type techniques you get really good runtimes and the runtimes usually tend to be a log(n) because you’re always taking half of what you were looking through before, generally, and that’s the case with this.
So, the efficiency, or the time complexity of the algorithm is O(log n)-, O(n log n), and Jason mentioned a little bit about how much space heapsort takes up, merge sort takes up, I guess a space complexity of O(n) which basically means that if you have a list of 20 items and with merge sort you’re breaking it up into smaller lists that you want to sort through, then you would need basically the same amount of space as the number of items in your list for all of those sublists that you’re working through.
Is that in addition to the list that you already have saved?
So, O(n) extra space.
Yeah, one of the things that I think is cool to note in one of the visualizations that is on page 145 if you’re looking at PD,F is you can kind of see how much merge sort looks like a binary tree, ‘cause you’ve got those successive splits, and it was touched on in the last chapter, you know a binary tree is basically a tree where every node is either a terminating node or has 2 children.
So, I thought that was a really cool way to showcase those analogous principals. Like, you know, a binary tree is a divide-and-conquer representation technique, I guess is the way you might say it, or data structure; and so is merge sort. That’s all I have to say about it. My biggest was that it’s like really old, as most things in computers are. It’s been around for a while and it’s pretty time tested and I think that’s what’s cool about algorithms. Once you’ve got something that’s space efficient and time efficient you can take it anywhere.
One thing that confused me initially about merge sorts was, okay, we’re breaking everything up and now we have all of these lists of a single value and then I’m merging them back together. Like, how does that make anything faster, because then when I have a list of 2, and then another list of 2, how do I- my initial thought was like, okay well, I have compare all of these things, and it turns out the efficiency comes from having those two lists of 2 that you’re assembling back together, those 2 lists are already sorted and so the only things that you need to look at are the things on the I guess the left side of the list, or the head of those lists and that is the order that you assemble them back. You don’t have to check each item in one list against all of the items in the other ones. You only have to look at the heads of the list and that’s where the efficiency can come from.
The trend is kind of that although you’re still sorting you’re always sorting elements that are more sorted than they were before so your list is getting less chaotic as the algorithm runs and you only have to compare those first things.
Yeah, Jason. I had the same thought as you. It seems very, and it doesn’t seem very efficient because that’s not how I would do it. I think he mentions in the book that the sort that’s most like how a human would do it is selection sort ‘cause you just kind of look for the smallest one, and then look for the next smallest one.
Yeah, and I think that’s- Oh, I’m going to go on a side note again, I will pause it. But yeah, it kind of goes back on… It was, ‘cause it was invented in the ‘40s, it was one of the first sorting algorithms that was designed to be written on a computer and specifically like, a Von Neumann machine, because as is noted, you need the additional memory to keep track of all of the sublists.
And then you also need like, your CPU to actually do the magic. So, it was like, definitely like, very designed to be run on a computer and not, definitely not, by a person, ‘cause I would not sort a deck of cards or something, like this.
(laughs) Yeah. Okay, so you said you need the additional memory. Jason, is the heapsort in place? Like, does it need additional memory?
So, heapsort does not need additional memory so that is one benefit to it. It also did come a bit later. I think it came in the mid or later ‘60s, so definitely after merge sort; and one difference between the 2 algorithms as well, even though they have the same runtime merge sort winds up being a stable sorting algorithm whereas a heapsort is not stable, meaning that if, say you have 2 duplicate items in terms of their priority or ordering, those may get swapped in a heapsort whereas a merge sort will always make sure that if something came first before the list was sorted, like one duplicate came first before the list was sorted, it will continue to remain before it after the list is sorted.
Ah, which doesn’t sound important if all you’re sorting is numbers, but if you’re sorting numbers that have other data associated with it, it could be a big deal.
Cool, Jen what was your algorithm you wanted to talk about?
Mine is quicksort.