- Amazon surged $200 (10%) afterhours yesterday too. That's 2 crazy days.
- Like react offers jsx allowing you to blend html in js, jss allows css style declarations in js.
- Gradle is another build tool similar to maven. Groovy DSL.
- Netflix onsite. Went well. Refreshing to be surrounded by only senior folk. Parts of the culture (freedom and responsibility) are only possible in this type of env. Should hear back on Monday. Took extensive notes, all in private drive. This role is almost the exact same as my role at SpaceX, but in Java.
- Got to see the family up in the bay before flight back to LA. The city was buzzing with niners energy for Sunday.
- Thoughtworks is a software consulting firm.
- ConcurrentHashMap is thread safe. You can have multiple modify at once. It's a bit slower.
Friday, January 31, 2020
Thursday, January 30, 2020
- Interesting take on pipenv dying: https://medium.com/telnyx-engineering/rip-pipenv-tried-too-hard-do-what-you-need-with-pip-tools-d500edc161d4. I have always used pip-tools directly, the conclusion of the article. Sometimes the pieces are easier to maintain than the wrapper.
- Tsla's afterhours chaos was met, opening @638.
- Barstool got valuated at 450m due to a casino deal of 163m, 36% stake.
- Filed a claim for the yahoo security breach class action settlement. #AHC00896336.
- Practice problems: https://leetcode.com/problems/contains-duplicate/. Simple tuning.
- Got the offer from Disney. Bob Iger is the CEO.
- Finalized the bay flight 5 hours before takeoff, 21 hours before interview time. Crunch. In Los Gatos now preparing.
- Caught up on the last 2 highscalability posts.
- Reviewed concurrency. Threading, multiprocessing, async. I feel very strong in this already, but wanted to review just in case.
- Read the netflix culture doc again to familiarize. It's applicable anywhere. Some good corporate stances in there.
- Thought back through sx-setuptools, atlassian admin, and the full monorepo/build setup I had in place at SpaceX. Decisions made in development, performance metrics, etc. Runbooks, internet shield, default static analysis, py3 migration, crossplatform. Everything is files. Depgraph. Change targeting upstream n=2. Artifact provenance. On-call, devops, production support, pagerduty, escalations. Sentry, sensu, splunk. Sprintplanning, 60 day cycling, user storymapping. All the standard facets of a ci team.
- Went through some heavier frontend stuff than usual: https://github.com/yangshun/front-end-interview-handbook/blob/master/questions/javascript-questions.md. Took all notes in drive.
Wednesday, January 29, 2020
- Remember: camelCase snake_case PascalCase kebab-case
- Wearables (airpods and watch) gave apple huge profits this quarter.
- Read some resources from jack regarding the founder of ghsmart: https://geoffsmart.com/. A few takeaways:
- Really probe into the future of the department. Get a lay of the land from someone who lives there. Big projects, big changes, big consolidations? You get (1) a better idea of compatibility and future job responsibilities and (2) a better way to cater your responses/skillsets to match. I could be better about this one.
- Prepare yourself. Know yourself. Think about what you want for your future. Have explicit decisions made regarding your own trajectory. This then bleeds into most aspects of the interview process. You have honest answers as gut instincts, rather than canned answers you've rehearsed. You come across more confident. You're able to articulate genuine reflections of yourself and your relationship to the employer's needs. I value this one, and have spent a lot of time practicing it already.
- Need to have metrics, scorecards. Need to have them as a leader to be objective/accurate. Need to ask about them as a contributor/candidate to be transparent/inspire.
- My master's degree 10 years ago involved a lot of AI, ML, stochastics, probabilistics, etc, but I haven't immersed myself deeply in those fields since. These areas are rapidly expanding within the professional world, and I'll focus some of my free time to keep up to date on such tech in the months to come.
- Submitted refs and accomplishments for citadel.
- HFT = high frequency trading. You can do it at a rate that enables skimming.
- Disney onsite in glendale.
- Met with Art at brews brothers after, got dave's hot chicken.
- Interesting thought about misophonia - hearing is the only primary sense you can't really shut off. If you experience aversion to touch, you can move away. For taste, you can spit it out. For smell, you can plug your nose or breathe through your mouth. For sight, you can close your eyes. For sound, you can plug your ears, but it's much less effective and isn't practical like the other options.
- TSLA surged over $650, over %10, in afterhours after the earnings report.
- Submitted formal references and accomplishments letter to citadel.
- Ping is not over TCP, it's ICMP (internet control message protocol). It doesn't have a port.
- If you have n items in an array, the number of unique binary search trees that can be created from it is called the catalan number. https://en.wikipedia.org/wiki/Catalan_number.
- Database normalization. There are 3 basic thresholds: first/second/third normal form.
- 1NF: primary keys are unique, each col stores info in its simplest form, no redundant columns.
- 2NF: all the other columns depend on the prim key.
- 3NF: all other columns depend on the prim key directly, not transitively.
Tuesday, January 28, 2020
- Did a bit of research on ghsmart. Downloaded and started reading Who, the book from the creator, to get an idea of the process.
- Read a bit more of The Man Who Solved the Market while doing laundry.
- Coordinated further with citadel for a few items. Acquired refs, drafted an accomplishments-onepager, scheduled comp call. Details noted in my private repo.
- Finished season 2 of the sinner. S1 was better, but 2 was still good. 3 premieres in a couple weeks.
- Looked up specific amazon roles in the bay for transfer.
- Authored a 2-page retrospective for last year, an annual review for self-employment.
- Instagram ads are getting crazy. Just counted, it's 4 followed posts and then 1 sponsored, repeated. 20%. 6 minutes in a 30 minute television slot. 1/5 of an entire website. The standard is just ridiculous nowadays. Go Wikipedia!
- AWS IoT = internet of things. Connecting devices to the cloud. The primary location for bay aws is palo alto.
Monday, January 27, 2020
- Python bin() and int(num, 2) (for base 2) to convert between binary numbers (represented as strings) and integer datatypes.
- f'{n:032b}' will format n (an integer) as a 32digit binary number with padded zeros.
- Coronavirus is affecting the markets.
- Talked with Citadel about next steps. Scope of influence ; room to impact. This really is my most important comparator moving forward. More than title/salary/sector.
- DS&A:
- https://leetcode.com/problems/reverse-bits/. Binary work.
- https://leetcode.com/problems/number-of-1-bits/. Same.
- https://leetcode.com/problems/number-of-islands. DFS through each known land cell, count groupings and return.
- https://leetcode.com/problems/happy-number/. Try something in a loop, detect cycle by keeping tracking of seen numbers in a set.
- https://leetcode.com/problems/count-primes/. Sieve of eratosthenes. Prime numbers up to n. Zero 0 and 1. Then zero out all multiples of every number from 2 to sqrt(n).
- Between all platforms (leetcode, hackerrank, geeksforgeeks, interviews themselves) I've done ~200 practice problems.
- FB coding interview II.
- Graded on 4 aspects of the DS&A questions. Problem solving. Communication. Converting algorithms and data structures to executable code. Validation (testing).
- Notes all in gdrive. 2 coding questions. Did both. I like that they were formatted in real-world applications, with a quick discussion before diving into the algorithm.
- I continue to be impressed with facebook's process, in direct relation to its closest interview cousin Google. The focus is multiheaded, from intuition to coding to behavior. Google is much more what, Facebook is much more what how why when where who.
- Cut and wrapped 32 homemade pumpkin peanut butter bars for the next month.
- Paid the last rent (maybe last rent ever?) for feb in hermosa. Gave Sherisse formal 30d notice.
- If a bonus is paid as a supplemental wage, it's taxed at the flat rate of 25%. If it's added onto the year-end paycheck, it's additional income and is taxed at whatever aggregate tax bracket that income is within (which is obviously worse).
Sunday, January 26, 2020
- Practice problems:
- https://leetcode.com/problems/intersection-of-two-linked-lists/. To find the FIRST COMMON element in two arrays, or two linked lists, etc, the most natural way is to loop over one and create a set (hash table) and then loop over the second until you find a match. This is O(m+n) time and O(n) space. You can improve by using two pointers. Iterate one by one, and then if you've reached the end of the list, move it to head of the other list. Conclude once you've reached the end of that one. If there's a match, you'll find the first one. Write it out to see how it makes sense.
- https://leetcode.com/problems/word-break-ii/. Harder, but cool. Just like the other word break, just keep track of a dp memo that tracks word combinations up to that index. Then only check that index out if the previous index is non-empty (words ended there, so we have more possibilities).
- https://leetcode.com/problems/find-peak-element/. Interesting. Just check the right side (half, so log) instead of both left and right comparisons each time. The solution is wrong for this one, assuming a sorted input.
- https://leetcode.com/problems/fraction-to-recurring-decimal. This problem is much harder than it seems. Basically recurse down and perform the division each time, like you would in handwritten longdivision, and carry the decimal place over by 1 each time. Memoize to detect if you've seen a cycle already.
- https://leetcode.com/problems/excel-sheet-column-number. Very elegant solution, oneliner. ord of the char, adjusted, then * 26 raised to the power of the digit.
- https://leetcode.com/problems/excel-sheet-column-title/. Little tougher, but still an easy problem.
- https://leetcode.com/problems/rotate-array. Easy oneliner. Another cool solution is reverse the string, then reverse the first k items, then reverse the last n-k items.
- https://leetcode.com/problems/largest-number. Cool problem. Remember if you want to do any custom sorting, like by digit value or ANYTHING else, you can define your own custom Comparator class with __lt__ defined, then pass it as a key to sorted().
- Scheduled haircut for wed.
- Remember heaps, min and max. Can create (heapify) in O(n). Check the top element (min/max) is O(1). Popping the top, or removing an element, or inserting an element are all the same O(logn), because you have to rebuild part of the heap after.
- Great bday for Katilin, they did an awesome job planning dinner/jeop/paintNsip.
- Watched the Bellator and UFC fights from last night.
- Kobe.
- Pro Bowl.
- Remember that [None]*5 in python will make all items mirror each other, since lists are mutable. To make individual pointers, just do [None for _ in range(5)].
Saturday, January 25, 2020
- Practice problems.
- https://leetcode.com/problems/sort-list/. You can do bubble in n^2 time and 1 space, or use extra space to create another data structure regular sort nlogn with n space. Doing nlogn time and 1 space requires a weird in-place mergesort.
- https://leetcode.com/problems/lru-cache/. Implemented my own lru cache with an eviction policy. Get() and Put() are both O(1) in time. Achieved with a deque (the keys) and a dict (the key-val mapping).
- https://leetcode.com/problems/evaluate-reverse-polish-notation. Cool, but easy. Iterate through and simply perform operations as they come. Trick: manage your own index. You could use a stack, but mine collapses values in place, making it more memory efficient.
- https://leetcode.com/problems/max-points-on-a-line. This one has a 16% acceptance rate, the lowest I've seen. Did it in about half an hour. Line math, mx + b, then iterate over and track points on that line. Lots of corner cases: divide by zero, high-precision math (using the Decimal library).
- Youtube creator awards. The diamond play buttons is for >10m subs, and there are about 500 channels total that have been given it. The red play button is for >50m. The red diamond is for >100, and only a couple channels have gotten this (pewdiepie and t-series).
- Confirmed disney onsite for next wed.
- Watched rosemary's baby for the first time. Maybe shocking 50 years ago, but not a great movie by today's standards.
Friday, January 24, 2020
- Flatbuffer is a memory-efficient protobuf alternative.
- Watched kingpin. Old, but great premise.
- You can connect a websocket directly to a message queue like kafka.
- Netflix responded to schedule an onsite next week. Looks like we can make Friday happen.
- Disney responded to schedule a phone interview, I said asap, and talked to senior software engineer later in the day.
- Amazon followup onsite interview for senior leveling today at the santa monica office.
- FB wanted to chat again on monday.
- Started watching the tv show sinner on the chicago flight, like it so far.
- Ads becoming harder to spot in google search results now. Just a black word "ad" where the favicon should be.
- 23andMe and DigitalOcean both laid off a substantial number. Surprising. I like both businesses.
- AB was arrested, bailed, and required to go through a mental health evaluation.
- Practice problems:
- https://leetcode.com/problems/lowest-common-ancestor-of-a-binary-search-tree/. LCA if it's a binary search tree.
- https://leetcode.com/problems/lowest-common-ancestor-of-a-binary-tree. LCA if it's any binary tree. The problem from yesterday. Recurse node by node. Exits: (1) return false if none, past leaf (2) return true if node is equal to node1 or node2. Propagate that True up, bc you know it's a child. Recurse down left and right. If ever left and right (or self) return True, that's you're answer.
- https://leetcode.com/problems/sliding-window-maximum/. My solution worked, slicing the window and taking the max. This beat 20% time and 60% mem on leetcode. I actually got the optimizations right in the interview too, using deque, and only iterating through if the current_max is the one leaving on the left. This solution beat 33% and 24% on leetcode.
- Scheduled citadel followup for monday.
Thursday, January 23, 2020
- Practice problems.
- https://leetcode.com/problems/linked-list-cycle/. Linked list cycle. Add visited to hash set(), then check if present.
- https://leetcode.com/problems/majority-element/. Counter.most_common.
- https://leetcode.com/problems/maximum-product-subarray/. Kadane's for product instead of sum. You have to check min as well.
- https://leetcode.com/problems/reverse-linked-list. Reverse linked list, both iteratively and recursively.
- https://leetcode.com/problems/factorial-trailing-zeroes. Weird. Mathy.
- https://leetcode.com/problems/house-robber/. Just recurse 2 ago, add current, take max of that and the one previous.
- Citadel Interview in Chicago, went well. Took extensive notes in drive. Flew back to LAX at night.
- The US market is 50% of the global market!
- TSLA 600 call for 1/24 (tomorrow) was looking so good until now.
Wednesday, January 22, 2020
- Amazon is making a palm reader payment service.
- Placed some tsla calls, but in bigger news, the $1000 1yr is over $45. Ridiculous.
- Added KTN and FFN to the chicago flight.
- Popular hash functions: MD5, SHA, NTLM, LANMAN.
- FB coding interview. This guy was awesome. Did well, good convo.
- Great answer to my question, Future success in data or product? It’s neither. It’s having the best tech stack, the best engineering talent. This will make you more agile in bringing NEW products to market in the future, whatever that landscape may look like.
- Practice problems.
- https://leetcode.com/problems/word-break/. Did it very fast with my recursive solution, but need a way to memoize. DP is crucial for this one. Think about it logically. Try every word at every starting position, and mark the index of the end of the word as findable if there are any matches. I still like my solution better.
- https://leetcode.com/problems/min-stack/. Build your own stack.
- Max path length in tree. This is the same as max path sum (https://leetcode.com/problems/binary-tree-maximum-path-sum/), but treat each val as 1 (unweighted, basically). It's a recursion problem. Look to the left, recurse to the max depth there. Look to the right, recurse to the max depth there. Add the two together. That's the max path for this root. Keep a global variable for the max found so far, and then on each recursion just check against it.
- https://leetcode.com/problems/gas-station/. This one was weirdly hard. There's such a wide variety in problems. I crush some "hard" ones, some "medium" ones are impossible. There's definitely a bit of luck in the selection of the questions interviewers ask you.
- https://leetcode.com/problems/longest-consecutive-sequence. Smartly keep track of visited nodes in sets. Little harder than it appears, but approach logically. The hard part of this one is the recognizing the complicated for/while solution is actually only n, not nested n^2, because you're checking existence in set.
- https://leetcode.com/problems/copy-list-with-random-pointer/. Deep copy a linked list, iterate through and keep track of a few things. A different type of problem. Can recurse like usual, and assign both pointers, with one tricky part: keep a memoization dict of the visited nodes. This is just a mapping of original node -> copied node.
- https://leetcode.com/problems/palindrome-partitioning/. This one is tough too. I'll have to do some easy/medium to get up for tomorrow. Remember that recursion is almost always a good answer, even for string/list problems. Iterate through the beginning till you hit some criteria, then recurse on the remainder. Keep track of all valid answers in a global variable, rather that directly in the recursive call scope.
- Flew to chicago for citadel.
- FB uses mercurial, not git.
- Scheduled the amazon sr followup for friday, 1-3.
- Citadel research.
- 10s of billions of assets under management.
- Ken Griffin founder, Peng Zhao current CEO.
- Reviewed my tradebot.
- Nasdaq for all tickers on the nyse.
- Yahoo-fin and robin-stocks for history/data.
- Backtrader for backtesting.
- Simple moving average (and a few other basic one) for strategies.
- Robin-stocks to wrap the rh api.
- Scikit-learn/numpy/pandas/matplotlib for analysis.
Tuesday, January 21, 2020
- System design, whiteboard and youtube, ~1hr each. Mostly from Naren's channel.
- Uber. Path planning. Websockets to keep gps locations updated every few seconds.
- Fandango. All booking sites are comparable (airbnb etc) where you find a match. Suggestions are an important piece here (netflix etc), use heuristics from previous content.
- Webcrawler. Probably the most different one. Follow links, basically just a gigantic 100b node graph. Use a bloom filter for the "in visited?" portion. Used for search engines, copyright police, more.
- Twitter. Social media, read-heavy. Precompute feeds on write for most users. For celebrities, read on call.
- Whatsapp. Any chat service. XMPP, or websockets as the persistent protocol to read-write with new messages. And session, for delivered/read receipts. Message queue for the actual text, before db.
- Google docs.
- Dropbox/gdrive/stash. Just remote file representation, upload/download/sync. History is the most important piece here. Store diffs, not copies. Chunk files so you can diff efficiently.
- Gdocs. Collaborative editing. Just like optimistic locks. Like the previous, chunk for differential syncing. Could be character by character, or batched in whole lines, etc. This is called operational transformation, everything is captured in an event (insert, delete, update...) and replayed for all other clients. Or you could just do raw diffs (patches) and try to apply, raising conflict if need be.
- Curated list of interesting stuff: https://github.com/sindresorhus/awesome.
- Watched the ted interview with jim simons: https://www.youtube.com/watch?v=U5kIdtMJGc8. Not a ton of info.
- Subscribed to https://eng.uber.com/.
- Went through a bit of https://github.com/yangshun/tech-interview-handbook.
- Spoke with fb to schedule the coding interview, asap today or tomorrow morning before flight. Talked about instagram and whatsapp team matching. She was actually chicago based and mentioned citadel!
- Later in the day we scheduled for tomorrow morning before the flight to Chicago.
- Quick call with citadel for Thurs prep. The importance of tech in finance, and how that's reflecting in recent numbers (vs companies married to legacy finance fundamentals).
- Practiced a few behavioral orations again.
- Run, core, sauna, stretch, meditate.
- Another amazon feedback survey. Good all around.
- Watched a few videos on vocal coaching, remembering that breath class I took at stanford and how powerful of a speaker the professor was. Some were more focused toward singing, some toward presenting. Do mouth/face/voice warmups, stretch, posture, confidence, smile; treat it like a conversation not a stage.
- XMPP is very similar in functionality to websockets. Persistent connection, peer-peer. XMPP is decentralized, can be distributed; websockets are centralized.
- TCP is more reliable than UDP. It establishes a connection/handshake. UDP just sends data. Faster, but could be lossy.
- The python logo is 2 snakes curled into a cross, just to represent the snake name.
- Python isnum, isalpha, isalnum.
- Problems:
- https://leetcode.com/problems/binary-tree-zigzag-level-order-traversal/. Same as the usual level-order traversal (do iteratively, keep children), but keep a boolean that you swap each time for the direction of level printout.
- https://leetcode.com/problems/convert-sorted-array-to-binary-search-tree. Interesting new class of problem for me. Rather than converting a tree into a list via traversal, you're given the traversal lists and you have to create a possible tree (there are many valid ones).
- https://leetcode.com/problems/pascals-triangle/. Recursion. Add above two numbers.
- https://leetcode.com/problems/populating-next-right-pointers-in-each-node/. Cool problem, another level-order traversal, just a slight constant-mem twist.
- https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/. A bit harder. Still recurse the traversal lists to create the nodes, but you have to think a bit more deeply. Use the preorder array to grab the root, find the root in the inorder array to bisect it, then get the lengths of both halves. Use that to split the remainder of the preorder array. Then just recurse as usual.
- https://leetcode.com/problems/binary-tree-maximum-path-sum. Use a postorder traversal. Recurse left right, calculating the max for those child nodes, then calc the max for this node by simply adding its val.
- https://leetcode.com/problems/valid-palindrome. Easy. Just compare palindrome fwd and rev.
- https://leetcode.com/problems/word-ladder/. Interesting problem. Jumping through available words, only allowing one letter change at a time. It's a graph problem, but takes a minute to realize this. I had first leaned toward a Trie. Once you realized graph, just build it and run BFS for the shortest path.
- https://leetcode.com/problems/surrounded-regions/. Used DFS to find the unsurrounded nodes from the edge, then simply moved everything other than that to X. Uses nodes for this one (not paths), which I haven't done in a while. It's simpler.
- Reread https://github.com/donnemartin/system-design-primer. Great document.
- Remember to always popleft() with doing DFS or BFS. It's just for BFS, append to end. For DFS, appendleft().
Monday, January 20, 2020
- Practice problems.
- https://leetcode.com/problems/set-matrix-zeroes/. Solved. To be even more memory efficient, you can set the matrix values in-place to a known flag (None or something) rather than keep track or row and col sets to zero in the next iteration.
- https://leetcode.com/problems/sort-colors/. Easy to solve, not so easy to do in constant memory.
- https://leetcode.com/problems/minimum-window-substring/. A bit harder. Can iterate and store in dict, removing letters as necessary, like I did, or you can use 2 pointers. Right for expansion until you get all of T, then left for contraction to try to minimize it. Continue until it's not a valid substring, then move left forward one and repeat.
- https://leetcode.com/problems/decode-ways/. Cool problem. Simply recurse on the different options, one or two digits. Always remember to memoize. In this case, the DP cuts the time in half. I scored faster that 95% of submissions.
- https://leetcode.com/problems/binary-tree-inorder-traversal/. Traversing with recursion easy. To do iteratively, look at it like BFS/DFS. Manage a stack/queue and pop off appropriately according to inorder/preorder/postorder.
- https://leetcode.com/problems/validate-binary-search-tree/. When recursing and check left/right less/greater than, don't compare to the node's value. You have to propagate the parent's bounds, just add a lower and upper param. Compare to that, not node.val.
- For leetcode specifically, I've now finished the top 50 questions and the top 50 interview questions. Will finish the top 100 liked questions next (almost done), then continue through 51-150 for the top interview questions.
- https://leetcode.com/problems/symmetric-tree. Level-order stuff is a bit easier with iteration than recursion, similar to BFS.
- https://leetcode.com/problems/binary-tree-level-order-traversal/. BFS, but the simplified version. Just an iterative while loop with a list of children, then collect the vals.
- https://leetcode.com/problems/maximum-depth-of-binary-tree/. Basic BFS, just count the layer for max depth. You can do this recursively in a one-liner with:
- def foo(root): return 1 + max(foo(root.left), foo(root.right)) if root else 0
- Markets closed today for MLK holiday.
- Dinner last night with the snyders (+baby) and everyone in culver.
- Confirmed no supercontest late-pick-reminder email on saturday, as designed.
- Python 3 type hint with default:
- def sqrt(number: int = 0) -> float:
- Orated the 4 behavioral questions in prep.
- Run, push, sauna, meditate.
- Emailed Netflix to keep them in sync with the others.
- Roasted 5lbs of peanuts (450F 15min). Made 2.5lbs regular peanut butter and 2.5lbs pumpkin peanut butter.
- Reread https://github.com/donnemartin/system-design-primer and did a ton more practice system design questions, including whiteboard. Boggle, twitter.
Sunday, January 19, 2020
- Consolidated my Amazon LP document into the general questions they might ask you, since that has a superset of all the general categories for behavioral interview questions.
- Conf championships! Gonna be an all-red superbowl, niners v chiefs.
- Disposed/packed more kitchen/bathroom supplies.
- Practice problems.
- https://leetcode.com/problems/regular-expression-matching. Solved it myself using recursion, felt happy with it, then did final syntax from the DP solution.
- https://leetcode.com/problems/wildcard-matching/. Same type. A little easier.
- https://leetcode.com/problems/largest-rectangle-in-histogram/. Similar to other problems. Iterate through a list of heights, find max of something related. Use a stack.
- https://leetcode.com/problems/single-number. Easy.
- https://leetcode.com/problems/best-time-to-buy-and-sell-stock. Track buy and profit, not buy and sell (which seem like a natural intuition).
- https://leetcode.com/problems/best-time-to-buy-and-sell-stock-ii. Easier, imo. Just add all the rising edges.
- Drew a system design diagram for autotest on the whiteboard, now that I've gotten much better at scalable system design.
- Most of the primary nodes are present, there's just not much redundancy. Lots of single points of failure.
- Consistency > availability (although it doesn't really matter at low scale, mostly 1-1 nodes and a couple requests / second).
- Task manager / gateway = dispatch server. Did have a heartbeat for all downstream nodes, although it was http. Did not really serve as a heavy-burden load balancer, most test cells had a single control node - in use, or not in use.
- Master-slave for RDB should have been in place years ago so that vehicle/component teams could query their data at will.
- All the usual frills. Analytics, resource monitor, logs, app monitor, hydra/sync for parallel and rpc sync.
- Redis cache for control node. Faster read, metadata, hydra state, etc.
- Took picture on phone.
- Remember that every db (disk or mem, rdb or nosql, etc) has locks for write.
- Lowlevel, it's a message queue with an event loop. It's called the write ahead log, WAL.
- Timeouts, so that one can't deadlock a row.
- Pessimistic = true, natural lock ("i'm using this"). Optimistic = check if change when writing back, like VCS merge conflicts. Favors smaller faster changes.
- Behavioral.
- Wrote a couple examples for each of the 4 LPs Amazon said they'd check during the senior interview.
- Plugged my ehd into the ubuntu laptop for the first time. It could mount the FAT partition but not the ExFAT, obviously for NT.
- FAT = file allocation table.
- Ansible inventory. I had forgotten this word the other day. Playbooks run, inventory is the list of endpoint nodes.
- Went back over sx-setuptools and autotest.
- Watched countdown. Ending sucked, and was illogical, but overall the movie was better than I thought it would be.
- Europix uses a ton of mem. Was basically 4GB of ram and 4GB of swap, yikes. Even compared to other streaming sites, for all (movies, tv, and sports), which are about half that.
- If you have n + (n-1) + (n-2) + ... (1), instead of n * (n-1) * (n-2) * ... (1), which is factorial, it's called the triangular number or the binomial coefficient. It equals (n^2 + n)/2, but you can just take the highest order term and approximate it as n^2. It's much lower in complexity than n! factorial.
Saturday, January 18, 2020
- Practice problems.
- https://leetcode.com/problems/minimum-path-sum. Did BFS first. Works, but not as fast. Wrote a secondary recursive solution, with both backtracking and DP. Faster.
- https://leetcode.com/problems/4sum. Wrote a beautiful generic solution to the ksum problem. Recurse down to the 1sum case, not the 2sum! The exit condition is just the target being in the remaining nums!
- https://leetcode.com/problems/3sum-closest. Similar.
- https://leetcode.com/problems/next-permutation/. Dumb question. Too specific for an interview, and doesn't show much problemsolving skill.
- https://leetcode.com/problems/swap-nodes-in-pairs/. Recursive linked list swapping. Just define to a temp variable, perform the swap, the recurse to the next one. Could totally be solved iteratively.
- https://leetcode.com/problems/reverse-nodes-in-k-group/. Same thing, but instead of k=2 (swap every pair), swap every k. If you're swapping 2 nodes, each node needs 2 connections for each neighbor, so 4 connections total. Since the root head has no left, you're doing 3 connections on each iteration/recursion.
- https://leetcode.com/problems/sudoku-solver/. A literal solver for a given sudoku board. Just recurses, populating the board with a new (valid) number and passing it through, then exiting in the base case where there are no valid plays. Backtracking. Not DP. Takes about 40 lines.
- Remember {} is the set literal in python.
- Jack borrowed the BMW for a 7am ride with Teej.
- Run, pull, sauna, meditate, stretch.
- Watched the truth about emanuel, the 2013 movie which servant allegedly ripped off. It was decent until the ending. The similarity is the fake baby, but not much else. Overall I think Servant is distinct enough to survive.
- UFC 246, mcgregor cerrone.
- Got kabuli naan, my favorite.
- Dimensional is the company that Rob partners with for data / portfolio suggestions (regarding my folks' retirement).
Friday, January 17, 2020
- Remember the tool pycallgraph, useful in tandem with cprofile for getting lots of profile information about your module.
- NBC (Comcast) added another streaming service, Peacock.
- Only 5 companies have ever hit the trillion cap. Four comma club!
- Even in a workplace environment, put yourself in your colleagues' shoes. See things from your customer's perspective. Really helps.
- Placed Amazon fresh order last night.
- Finished servant season 1. Good, but left lots of questions and loose ends. Need season 2.
- New Eminem album.
- I love the integrations between the google calendar and gmail. Autocreation of flights, hotels, events, etc - fantastic.
- Really interesting:
- Got an unsolicited call from facebook today. Recruiter for senior positions in infra/scalability. Bay area based. Set up a coding interview for next week.
- Practice problems.
- https://leetcode.com/problems/combination-sum-iii. Backtracking. Define the recursive case, define some fail-fast mechanisms as exit conditions. Throw loops in if you need to check multiple starting points.
- https://leetcode.com/problems/climbing-stairs. Recursion, with dp for speed.
- https://leetcode.com/problems/plus-one. Reverse to LSB first, the iterate toward more significant digits, incrementing or carrying the one.
- https://leetcode.com/problems/merge-sorted-array/.
- https://leetcode.com/problems/subsets. Generate all subsets. Just iterate, and add the current number to all previous solutions. Exponential.
- https://leetcode.com/problems/word-search/. Seeing if a word exists in a 2d grid. Cool problem. I love graphs.
- https://leetcode.com/problems/unique-paths/. Recursion with DP and backtracking. Sum left and up. Just think about the recursive case, you'll get it!
- https://leetcode.com/problems/unique-paths-ii. Same, but with obstacles. A bit harder. Lots more corner cases to consider.
- https://leetcode.com/problems/sqrtx. Iterative, or binary search across range(0, maxint).
- https://leetcode.com/problems/count-and-say. Bottom-up iteration!
- https://leetcode.com/problems/valid-sudoku/. Writing a sudoku validator. Not hard.
- https://leetcode.com/problems/first-missing-positive/. Didn't solve this one, looked it up though. Very complex problem, but simple solution.
- The neuroscience behind mindfulness is clear. Breathe, smile, stand tall. All work.
- Python itertools.combinations/permutations.
- Google coding interview #2 today. Went well.
- Lunch with briley at wing ferno.
- Put new (2021) registration on the BMW.
- When doing DFS or BFS, you can keep track of the depth/level. Just store each node as (val, depth) instead of (val), then increment it within the (for neighbor in neighbors: queue.append()).
Thursday, January 16, 2020
- Citadel onsite confirmed, next Thursday. Choice of Chicago or New York. Jack said both offices were great. Chicago is a shorter flight, so we'll go there. Fulltime can change after, if need be. Fly out midday wed.
- Very quick (2 question) interviewer feedback form for Amazon.
- California would be the 5th largest country economy, after Us China Japan Germany.
- Lots more system design videos. I really do love this stuff.
- Alphabet hit 1 trillion market cap today.
- Probabilistic solutions for hash collisions.
- When counting or checking existence, the natural solution is a hash table.
- The solutions here: use multiple hash functions, then compare. This makes you robust to collisions.
- Count min sketch algorithm.
- Used for counting how many times an item appears in an array.
- Iterate over the array, use multiple different hash functions on each item, then calc the min of all the integer counts produced by them.
- Bloom filter.
- Used to check if a record exists already (think "username taken"...).
- As each record is created, run it through multiple hash functions and put entries in a bit array.
- As a new one comes it, run it through all hash functions. If there's already an entry for all, then you're 90% sure it's already taken.
- If one output doesn't exist, then you're 100% sure it is not already used, because you would have put an entry for it when the record was created.
- These typically use efficient hash functions and memory-efficient data structures.
- Allows you to do something like pass a bloom filter clientside so that a user can get immediate feedback in the browser if a name is taken. Just have to pass the K hash functions and a bit array of the possible outputs.
- Rate limiter design.
- Just think like github's "remaining-requests" to the api. Keep a db which has a mapping of user: tokens. When they're all used, reject requests.
- You'll usually need to store timestamps too, so you know how the usage has been spread over your particular window (last minute/hour/whatever). Just like a queue: [ts1, ts2, ts3...]. You can count them as desired. Sliding window.
- Disney finally got back to me, they'll schedule a phone interview for ASAP. Reminder the comp package is much less than the others: ~190k base, no equity, contract 18mo, half benefits, some disney perks like movie screenings. I'll see how the phone goes, then balance with my current burden of the others before committing to an onsite interview. They're glendale, which is a bit more convenient than the remotes.
- Nosql databases can be indexed for speed as well. It's usually a compound index of multiple keys mushed together. Remember that an index is just something you can query quickly because it's sorted.
- Sam Harris is so good: https://www.youtube.com/watch?v=LRm_H158qc0. Be grateful for things that haven't happened. Easy way to stay in the moment, impervious to negative emotion.
- Downloaded Waking Up. Will read after jim simons' bio.
- To create the floodfill feature (the paint bucket in mspaint), just use a graph traversal (either BFS or DFS) where the neighbors are all 4 nodes in the cardinal directions. If the value is the same, update to new color.
- Practice problems.
- Focused mostly on DS&A today. Tried a new website for diversity, geeksforgeeks.
- Good summary of everything: https://www.geeksforgeeks.org/must-do-coding-questions-for-companies-like-amazon-microsoft-adobe/.
- In general, geeksforgeeks has pretty good tutorials/descriptions. The text is better, the IDE/leaderboard/etc is worse.
- Poison bottles and rats to test. Each iteration, have the rat drink half the bottles. Reduces it to binary search basically, logn.
- Binary trees. Find max width, height, sum of columns, etc. Just do -1 for each left traversal, +1 for each right traversal. Keep in a global hash table or something. Then you can run whatever stats you want on it.
- Ended up switching back to leetcode. Geeksforgeeks has a bad interface. It requires you to supply your own input/output reader/write, rather than making it class or function based with prefilled templates.
- https://leetcode.com/problems/continuous-subarray-sum/. This one wasn't easy. Find modulo of cumsum, check to see if two are the same, which means the cumsum between them is divisible by the target.
- https://leetcode.com/problems/baseball-game. Easy.
- https://leetcode.com/problems/top-k-frequent-words. Easy. Remember you can heap for nlogk instead of nlogn.
- https://leetcode.com/problems/spiral-matrix. Walk 2d grid in spiral. Just define the 4 directions, loop through them. Take whole row, take first or last item in all rows, and reverse as necessary.
- https://leetcode.com/problems/median-of-two-sorted-arrays/. Didn't do it, just reread. This is a great article: https://medium.com/@hazemu/finding-the-median-of-2-sorted-arrays-in-logarithmic-time-1d3f2ecbeb46. The problem is way out of scope for an interview.
- https://leetcode.com/problems/longest-palindromic-substring. Did the "expand around all centers" way.
- https://leetcode.com/problems/trapping-rain-water/. Trapped rain water, iterate once from left and once from right. Calculate max and deltas, then compare and take min. That's the trapped water.
- https://leetcode.com/problems/rotate-image/. Rotate image. If you're allowed to use extra space, just fill in the new matrix. Write a couple down to see the pattern. New value is just (j, n-i-1). To do the rotation in place, go onion layer by onion layer and do 1, 2, 3, 4 = 1, 2, 3, 4.
- https://leetcode.com/problems/jump-game. Did DFS. There's a much cleaner solution, basically start at the end and check if you can reach this position, then iteration back through the whole range.
- https://leetcode.com/problems/merge-intervals. Good example of making assumptions and testing them out. Did it vocally. Would have clarified with interviewer if live, rather than breaking/fixing with testcases.
- Applying locks, mutexes, semaphores to a concurrent program in order to avoid race conditions: synchronization.
- Priority inversion: when a high priority tasks is waiting for a low priority task to finish. Priority inheritance: when the low priority tasks becomes high priority bc the next job in the queue is high priority, to make sure nothing takes it away.
- In python2, xrange was more memory efficient than range. It wouldn't create the full list, it would lazily generate them as you need. In python3, range is xrange.
- Python's heapq.heapify will take a list and make a heap in linear time (in place too!). There's also nlargest, nsmallest.
- pop(0) for first element, pop() for last.
- collections.Counter(arr).mostcommon(k) is very useful. It doesn't have a nested sort, after count it's arbitrary. To enforce alphabetical or something on the second level, just write it yourself.
Wednesday, January 15, 2020
- Letsencrypt sent out deprecation notices for ACMEv1. Starting June 1, only ACMEv2 standards will be accepted as sufficient certs. I'll have to upgrade the
jrcs/letsencrypt-nginx-proxy-companion
(running certbot) before then. - Smoked the 15lb dry-brined prime brisket. Garlic, mustard, paprika, pepper. Post oak and cherry. Honey.
- Ran out of 18" aluminum foil. For a big cut like brisket, it definitely leaks with 12". Ordered more.
- Plaid connects apps like venmo, robinhood to accounts like citi, boa. Visa just bought plaid for 5.3b.
- I get python weekly and founder weekly, but I'm subscribed to nosql and never get that newsletter?
- Airtable makes a kinda cool product. It's a database/spreadsheet combined. Standard backend but offers a full frontend gui like google sheets. This exists in many stacks, but it's usually 2 products.
- Kadane's algorithm for max some of contiguous subarray.
- Basically at index i, consider all the subarrays up to index i, starting from every index before it. Get the max sum of those. Go to the next index. It's just the previous answer plus the number at the new index, or the previous answer itself.
- local_maximum[i]= max(local_maximum[i-1], array[i] + local_maximum[i-1])
- Simple recursion problem.
- Rewrote some graph traversal algorithms. Raw BFS/DFS easy. For pre-order and post-order traversal (types of DFS), use recursion. Just add the node to `visited` before or after the recursive call. For level-order (BFS), use iteration.
- For python remember the distinction between the subprocess and multiprocessing modules. Subprocess is for calling other programs, running something else from the shell, etc. Multiproc is for adding concurrency to your python program.
- Boston is ~200 miles northeast of Manhattan.
- If you're alive, you can breathe. If you can break, you can speak :)
- Threading is better for IO-bound tasks. Multiprocessing better for CPU-bound tasks.
- Read quite a bit about py3's async capability. https://realpython.com/async-io-python/.
- Slightly different than the parallelism/concurrency of threading/multiprocessing. Its async features (like asyncio.sleep()) are self-aware of when they're idling, efficiently giving the event loop to another coroutine during that time.
- Think of a standard web service, able to handle multiple requests in parallel. Asyncio is that capability, but for your program. It's very different than multiprocessing, which has a given workload to do and wants to split it among many workers to make it complete faster.
- async def foo(): tells python to create a coroutine. Within that function, say await bar(), instruct the event loop to go work on something else until bar() returns.
- Websockets.
- These build on that exact same async infrastructure. Before py3's asyncio, there was eventlet and gevent. Very similar principles.
- You basically have a route that's decorated slightly different than a REST route, and it keeps a websocket connected to the client. Every time a new message appears, it will perform the logic in that route. Example: flask-socketio. Just @socketio instead of @app.route.
- You can also go low-level yourself and define an explicit `async def my_coroutine` with `await websocket.recv()`. The websockets module comes with a very basic server that can persist the connection. https://websockets.readthedocs.io/en/stable/intro.html.
- No matter what, this manifests itself eventually as a data structure in python. Think of it like a list, in memory, that an async service is continually appending to under the hood. Do whatever you want with it, it's just a regular old list to your python app.
- Capsaicin is spread by birds. You might think seeds from chili plants don't get spread around by animal scat because they don't want to eat them, and you'd be right; with the exception of birds, who don't have teeth and just swallow the seeds.
- Watched Shimmer Lake. Dwight!
- Worked again with javier, jack, and tara for the next rounds.
- Scheduled google for friday.
- More general prep. Checked linked profiles of Monday's Amazon interviews. Watched about 10 system design videos, each ~30min.
- Lots from this channel for system design: https://www.youtube.com/channel/UCn1XnDWhsLS5URXTi5wtFTA.
- Locks.
- Even if the transaction isn’t a single db statement (like a deposit) but has multiple steps (like a transfer), you can still lock. A transfer is a withdrawal + deposit + commission, for example. If ANY of those 3 steps fails, have it undo the whole chain.
- You can queue transactions such that if A has a lock while B tries, B will try again in a few seconds when the lock is freed. This is called retry logic. You can write it with TRY statements in SQL!
- The full queue of transactions to write is called a write-ahead log. Like a task manager built into the database.
- Most locks will only lock the scope that's being changed (only that row). Some locks will disable write but allow read during the change. Some locks will disable both.
- Pessimistic lock = the standard one you’re thinking of. Lock a row, change some data, release the lock. The safest way. Better for joint sets of data, many conflicts.
- Optimistic lock = check the timestamp / version / hash at the start of the transaction, and then again when you’re ready to write back. Only write back if they’re the some. It’s less restrictive, and allows multiple people to read, but it basically favors fasters changes. Slow changes get clogged. Better for disjoint sets of data, few conflicts.
- Locks can be distributed, just like the databases themselves. You can have a cluster of redis nodes (direct duplicates for backup), each storing the mutexes for writes to the primary cluster. microservicenodes -> lockmanager -> caches, just like client -> loadbalancer -> microservicenodes.
- Locks have timeouts to ensure some dead node doesn’t halt the system.
- Put timeouts on everything! Responses, db locks, everything.
- A single char usually takes up 1 byte, so a string of len 4 is the same as an int for a 32bit system. A full sentence, maybe 100bytes. A paragraph, maybe ~1KB. A large doc, maybe 1MB. Compression helps too.
- CAP = consistency availability and partition tolerance.
- P is how the system reacts when nodes can’t talk to each other. Examples: two shards drop a connection, master fails the replication to the slave....
- The whole point is that in the presence of partitions (distributed systems), you can only choose one of (consistency, availability).
- Master-slave is AP. You might have inconsistency, where a write replication might take longer than the next read.
- Many equal nodes is CP. They might handle different tables (vertical partitioning), but they back each other up. This is a distributed datastore. Banking. REST. Locks on everything.
- Cassandra has something called "replication factor" which is how many other nodes to split the backup of a node on.
- Distributed transactions.
- Have the microservice itself (or a separate orchestrator/coordinator) keep a record of all planned queries/writes in the chain of the request. It needs to regard all of them as a single transaction. If any of the individual reads/writes to various other dbs fails, it needs to rollback the entire thing. You might hear this called "3-phase-commit" (can commit, precommit, do commit).
- Consistent hash ring.
- Hash tables under the hood. You might think the keys are just assigned to registers, 0-2^32 or something. Then you hash the input and simply lookup by key. This would work, but if anything changed (like the # of total memory locations), you'd have to remap your table to the new modulo.
- Instead, just make the keys random numbers. Then during lookup, pick the first key that's greater than your hash output. If you reach the end, just take the first element. Then, since you're finding the closest neighbor instead of the exact key, it's robust to remapping.
- While 50k requests/sec to a website is very large, to be considered “very large” in the world of queries/sec you must go even higher. Large apps are over 1 million queries/sec.
- Caches.
- A global cache can be as big as terabytes for large websites.
- Cache latency should be <10ms.
- LRU is a very common cache eviction policy.
- Under the hood, a cache is just a message queue with an event loop that has a threadpool at its disposal (and access to ram).
- Fault tolerance in a cache is usually just regular snapshots to disk.
- Since caches are typically event-driven, you can also take the logs and just replay all the actions.
Tuesday, January 14, 2020
- Watched I Think You Should Leave with Tim Robinson. Some good sketches, some average.
- LSU national champs.
- Bought, trimmed, brined a $15 prime brisket.
- Since I have the tax/house loot sitting in my BOA, I'm temporarily eligible for preferred rewards. Not gonna get another card or anything, but there are some small perks like no ATM fees at non-BOA ATMs, etc.
- Paid the $53 parking ticket for the garage next to tower 12 last friday.
- HDFS = hadoop distributed file system. Part of the Apache Hadoop suite.
- Did a cloud product comparison review between the gigantic stacks of AWS vs Apache vs Azure.
- While I was working on the Netflix takehome assignment, the position was closed (over the weekend). Dunno if that meant filled or headcount-cancelled. The hiring manager emailed personally and said that she liked me and wanted to personally forward to another hiring manager. I looked through all the open engineering reqs in Los Gatos / Los Angeles and asked for the python runtime group.
- I really do like 23andMe.
- Forget privacy, murder charges, general data paranoia. Accountability is good. More data is good. And this data is about as fundamental as it gets.
- Leads to connections. Leads to drugs. Leads to wellbeing.
- It's a fantastic business model. You're paying them to give them data, which is their actual product.
- Average US credit score in 2019 was >700. Higher than I thought it would be.
- Google and FB building little towns of housing complexes near their headquarters to offer affordable housing (+ convenience, - work/life balance). https://medium.com/cxo-magazine/google-and-facebook-are-building-the-ultimate-perk-housing-3ec8ba3c4f6b.
- Current vegas money lines, niners and chiefs both favorites at -320.
- The most career interceptions is 81. There used to be a lot more. The most for active players is 35, Richard Sherman.
- Features like search autocompletion should have a response time <100ms.
- A trie (pronounced try) is a specific type of tree, where the nodes are characters.
- For the english alphabet with 26 letters, it's just a tree with:
- Root node = null.
- Each node can have up to 26 child nodes, one for each next letter.
- Traversing down any path spells a word, (the path being the return string).
- With these properties, you can look up a word quickly. O(l+n), where l is the length of the prefix that you're checking against, and n is the number of nodes underneath that trie subset (because you have to traverse all of them). Then you'd usually do a sort, for klogk more, where k is the length of the valid words you found.
- This is used for stuff like autocompletion / look-ahead text.
- Jeddah Tower will open this year and surpass Dubai's Burj Khalifa as the tallest building, but then the Dubai Creek Tower should open the year after that and take the crown back. It costs over $1b. It will be over 1km tall.
- Remember to consider a timescale. If a particular computation is expensive, but the user doesn’t need accuracy of the data down to a resolution of seconds/minutes, just compute it and cache it once an hour or something!
- Jeop GOAT day 4. Ken won!
- FTP = file transfer protocol.
- I didn't know LinkedIn was the original creator of Kafka; Apache is simply the steward nowadays.
- Wanted to subscribe to highscalability's RSS feed, but gmail doesn't offer native support for RSS (anymore). This is crazy. I don't want to have to create an account with a standard newsreader and use it as a middleman. I'll just check the blog manually, but it's a shame to have to do so in 2020.
- Bought a new food processor.
- Amazon called, basically said that hiring was a consensus, but they were considering bumping up from II to senior. They wanted to get more data points to confirm, so they requested a second, shorter, onsite interview with more senior developers.
- 2hrs instead of 6.
- Same format, half behavior and half technical for each hour. 1 would be another problemsolving question and 1 would be another system design question.
- The behavioral halves would be split among a specific 4 of the LPs: think big, invent and simplify, be right, hire the best.
- Overall, I feel good about this? It's more work, but it's the chance for an early promotion. 2hrs for a stressful onsite is easier than working for a promotion internally for a year. In the worst case, I hope there's no offer retraction for the lower level if the second onsite goes poorly, for some strange reason. In general, I'm happy to see that the interviewers recognized some good principles and suggested higher leveling without my solicitation - the exact opposite of SpaceX :). Moving forward with gratitude, no matter what happens.
- Emailed Citadel and Google to let them know so that we can accelerate those tracks to stay in sync.
Monday, January 13, 2020
- 6 more leetcode problems.
- Remember that sets are mutable in python, and therefore are not hashable.
- If you need to solve a problem with nested lists/sets, and you need to check if an inner list is already in the outer list, it's slow. It's much faster in a hash table (set or dict).
- To solve, make the outer and set and the inner a tuple. You can leave the inner a list for whenever you're modifying it, but then you can convert to tuple when checking/writing to the outer set.
- Amazon onsites. 6 1-hr interviews. Notes in drive. Enjoyed it.
- Jack mentioned that Citadel is still putting together the onsite panel, so it'll likely be next week not this week.
- ZYME up 4% and TSLA up 11% today (to $530, w t f). ZYME had a -15% dip right after open, lasted for about 30min. Not sure what caused it, some afterhours behavior, who knows.
- Collisions. Two writes at the same time. Databases, both relational and in-mem, have locking mechanisms. Under the hood, there is a mutex that will keep transactions from overwriting each other (if it’s an ACID database). If you have two REST calls trying to book an airbnb, the db will confirm one as first and return the write transaction; the other will fail. The service should then tell that to the user.
- Remember if your DB doesn't have native support for a date/datetime object (like redis), just convert to an integer via seconds-since-epoch.
Sunday, January 12, 2020
- The MD5 hashing algorithm resolves to 128bit values.
- CRUD create read update delete.
- Final prep for amazon.
- Few questions on glassdoor. Mostly behavioral, some algorithmic.
- Few questions on careercup. Technical.
- All the big hitters in my gdrive:
- technical: a&ds, aws, databases, debugging, general software, linux, speed this up, sicp, system design, web.
- behavioral: my questions, their questions, amazon leadership principles, logistics.
- Few leetcode problems:
- https://leetcode.com/problems/rectangle-overlap.
- https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array
- https://leetcode.com/problems/search-in-rotated-sorted-array.
- https://leetcode.com/problems/jump-game-ii.
- https://leetcode.com/problems/combination-sum/.
- Signed the NDA.
- Anything >= 1000 requests / second is usually a heavy load. For a smaller system, a few hundred can even be heavy. Google processes ~50k searches per second.
- Can we get a favicon on bigocheatsheet pleeeease. https://www.bigocheatsheet.com/.
- Last 2 divisional games, so conference championships will be Titans vs Chiefs and Niners vs Packers.
- For recursive problems, you can:
- Naive: Generate every single permutation/path and then run through and check each as valid/invalid.
- Backtracking: Try a value, verify, drop if wrong and move on to the next. Continue if valid.
- Powers of 2:
- 2^10 ~thousand
- 2^20 ~million
- 2^30 ~billion
- 2^40 ~trillion
- Read sequentially from:
- Disk at ~30 MB/s.
- SSD at ~1 GB/s.
- Memory at ~4 GB/s.
- Fantastic overview of system design: https://github.com/donnemartin/system-design-primer.
Saturday, January 11, 2020
- Traversing paginated APIs is usually easiest with `while resp.links.get('next'):`
- Finished the netflix take-home coding assignment and submitted. It was a bit larger time ask than I'd prefer for this stage in the interview process.
- There's a few good examples of raw stuff in here. Pagination traversal, greedy page counts, custom manual mocking.
- 2 divisional round games! Niners. Went to the pier to watch with the sbsc season finale.
- Tox can have issues finding output data from pytest-cov. Just add usedevelop=True or setenv=pythonpath={toxinidir}.
- type('', (), {})() for an empty object you can set attributes for, etc. I know, there should a single call like object(), but you can't write to that instantiation.
- Remember, classmethod is run on the class not the instance (class()). staticmethod doesn't get self passed, it can be run without an object.
- Always patch in the namespace of module you're working. This is true even if the function lives somewhere else! @patch('mypackage.mymoduleimtesting.myfunc').
- Hahaaaaaaa: https://www.youtube.com/watch?v=4Pr_E8Tz7RE. Amazing game show idea. https://en.wikipedia.org/wiki/Killer_Karaoke.
- Went through notes/career/software and split up some of the bigger docs: general software, web. They had smaller partitions that made more sense (dbs, etc). Consolidated all the books too: SICP, PIE, CTCI, DS&A.
- 30 incredible summary documents in total now.
- Grouped all behavioral questions into categories and then meshed them with an independent bank of my experiences/skills/projects/interests. Helped organize the responses, which should make them easier to bring to memory for the amazon leadership principles.
- Wrote a system design diagram for all the nodes in autotest, from memory. Was a good exercise to connect new design knowledge with old (semifamiliar) architectures.
- Watched another couple design videos on uber and tinyurl.
Friday, January 10, 2020
- Petty released the sbsc winnings, $700 for 1st place, $50 for 10th. End of season party this saturday.
- Path-planning is an np-complete problem. The traveling salesman is an np-complete problem. Dijkstra's algorithm alone is solvable in polynomial time.
- Traveling salesman is just visit every city in the country once in the shortest path. In other terms: visit every node in a graph once, where the edges can have different weights (like distances between cities).
- ! to run shell commands from ipython.
- Spoke with the google recruiter. Gave her my feedback from the other day, and combined with the interviewer's fb, decided we'll do a 2nd phone interview. This is good - I wanted another datapoint. We'll do it next tuesday.
- With py requests, can pass it in directly with ?key=value or via params={key: value}
- You should typically use the rel links for next and prev when dealing with a paginated API, rather than just looping over an integer range and building the URL yourself. Sometimes the url will change.
- Dentist and haircut.
- Interesting report on the (primarily) medical state of psychedelics, and the effect on market investments we'll see in years to come: https://www.reddit.com/r/investing/comments/emkhwt/2020_psychedelic_industry_insights_report/.
- Went through a bit of highscalability. It really is a great blog.
- Jitter can be a good thing. Sometimes it’s best to intentionally add a bit of entropy in. Helps avoid thundering herds.
- Spent most of today on the Netflix take-home assignment. Private github repo.
- breakpoint() is only available in py3.7 and beyond, it's still import pdb; pdb.set_trace() for all vers before then.
Thursday, January 9, 2020
- Lots of system design practice today.
- NoSQL vs SQL.
- NoSQL is useful when you always need all information about that row. SQL better for select x, NoSQL better for select *.
- Don't need NULLs. Just skip keys in the nosql json. In this way, the schema is basically flexible per entry.
- Good for analytics.
- NoSQL is more expensive for updates, and doesn't guarantee ACID.
- NoSQL is more expensive for reads. While you can grab a whole blob easily, getting only one col across the whole db is slow.
- JSON nesting is nowhere near as structured as the the relationships (FK, many-many, etc).
- Joins are weird. Not easy.
- Overall: if your data is always read/written in blocks, not individual cols, nosql is good.
- Key-value is just key-value. Document nosql can have structure and nesting within (JSON usually).
- Examples:
- Document nosql dbs: mongo, couch.
- Key-value nosql dbs: dynamo, redis.
- Wide col nosql dbs: cassandra.
- Graph nosql dbs: neo4j.
- 10m call with Jack to set up the onsite with citadel, a few notes:
- Will's team is interested, but there are a few others as well (equities...). Interview might be at one site, but fulltime might be at the other.
- You don't need to have expertise in finance, but you need to show interest. The projects of last year are great examples.
- ACID is a db transaction property for transactions. Atomicity, Consistency, Isolation, Durability. Financial institutions absolutely require this. Something like a metrics server wouldn't.
- Availability is another important metric, but isn't part of ACID.
- Can shard by ID, or by location, or whatever is the most convenient/divisive/common entry point.
- JOINs get expensive if they're cross-shard.
- Master-slave architecture. Master gets all the writes, slaves update to match master every X sec/min/hours/days, slaves get all the reads.
- If master goes down, one of the remaining slaves becomes master.
- Still dumb that its formal name is master/slave in 2020.
- Remember generators just return before they're finished executing. Could be 3 lines straight with yield 1, yield 2, yield 3, but it's usually a loop within the function. Just returns partial progress. Then you can iterate over the returns as they come, or call next(my_gen_func) to walk through manually.
- Confirmed and planned the logistics for the amazon onsite on monday.
- Need to print the NDA before then.
- Consistency is very important. Do you want mutexes on everything read/write such that the data returned is always exact? This is much slower and much bigger. How much data can afford to be out-of-date? How much time must pass before data is considered obsolete?
- If I scale a microservice horizontally, the machines need to talk to each other. RPC. If I scale vertically, it's all local still. IPC. RPC is slower, and more complicated to manage. But it's usually cheaper, and it's boundless, and it's more resilient to failure.
- Remember: event-driven/pubsub/messagequeue/bus designs are worse with consistency, but better with availability. You’re distributing events and aggregating results. If you need higher fidelity responses, go with more of a request/response (timeout) design. Slower, but more integrity in consistent data.
- Event-driven is easier to test, play, rollback, because it’s structured and sequenced. Kinda like react/redux.
- In-mem solutions use snapshots to get persistence, but they're obviously at a certain frequency and so you open risk for data lass.
- Postgres has nosql features as well, it's just used less commonly. You can store docs just like mongo.
- Amazon 1hr onsite coaching call, general to all onsites.
- Basic fundamentals. The one thing I usually have to remember is: Ask questions to clarify the problem. This especially applies to design questions, but can also apply to coding questions.
- Amazon 30m final call, specific to me.
- Schedule: (each 1hr)
- 2 technical: Problem solving, DS&A.
- 1 nontechnical with hiring manager. Customers, communication, dealing with adversity.
- 1 lunch. Culture. Ask questions. Relax.
- 1 technical: Logical and maintainable code.
- 1 technical: System design. Scalability. Operational performance.
- Each will have 2 leadership principle questions (behavioral) as well.
- General:
- Ask clarifying questions. Think out loud.
- Start with an easy solution. Brute force, whatever. Then expand. Take bite-sized bites.
- The interviewer is your coworker! If you start to panic, look at them like a collaborator.
- Always consider priority. A bank withdrawal is much more time-sensitive than an email notification. This could affect the ordering in the message queue, the rate limiter thresholds on various notes, the load balancing, etc.
- When realtime accurate data is cumbersome, you can trend or interpolate data, based on history, to give an estimate. This is true for something like the viewcount on popular youtube channels. Would take too much to show exactly, but is easy to show ballpark.
- Thrashing - when a cache is inserting/deleting too quickly.
- Going through this process of interview prep has been interesting. I feel like I've consumed what must equate to at least a handful of semesters in fundamental CS courses. This accessibility of information makes me wonder about the future of education. It's been maybe 6 weeks and $0, whereas the university equivalent would have been a year and $30k. Will our grandchildren get in-person degrees? I'm not so sure this model is viable in the future.
- Ordered more dry-erase markers for whiteboard practice at home. Arrive tomorrow.
- Netflix splits every piece of content into small atoms, stored with different permutations of codecs and resolutions. That's why viewing is seamless when your internet quality changes, or you select a new one; it will simply then fetch the right chunk instead of the whole movie again. The amount of chunks it preloads into the future is based on heuristics, by how much people typically jump during this movie. If a lot -> preload only a little. If a little -> preload a lot.
- Did a practice system design from scratch for instagram/twitter.
- It looks like neither grubhub nor yelp do group carts anymore? I remember loving this feature. You'd just send the link to anyone, they'd look through then menu and add, then be responsible for the finances individually when the order was sent.
- You could store images in a db as blobs (binary large objects), but it's almost always still best to put them on a file system elsewhere (CDN, s3, etc) and keep only a URL to the file in your db.
- A client-server relationship is always initiated by the client. Request -> response. The protocol for this is http.
- A peer-peer relationship can be initiated by either side. You can use a general socket (TCP) or a protocol specific to your use case. For chat (whatsapp, fb messenger, etc) the protocol is XMPP - extensible messaging and presence protocol.
- It’s ok to have gut instincts. It's even ok if they're wrong. State them, and then state that you’re going to analyze them now. Do the verification out loud. A raw, transparent thought process is worth a lot.
- Designing google search.
- First, just think about keeping a big dictionary. As people add sites to the internet, it adds words and phrases to the dictionary. You have a rough mapping of what is out there.
- Then think of it like a gigantic hash table, or an indexed db. Look for the word (or words) in the search, and return the direct results.
- That might not be all though - look for misspellings, common associations, and other relatives. This is where association algorithms come in. They're usually based on prior results. There are many other keys to consider: location, how recent, etc. There are also blacklists for known spam, risks, more.
- Then, you have a list of possible options. How do you rank them? What do you show at the top of the list? This is another ranking algorithm. Again, usually based on prior results.
- How do you manage the size of the dictionary? MapReduce. Basically shard the hash table by characters (since those are they keys that the user is typing), farm out to the all the appropriate nodes to their more-efficient subsearches, then combine the data back.
- ~1 in every 7 strings typed into the google search engine has never been typed before!
- Write-ahead logging is when the action is logged before it is performed. Think of it in the context of a database write operation - this handles the atomicity and durability principles (of ACID).
- Third day of jeop goat.
- Replication types. Consider a db mirror, or a master-slave.
- Snapshot replications. Vompletely copy the full data over, like a backup or a pgdump.
- Transactional replication. Replay the transaction on each slave.
- There are more, but don't worry a ton about them.
Wednesday, January 8, 2020
- Think about if a system needs to be read-heavy or write-heavy. Something like twitter is much heavier on read operations.
- An index is placed on a specific column in a table so that you can perform lookups on that column more quickly. It's not the primary key, although you can think of the PK as an index if you know the ID, because that's the search val. More generally, it allows an indexed column to be sorted efficiently on the backend, so that lookup is binary search (logn) instead of n. The downside: this extra data takes space.
- Think of it like a dictionary (an actual physical book with word). Indexing it would mean adding pages for each letter in the appropriate place, with little tabs that jut out. You can find your section much faster, but they take a little space.
- PUT will update the data in that spot or create it if it isn't there. It's idempotent. POST just passes data to the service. The service can do whatever it wants with it. It's not idempotent. It's more generic.
- Citadel interview got a callback for onsites. Call tomorrow for scheduling.
- YAML is a superset of JSON. Can do more, but is bigger. Looks better, too.
- Testing pyramid: unit -> integrated -> e2e.
- Netflix phone interview.
- Great convo. Took a one-pager of notes. Just phone, no coding. 1hr.
- Discussed the INTERNET SHIELD, market viability in the future of streaming, attrition, culture.
- Her team directly aligns with my automation tools / devops / autotest / sx-setuptools experience.
- Next steps: take-home assignment, build an app. I said I'd submit by Saturday.
- DHCP for dynamic IP allocation.
- Perf is a main linux profiler.
- "Computers perform tasks. Machines should be solving problems."
- Practice problems:
- Remember runbooks for ops/support. All stacks/apps should have a list of common tasks or inquiries, each with an explicit set of steps to respond with.
- Slight tweaks to my resume. I still like my abridged resume more, although nobody else does. It's just so concise: 20 bullet points to summarize my life so far.
- TSLA is absolutely skyrocketing. Hit ~500 today. Its market cap is now 85b, which the highest for any american car company in history. I bought 2 shares on margin just to have fun with it. The emotional roller coaster is worth it, regardless of final profit/loss (since I jumped in ON a surge woohoooooo).
- Remember margin trading on RH is interest-free up to 1k, then 5% after that.
- Watched a lot of this guy's channel on system design: https://www.youtube.com/channel/UCRPMAqdtSgd0Ipeef7iFsKw. I like em.
- Athletes say a lot of dumb stuff on social media. It seems easiest to critique them for poor decisionmaking/tact, but in reality it's probably closer to: giving a huge stage to people on a career path where stage presence is not a required skill.
- Jeop GOAT day twoooo.
- Google phone interview.
- Was 45m. The guy was cold, didn't really reply to anything or change tone. Accent was tough over the phone.
- Didn't introduce himself or the team, jumped straight into a coding question on the google doc.
- We did a 2 subsequence comparison question. I solved it, but still had a bad taste in my mouth.
- We didn't discuss previous experience, scaling services, managing projects, writing apps.
- Big turnoff for Google, when my interactions with all other folks from Amazon/Netflix/BMW/Citadel/GitLab/Disney have been very positive.
Subscribe to:
Posts (Atom)