Thursday, August 22, 2013

A better and easier work life



Many people are unhappy at work. It doesn't have to be that way; a few key behaviors and perspectives will make your work life better and easier.

  • Take pride in your work.
    • It's sad when you don't. Those of us who care notice.
    • If you can't bring yourself to care about your work, quit.
  • Fulfill your responsibilities.
    • Among other things, it hurts other people when you don't. Not just you.
  • Only sign up for work you are sure you can do.
    • If you feel obliged to sign up for things you can't do, quit.
  • Do not take responsibility for anything that is not in your purview or the responsibility of a direct report.
    • If you are held accountable for others' failures, quit.
  • Do not feel responsible for others' failure to do their work unless they report to you. You must be able to assume that other people will do their jobs.
    • If this is not practical in your role, quit.
  • Do not ever lie.
    • Lying is: intentionally creating or sustaining in another person a view of the world that does not match reality. That conceptual error may harm them, and you should not presume to know what will hurt others and what will not.
    • It's disgusting. You're not a four-year-old. (If you are a four-year-old, write me at pmcculler@gmail.com and we'll take it up offline.)
    • A poorly considered work estimate is usually a lie.
    • If you feel obliged to lie, quit.
  • As a manager, your team is your work product.
    • Make them powerful.
    • You categorically cannot be successful when your team is not.
    • If you are not rewarded for just and only this, quit.
  • Tell people what you are doing and how it is going.
    • If you're not regular and explicit, you are misleading them.
  • Expect and reciprocate cooperation.
    • Cooperation is expected and necessary and should be natural and easy.
    • If cooperation is painful, quit.
  • Coordinate with your broad team to make sure all scenarios and processes actually work.
    • If no one cares, quit.
  • Do beat yourself up over failures, exactly until you get the message.
    • You are the only one who can do this effectively.
  • Get better at your work over time.
    • The straight odds of being in the top 1% of your field are 100:1 against. Do you work 100x harder than 1% of your colleagues? 50x harder than 2%?
  • Use your curiosity.
    • Things you don't know are always the ones that wind up hurting you.
    • ... and helping you.
  • Look for a better way.
    • The odds you're doing it the best way you can with the resources you have are minuscule.
    • If your manager insists that you do it badly anyway, quit.
  • Prompt for and accept feedback.
    • Most of the feedback you get will be clandestine, implied, or third-hand. Find it.
    • If you never receive any positive feedback, quit.

You've noticed many "if..., quit" imperatives. Yes. You can be successful, and life doesn't have to be horrible. Work to improve your situation, but remember that you are in an environment that is also a self-perpetuating system. Not all environments will respond to your sincere, thoughtful, and persistent attempts to improve them. Ultimately, you can control yourself, not your environment, but you can and should look for better environments to inhabit, thrive in, and improve.

Friday, March 22, 2013

The 300x Challenge

Could you be 300x more productive?

Although the usual sources are apocryphal, the evidence is clear that individuals and teams vary in productivity over similar domains by 1,000% or more. Supposedly, Google has calculated an individual difference of up to 300x between average and maximum productivity in technologists (such as software engineers).

A 300x difference is worth a pause. This sort of difference would imply that a maximally productive software engineer (in the Google model) could accomplish in one year what an average engineer would require three centuries to complete. A gross oversimplification, but illustrative.

We should find out what causes it, as you can imagine what it would be worth to create productivity like that. Any large company should be willing to spend billions to broadly implement such a change. So what could cause this enormous difference? A real answer requires careful measurement, but there's some value to a conjecture and narrative to get us started looking in the direction of an answer.

I tend to think in terms of teams of engineers working together, so I'm going to switch the focus from the individual to the team, and apply the same productivity amplitude. This seems appropriate.

Here are my raw thoughts on the matter. A 300x team:

  1. Never builds anything they don't have to.
  2. Rapidly constructs working software.
  3. Changes direction without friction.
  4. Avoids distraction.
  5. Does not spend time resolving bugs.
  6. Does not redesign and rebuild software unless they have to, and they almost never have to.
  7. Perfectly distributes tasks to the most qualified team members.
  8. Ships software instead of sitting on it.

How could it be done?

  1. Never builds anything they don't have to.
    • Because they use third-party software whenever possible.
      • Because they know what's available.
        • Because they invest time in learning.
        • Because they have a broad network of advisers.
      • Because they have the budget to buy software.
      • Because integrating third-party software is easy by technical means and policy.
      • Because they delegate or contract out everything they don't strictly have to build.
    • Because they only work on the most important problems.
      • Because they have a crystal clear understanding of what's needed.
        • Because they understand the purpose of the product.
          • Because they understand the customer.
            • Because they have a perfect feedback loop.
              • Because they have an expert product manager.
      • Because they do not spend time on anything not strategically critical.
        • Because they understand the strategy.
          • Because there is a clearly articulated strategy.
            • Because there is a strategy.
              • Because there is an expert strategist.
  2. Rapidly constructs working software.
    • Because they adopt good designs quickly.
      • Because they are expert designers.
      • Because they get assistance from professionals and domain experts.
        • Because they networked.
        • Because it's facilitated.
          • Because there's a facilitator.
      • Because they understand the problem domain.
      • Because they draw on a deep pool of atomic concepts.
        • Because they have deep computer science knowledge.
    • Because they write working code quickly.
      • Because they are expert coders.
      • Because code once written is always known to work properly.
        • Because it's thoroughly, automatically tested.
          • Because they invest in testing.
          • Because the team includes testing specialists.
        • Because code reviews.
          • Because the team trusts itself.
    • Because integration is simple, quick, and painless.
      • Because components are not developed in isolation.
    • Because they have all the skills they need.
      • Because strong hiring.
      • Because investment in skill acquisition.
      • Because product design leans on existing strengths.
  3. Changes direction without friction.
    • Because they ignore sunk costs.
    • Because they can change or reset development environments instantly.
      • Because they have the most advanced dev and test environments available.
        • Because they invest in tools and environments.
    • Because they keep their eyes on the prize, not the intermediate terrain.
      • Because they measure themselves.
        • Because they have a target.
    • Because they can pick up new ideas quickly.
      • Because they have strong fundamentals.
      • Because they have training, coaching, and mentoring support.
        • Because strong management.
      • Because they teach themselves and each other.
        • Because strong hiring.
        • Because trust.
    • Because they quickly abandon dead ends.
      • Because they watch for them.
  4. Avoids distraction.
    • Because production problems are minimal.
      • Because specialists can deal with production problems.
        • Because there are few to no production problems.
          • Because the software works.
            • Because testing.
            • Because not taking shortcuts.
        • Because there are early warnings before problems become expensive to fix.
          • Because the software has built-in monitors.
          • Because they use the most advanced monitoring and alerting systems available.
        • Because it's trivial to deploy and un-deploy.
    • Because they focus on the problem domain.
      • Because office life is easy.
        • Because office politics are minimal.
          • Because disputes are resolved immediately.
            • Because facilitation.
          • Because strong hiring.
          • Because strong advocacy.
        • Because everything but excellence is flexible.
      • Because they organize their day-to-day work around efficiency.
    • Because they use absolutely minimum process.
      • Because unnecessary process is periodically stripped.
        • Because they classify process as necessary or overhead.
        • Because professionals need little process.
  5. Does not spend time resolving bugs.
    • Because there are no bugs.
      • Because they were caught in design.
        • Because design review.
          • Because collaborative design.
        • Because they invest in design.
      • Because they were caught in development.
        • Because collaborative development.
        • Because testing.
      • Because there's a minimum amount of software that could have bugs.
        • Because they built the minimum amount of software.
      • Because developers take every bug personally - but forgive themselves.
  6. Does not redesign and rebuild software.
    • Because they designed it to meet strategic goals.
    • Because meeting strategic goals usually requires new software.
    • Because the team prefers patching to rebuilding.
      • Because they know that rebuilding is usually more expensive.
      • Because they never forget anything.
        • Because communication.
        • Because written communication.
        • Because simplicity.
  7. Perfectly distributes each task to the most effective team member.
    • Because planning and adaptation.
    • Because the team knows its skill structure.
    • Because the team passes work items around without friction.
      • Because trust.
      • Because it's facilitated.
      • Because status is transparent.
        • Because trust.
  8. Ships software instead of sitting on it.
    • Because they release all the time.
      • Because it's almost dangerously easy to release.
      • Because they are confident it will work.
        • Because testing.
    • Because they care about customer happiness.

That's quite a list. Some of the items may be unachievable. Is it enough? Could an 'average' team become 300x more productive if all these were abundantly, lavishly true? Could a team of 4 developers, with a support network of product managers and associated staff, finish in one year what would take Joe/Jane Average SDE and staff a thousand years?

Thursday, February 21, 2013

Fractal tries: the compression algorithm

This is a description of the algorithm for transforming patricia/radix tries into much smaller fractal tries.

Here is a patricia trie that encodes the words Cry, Car, Cart, Dry, Dart, Hart, Hi, Hit, Far, and Fart.


The algorithm runs in a single depth-first traversal. It creates a lookup table, then attempts to calculate a hash for each node in the tree. A node's hash exactly accounts for its entire subtree (it and all its descendants). Here are the first steps illustrated (a code sketch of the whole pass follows the walkthrough):

Now our table has a hash and reference for one subtree, [RY].

(There's an extra dashed red line here you should ignore.)
Graying out the finished parts and moving along, it continues to try to calculate a hash for the C node by descending to its next child (note that while I never show more than two children on these diagrams, thus left/right, trie nodes can have any number of children, so it's really 1st child, 2nd child, 3rd child, and so on). This leads it to the T leaf and then it ascends:
Now it has stashed the whole AR:T subtree; while not shown, the hash includes the fact that there is a word termination at the AR node; this prevents non-terminating but otherwise similar subtrees from matching it later. The algorithm continues to complete the C node subtree, then moves on to the D node:

At this point it encounters the first match of an entire subtree against a previously encountered one. It abandons the "repeat" subtree and moves the parent node's pointer to the tabled subtree, leaving it looking like this:
And we have pruned away a leaf of the trie without sacrificing any knowledge of its structure. Now we'll skip ahead again. We've pruned away an ART node and a T node, and then we have an opportunity to prune an entire subtree:
We find in our table that the AR:T subtree under F exactly matches the AR:T subtree under C, so we immediately redirect. The recursion unwinds to the top of the tree and we drop the auxiliary table. Then the final trie looks like this:


This trie has the same word structure we started with, and no additional elements in memory. The original patricia trie has 22 letters in 14 nodes. The final trie has 13 letters in 9 nodes. For large tries, the space savings can become quite substantial, sometimes a 3x to 5x difference.
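
Here is what that pass looks like in code. A minimal C++ sketch under stated assumptions: the node shape and names are mine, and I build explicit signature strings where the real implementation computes hashes (signatures can't collide, but they also make the pass slower than the O(n) hashing version):

    #include <map>
    #include <string>
    #include <unordered_map>

    // Patricia-style node: edges carry multi-letter fragments, keyed by first letter.
    struct Node {
        bool isWord = false;  // word-termination flag
        std::map<char, std::pair<std::string, Node*>> children;
    };

    // Single depth-first pass. Builds a signature covering a node's entire subtree
    // (termination flag, edge fragments, child subtrees); if an identical subtree
    // was tabled earlier, returns that copy so the caller redirects its pointer
    // and abandons the duplicate.
    Node* compress(Node* node, std::unordered_map<std::string, Node*>& table,
                   std::string& sigOut) {
        std::string sig = node->isWord ? "$" : "";
        for (auto& [first, edge] : node->children) {
            std::string childSig;
            edge.second = compress(edge.second, table, childSig);  // redirect to canonical child
            sig += "(" + edge.first + ":" + childSig + ")";        // fragment plus child subtree
        }
        sigOut = sig;
        auto it = table.try_emplace(sig, node).first;
        return it->second;  // canonical node for this subtree, possibly not the one passed in
    }

    Node* compress(Node* root) {
        std::unordered_map<std::string, Node*> table;  // the auxiliary table, dropped on return
        std::string sig;
        return compress(root, table, sig);
    }

A real version would also free the abandoned duplicates as it goes.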

Tuesday, February 19, 2013

Squeezed tries: fractal compression for words


In this post I explain fractal tries, why I needed to design them, and how they work, with a link to rough prototype code.

Recently I was exploring the new art of culturomics (word crunching, really) and went to load Google word data sets into memory so I could screw around with them. Immediately this presented a problem, because there are a lot of words in Google's word lists. Example: they have 28,771 words that begin with Q. I want to do fancy letter magic, but it's hard to even fit these words into a reasonable amount of memory, much less organize and search them. So I did the obvious and built a trie.

For the sake of quick testing, I used a data set a little smaller than Google's, "Yet Another Word List", with a respectable 264,061 words including 2,457,521 letters. The obvious method of storing these words in memory is like so:



With that arrangement, all letters are represented for all words; it's a simple array of strings of letters. A straightforward, traditional trie arrangement improves on the situation. It uses 586,511 nodes with 1 letter each. Here's a chart that shows how tries store words; on the left is a complete trie for the word "car" alone, while on the right is a complete trie for the words "car, cat, cart, and dart" as in the example above:


In this data structure, all words share as much of their prefixes as they can. A word is indicated in the trie by either ending at the bottom, or having a checkmark/flag to that effect. With such a data structure, you can quickly check for the presence of a word, or reconstruct the whole word list if you choose to.
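
To make that concrete, here's what the one-letter-per-node trie looks like in code. A minimal C++ sketch; the node shape and names are mine, not the prototype's:

    #include <map>
    #include <string>

    // One node per letter; a flag marks nodes where a word ends.
    struct TrieNode {
        bool isWord = false;                 // the checkmark/flag from the chart
        std::map<char, TrieNode*> children;  // any number of children, keyed by letter
    };

    void insert(TrieNode* root, const std::string& word) {
        TrieNode* node = root;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end())
                it = node->children.emplace(c, new TrieNode()).first;
            node = it->second;               // shared prefixes reuse existing nodes
        }
        node->isWord = true;                 // flag the word's final node
    }

    bool contains(const TrieNode* root, const std::string& word) {
        const TrieNode* node = root;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second;
        }
        return node->isWord;
    }

Note that every node carries its own child map; that per-link bookkeeping is exactly the overhead discussed next.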

So there's certainly a savings in terms of total number of letters stored, 76.1%. Unfortunately, each node to node link requires memory, so we actually wind up losing over half a million bytes of RAM compared to just storing them in an array. On the positive side, many different operations such as searching are pretty quick on this data structure. But I needed to fit more words into less memory, so I moved on to another data structure, this time writing a patricia trie, aka radix trie. Here's a chart that shows how patricia tries store the same words used in the previous example:



This chart is not drawn to scale. ;)

In a patricia trie, 'lonely' nodes are minimized. Whenever possible, nodes in the tree that don't need to be separated are collapsed into single nodes with more than one letter in them. This means we store just as many letters as in a regular trie, but we use fewer nodes. In this case, the patricia trie uses the same 586,511 letters but stored in 325,845 nodes, so there is less overhead. We can actually start to see space savings at this point, but not very much.
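
In code, the only change from the plain trie is that each edge carries a fragment of one or more letters. Again a sketch with my own representation:

    #include <map>
    #include <string>

    // Patricia/radix node: edges hold multi-letter fragments, keyed by first letter.
    struct PatriciaNode {
        bool isWord = false;
        std::map<char, std::pair<std::string, PatriciaNode*>> children;
    };

    bool contains(const PatriciaNode* root, const std::string& word) {
        const PatriciaNode* node = root;
        size_t i = 0;
        while (i < word.size()) {
            auto it = node->children.find(word[i]);
            if (it == node->children.end()) return false;
            const std::string& frag = it->second.first;
            // the whole fragment must match the next letters of the word;
            // a word that would end mid-fragment fails here
            if (word.compare(i, frag.size(), frag) != 0) return false;
            i += frag.size();
            node = it->second.second;
        }
        return node->isWord;
    }

Same letters, fewer nodes: the fragments are where the drop from 586,511 nodes to 325,845 comes from.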

It seems a little silly to say it this way, but it's truthful and important to note that the purpose of a data structure is to preserve the structure of data. Words are structured in such a way that they frequently share prefixes, so the trie preserves this structure while saving memory space by 'folding together' the beginnings of each word. Each individual word is still represented and recoverable because all the paths through the trie from the root end in a real word.

Words often start the same way (car, carb, carbuncle, carbuncles) but they also often end in predictable ways (preserve, reserve | rested, bested, feasted | least, beast). This highly repetitive structure lends itself to 'folding together' as well. That led me to the data structure I built next. Since it depends on eliminating self-similar branches all across the trie, I'm calling it a fractal trie (arguably this is an unfractal trie, but whatever):


The first two tries look just like patricia tries, because those word sets do not share suffix structure. With the third fractal trie, we see the trie re-uses an existing node rather than branch out to a new one. The fourth trie shows this continued. But fractal tries can fold over entire suffix trees. For example:


This fractal trie makes extensive re-use of both prefixes and suffixes. The trick, if you like, is in creating such structures in a reasonable amount of time. After consulting with a friend, Ian Bone, I found a method that runs in O(n) linear time. (It's entirely possible that he actually told me how to do this, but I was too dense to realize it at the time, and had to struggle with it for another couple of hours anyway. Thank you Ian.)
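
To see the sharing concretely, here's a tiny hand-built fractal trie in C++ for car, cart, far, and fart (same assumed node shape as in the sketches above). The C and F edges point at the very same node, and lookup neither knows nor cares:

    #include <cassert>
    #include <map>
    #include <string>

    struct Node {
        bool isWord = false;
        std::map<char, std::pair<std::string, Node*>> children;
    };

    bool contains(const Node* node, const std::string& word) {
        size_t i = 0;
        while (i < word.size()) {
            auto it = node->children.find(word[i]);
            if (it == node->children.end()) return false;
            const std::string& frag = it->second.first;
            if (word.compare(i, frag.size(), frag) != 0) return false;
            i += frag.size();
            node = it->second.second;
        }
        return node->isWord;
    }

    int main() {
        Node* t  = new Node{true, {}};                 // terminal T node
        Node* ar = new Node{true, {{'t', {"t", t}}}};  // a word end with a T child

        Node root;
        root.children['c'] = {"car", ar};              // car, cart
        root.children['f'] = {"far", ar};              // far, fart - the same shared node

        assert(contains(&root, "car") && contains(&root, "cart"));
        assert(contains(&root, "far") && contains(&root, "fart"));
        assert(!contains(&root, "fa"));                // no word ends mid-fragment
        return 0;
    }

The trie has quietly become a directed acyclic graph: shared nodes can have many parents, which is why entire suffix subtrees only need to be stored once.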

When applied to the YAWL word list, an implementation of fractal tries creates 65,333 nodes including a total of 145,271 letters. That's a 94% savings in letters stored over the flat string array, and even including the overhead for 65,333 nodes it means a substantial memory space savings when implemented in any reasonable language. (Yeah... lots of popular languages are not reasonable in this sense.)

I expect someone else has created data structures like this, and at some point I'll track that down and post an update here with a name and references. In the meantime, I'm happy I designed a data structure that meets my needs. I slammed together a prototype in C#, which due to memory overhead is not a reasonable language for compressing with tries. In C++ it will be much more space-efficient by default; there are a variety of optimizations available, and I'll clean up the API.