Thursday, August 17, 2017

Google and “The Memo”.

I’ve worked in tech for a long time. Almost 20 years now. Being a manager in tech is interesting and difficult. Once you reach a certain level of complexity, cowboy coders and individual performance start to take second priority. The ability to work as a team becomes the primary way you solve problems at Amazon, Apple, Facebook, and Google scale. As a result, I've spent a considerable amount of time trying to understand how to get more productivity out of larger teams.

Which is why I find the Google memo that has “rocked” the tech world so interesting. I don’t want to discuss the memo itself. You can find all sorts of interesting analysis of the memo here, and here, and … oh, just google it. I want to discuss the analysis and the structure of making arguments, and why I am so disappointed in this world of clickbait headlines.

Diversity and Inclusion at Pinterest
One model for reviewing argument, debate, and persuasive media is the Toulmin model. The Toulmin model breaks an argument down into its claim, reasoning, evidence, and warrant. These are useful tools for dissecting an argument into meaningful chunks that can be looked at critically, dispassionately, and as objectively as possible. The Google memo is 10 pages of something that can be readily broken down this way. It has some sub-arguments and other interesting tidbits, but at a high level, the claim is that forced diversity is unsustainable. The reasoning is that forcing diversity implies that everyone can do the same jobs, but not everyone has the same capabilities, and you will eventually exhaust the pool of "diverse" candidates who can perform the job. The evidence runs through a sea of anecdotes, topical scientific research, and interpretations that may or may not reflect the current state of the art in sociology and behavioral science. Visit those other links for that analysis. His warrant is that anything unsustainable is bad, or at the very least needs to be accepted as unsustainable. If readers had broken down his argument this way, a valid conversation could have been had.

None of the popular articles on this seemed to view it that way. The headlines didn’t approach the memo that way. They focused on phrases like “de-emphasize empathy” and other keywords designed to make you react. They focused on how difficult it would be for this engineer to be on a team. They cast everything as hostile, but few of the first responses broke the argument down in order to discuss it. They only reacted to what they saw. I can understand that; it went viral on social media faster than people could read it and critically analyze it. Everyone was just tweeting and reacting. Even some scientists weighed in, using biased language and rather condescending statements.

I disagree with the claim in the memo. I don’t see how the evidence backs up his claim, but I do see where he tried to make a valid argument. If the argument had opened up a discussion instead of creating a maelstrom, I think we could have made some real progress on diversity. But James doesn’t see his own logical fallacies, and the world is too busy yelling at him or cheering for him for the conversation to be effective.
Random chart showing how little we help each other online.

There are real issues with diversity, inclusion, and just plain being supportive that need to be addressed. This just doesn't seem to be the conversation to actually get us to address them.

Wednesday, July 26, 2017

Horrible Signal-to-Noise Ratio

I’ve kind of checked out from Facebook. The atmosphere there is too dysfunctional for me nowadays. I think that most of the interactions through social media, especially in 2017, are either grandstanding, extreme calls to (in)action, or fishing for compliments. The current conversations on social media reveal how sheltered and reactionary our online discourse has become.

I think there is a trend of people feeling that social media gives them a platform to shout their opinions or troll others, but I don’t see much that actually delves into the true thoughts and feelings of anyone. It is a group of people shouting, but no one is actually listening anymore. Even the ones who are listening are trying to listen through the din of everyone shouting, trying to find a signal that is useful.

The internet tried to fix this by creating places for people to congregate and argue, but just as having bars doesn’t prevent public intoxication, content communities like Reddit don’t prevent public argumentativeness. The sites create places where you know the drunk people will be, but they don’t prevent drunk people from being anywhere else. One of the interesting aspects of content communities is that they can have moderation and can attempt to stay on topic, but that doesn’t eliminate the drift, or the impact of a bad moderator. When I was young, in the early 90s, I used to troll people on Third Age. For those unaware, "third age" refers to retirement. There isn’t really a place for a 14-year-old in that community, but I was damned sure Ayn Rand knew nothing about society or life, and I posted counterarguments constantly. I eventually got banned. It wasn't until years later that I realized that trolling them WAS living Ayn Rand's Objectivism, which I was so adamantly against.

There is no oversight and there are no editorial boards for people publishing on Facebook and Wikipedia, which allows them to post any and every thought they have. Sadly, there are many people who think that because something is “published” online, it must be true. Then you have the comments sections on news articles and social media. I am not sure why anyone comments on anything anymore. They seem to devolve into screaming matches between extremes, even when the topic is fairly benign. Everyone seems to be focused on being unprofessional and reactive, rather than being interesting or humble.

In a perfect world, there would be truly objective media in online communication, but given the ease of publishing, I don’t think that is feasible. The other thing preventing it is how much everyone seems to live in a bubble of media that reflects them and does not allow outside ideas to intrude.

Thursday, July 6, 2017

TMI and Sharing Things...

When my son was born, I sent an email to a select list of people announcing his birth, his weight, and that my wife and son were fine. Standard fare for the occasion. He graced the world around 4am Eastern time, and I felt calling anyone was beyond my energy level and of little value, since the closest family member was 1,200 miles away.

When I spoke to my mother two days later, specifically calling her to ask why she hadn't responded to the email, she was upset with me for not calling her immediately. Apparently an email to a bunch of other people was beneath the status of our relationship. My mother (whom I speak to maybe twice a decade) believed she deserved better communication and access to me than the friend who had the keys to my house and was watching my dog while we were at the hospital.

These relationship distinctions are interesting, but I feel like they are becoming a thing of the past. My younger friends, even close ones, are fine with "personal" announcements happening on Facebook, in public. Basically, everyone who is a Facebook friend finds out at the same time. If you are interested, great: engage, like, or share the post. If you aren't interested, great: ignore, unfriend, or engage as a troll. But since nearly everyone is treated as an equal on the network, the author is absolved from having to decide which friends get told first, and when; even better than what I did, since I still had to choose the email addresses for the to: line.

Good friends help you bury a body... Great friends bring their own shovel and don't ask any questions...

When it is a truly private conversation, my inner circle of friends tends to share over Signal or Facebook Messenger, more peer-to-peer mechanisms for communicating. But those are for incredibly private conversations: the kind where you are asking a good friend for something that should never be shared, and there will never be more than two people in the conversation. But the list of people who fall into this category is down to two, both of whom I have regular face-to-face interactions with.

I think that if you are comfortable sharing something online, then you aren't sharing too much online. If you are uncomfortable sharing it online, or with what others are sharing online, you might need to reflect on why you feel that way. It likely has to do with a belief that you have a better interpersonal relationship with someone, a better bond with that person, than you really do.

Tuesday, August 2, 2016

Knapsack and Go

I've been playing around with Go a lot over the past year. I've done a couple of projects for pay and a couple of projects for fun. I have found it an incredibly useful pocket language for solving almost any problem.

Recently, I spent some time researching the different solutions to the knapsack problem. After reading all about the knapsack problem on Wikipedia, I implemented the bounded solutions in Go. As a control, I used the item list from Nils Haldenwang's post about Genetic Algorithm vs. 0-1-KNAPSACK.

I started off with a recursive brute force approach, and kept evolving it until I had an iterative solution that used a channel to generate the set of combinations. It probably isn't the most efficient way to implement the set generation, but I still tend to throw channels and goroutines at any generator I see in code.
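Here is a minimal sketch of that generator pattern, not the code from the gist: the Item type, the combinations function, and the sample data are my own naming and illustration, with a goroutine walking a bit mask and streaming each subset over a channel.

package main

import "fmt"

// Item is a hypothetical weight/value pair; the actual gist may name things differently.
type Item struct {
	Weight, Value int
}

// combinations streams every subset of items over a channel, using the bits of a
// counter as the membership mask. This is the "goroutine as generator" pattern,
// not necessarily the most efficient way to enumerate subsets.
func combinations(items []Item) <-chan []Item {
	out := make(chan []Item)
	go func() {
		defer close(out)
		for mask := 0; mask < 1<<uint(len(items)); mask++ {
			subset := []Item{}
			for i := range items {
				if mask&(1<<uint(i)) != 0 {
					subset = append(subset, items[i])
				}
			}
			out <- subset
		}
	}()
	return out
}

func main() {
	// Illustrative items and capacity, not the dataset from the post.
	items := []Item{{12, 4}, {1, 2}, {2, 2}, {1, 1}, {4, 10}}
	capacity, best := 15, 0
	for subset := range combinations(items) {
		w, v := 0, 0
		for _, it := range subset {
			w += it.Weight
			v += it.Value
		}
		if w <= capacity && v > best {
			best = v
		}
	}
	fmt.Println("best value:", best)
}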

After I had the brute force approach, I optimized it a little by trimming out branches that would never be used. This roughly halved the time for the same dataset, but it doesn't actually change the worst-case scenario much. It isn't so much a solution as an optimization that makes it look a little more like a breadth-first search. These ran in about 17 seconds for brute force, and 12 seconds for the optimized version.

Then I implemented the dynamic programming approach, which is just unbeatable speed-wise. It didn't even register as a millisecond for the testing dataset. It took me a little while to understand how to recover the list of items packed in the knapsack, but the total solution was still small enough to understand. I used Mike's Coderama to help me understand what was going on there.
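A rough sketch of what the dynamic programming version looks like, assuming integer weights and my own function names rather than what is in the gist; the backwards walk at the end is the part that recovers the packed items.

package main

import "fmt"

// knapsackDP builds the classic value table for the 0-1 knapsack and then walks it
// backwards to recover which items were packed. Weights and capacity are assumed
// to be integers here.
func knapsackDP(weights, values []int, capacity int) (int, []int) {
	n := len(weights)
	// table[i][w] = best value using the first i items with capacity w.
	table := make([][]int, n+1)
	for i := range table {
		table[i] = make([]int, capacity+1)
	}
	for i := 1; i <= n; i++ {
		for w := 0; w <= capacity; w++ {
			table[i][w] = table[i-1][w] // skip item i
			if weights[i-1] <= w {
				if v := table[i-1][w-weights[i-1]] + values[i-1]; v > table[i][w] {
					table[i][w] = v // take item i
				}
			}
		}
	}
	// Recover the packed items: if the best value changed when item i was
	// considered, item i is in the knapsack.
	packed := []int{}
	w := capacity
	for i := n; i > 0; i-- {
		if table[i][w] != table[i-1][w] {
			packed = append(packed, i-1)
			w -= weights[i-1]
		}
	}
	return table[n][capacity], packed
}

func main() {
	// Illustrative data, not the dataset from the post.
	weights := []int{12, 1, 2, 1, 4}
	values := []int{4, 2, 2, 1, 10}
	best, packed := knapsackDP(weights, values, 15)
	fmt.Println("best value:", best, "item indexes:", packed)
}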

Finally, I implemented the meet-in-the-middle solution. It was surprisingly faster than I expected. The code was able to reuse the parts I had written for the brute force solution, which made it fast to write. The simple version solved the 24-item problem in about 100ms. I played around with it a bit to optimize the best-case scenarios, and got it to about 40ms on average.
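And a sketch of the meet-in-the-middle idea, again with made-up names and illustrative data rather than the item list from the post: enumerate the subsets of each half, prune the second half so value only increases with weight, and binary search it for every subset of the first half.

package main

import (
	"fmt"
	"sort"
)

type subset struct{ weight, value int }

// enumerate lists every subset's total weight and value for a slice of items.
func enumerate(weights, values []int) []subset {
	out := make([]subset, 0, 1<<uint(len(weights)))
	for mask := 0; mask < 1<<uint(len(weights)); mask++ {
		s := subset{}
		for i := range weights {
			if mask&(1<<uint(i)) != 0 {
				s.weight += weights[i]
				s.value += values[i]
			}
		}
		out = append(out, s)
	}
	return out
}

// meetInTheMiddle splits the items in half, enumerates each half, prunes the
// second half so value is non-decreasing with weight, and binary searches it
// for each subset of the first half.
func meetInTheMiddle(weights, values []int, capacity int) int {
	mid := len(weights) / 2
	a := enumerate(weights[:mid], values[:mid])
	b := enumerate(weights[mid:], values[mid:])

	sort.Slice(b, func(i, j int) bool { return b[i].weight < b[j].weight })
	// Drop dominated entries: after this pass, value only goes up as weight goes up.
	pruned := b[:0]
	bestSoFar := -1
	for _, s := range b {
		if s.value > bestSoFar {
			pruned = append(pruned, s)
			bestSoFar = s.value
		}
	}
	b = pruned

	best := 0
	for _, s := range a {
		if s.weight > capacity {
			continue
		}
		remaining := capacity - s.weight
		// Largest entry in b with weight <= remaining; b always contains the
		// empty subset, so idx is at least 1.
		idx := sort.Search(len(b), func(i int) bool { return b[i].weight > remaining })
		if idx > 0 && s.value+b[idx-1].value > best {
			best = s.value + b[idx-1].value
		}
	}
	return best
}

func main() {
	weights := []int{12, 1, 2, 1, 4}
	values := []int{4, 2, 2, 1, 10}
	fmt.Println("best value:", meetInTheMiddle(weights, values, 15))
}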

In the end, I like the meet-in-the-middle solution best. It is feasible to use for all the bounded knapsack variants where you have to use a float for the weight. I posted my Go implementation of the bounded knapsack problem on gist.

Now, it's time to play with the bin packing problem.

Tuesday, March 3, 2015

WatSON Composite Ingredients

Simple ingredients represent a single value, like a number or a string. My last post covered the basic details for simple ingredients. Composite ingredients are designed to contain other ingredients or multiple values. Some are used to change how things are written to the file, like the compressed ingredient. Others are designed to provide structure to the file, like the container and map ingredients.

For simplicity, I am listing only the 8-bit sizes for the Ingredients, but the structure is valid for any size type.

Byte Reduction Ingredients


The library and compressed ingredients are designed to reduce the number of bytes required by data stored in WatSON format.

Library ingredients contain strings that are used elsewhere in the document. This primarily applies to keys for the map ingredient, but it also applies to bytes ingredients. Libraries use a zero-based index (the first element is index zero), but the string at index zero is always an empty string, because index zero is reserved. See the map and bytes ingredients for more details.

Library scope is going to be mentioned in the descriptions for the bytes and map ingredients. For now, I am defining it as the nearest library in a parent container, although scope should get its own dedicated post in the future.

<library-ingredient> ::= 'L' <8-bit-size> <empty-string-ingredient> <string-ingredient>*

Compressed ingredients contain a single ingredient that has been compressed. I toyed around with the idea of making them a container as well, but I felt the single ingredient child would make implementations more straightforward at the cost of a few extra bytes.

I am thinking of using Snappy for the compression method. I haven't decided how flexible that will be in the future.

<compressed-ingredient> ::= 'Z' <8-bit-size> <data>*
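For a sense of how that might look in code, here is a hedged sketch using the github.com/golang/snappy package; the child bytes below are a stand-in, not a real encoded ingredient.

package main

import (
	"bytes"
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// Stand-in for a child ingredient's encoded bytes (repetitive so it compresses well).
	child := bytes.Repeat([]byte("WatSON "), 32)

	// Compress the child's bytes; this would become the <data> of a 'Z' ingredient.
	compressed := snappy.Encode(nil, child)
	fmt.Printf("child %d bytes, compressed %d bytes\n", len(child), len(compressed))

	// A reader would decompress before parsing the child ingredient.
	restored, err := snappy.Decode(nil, compressed)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", bytes.Equal(restored, child))
}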

Structure Ingredients


Container and map ingredients are designed for nesting and providing structure to a WatSON document.

A container contains other ingredients. It is used for nesting and grouping ingredients. Think of it as a vector or list. Nothing inside a WatSON document references positions in a container. Order will probably be important, especially with regard to library and header ingredients.

<container-ingredient> ::= 'C' <8-bit-size> <ingredient>*

A map ingredient is a key-value structure. The keys are 32-bit unsigned integers. Positive keys reference the in-scope library. A key of zero is reserved for an empty string key. I think I will use that key for optional metadata, but I haven’t thought through what that metadata will be or how it will be used. That will probably get a dedicated post in the future.

<map-ingredient> ::= 'M' <8-bit-size> <map-data>*
<map-data> ::= <uint32> <ingredient>
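To make the key rules concrete, here is a small sketch of how I imagine key resolution working, with my own type names; the reference implementation may end up looking different.

package main

import (
	"errors"
	"fmt"
)

// Library holds the in-scope library strings. Index zero is always the
// reserved empty string, so real keys start at 1.
type Library []string

// KeyName resolves a map ingredient key against the in-scope library.
// A key of zero is the reserved empty-string key.
func (l Library) KeyName(key uint32) (string, error) {
	if key == 0 {
		return "", nil // reserved empty-string key
	}
	if int(key) >= len(l) {
		return "", errors.New("key not present in the in-scope library")
	}
	return l[key], nil
}

func main() {
	lib := Library{"", "name", "weight"} // index 0 reserved
	name, _ := lib.KeyName(1)
	fmt.Println(name) // "name"
}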

Extension Ingredients


Binary and header ingredients change how a parser should interpret WatSON data that follows.

Binary ingredients store opaque binary data. A positive marshal hint is a reference into the in-scope library. A marshal hint of zero is reserved for an undefined marshal hint. I am thinking of the marshal hint as a place to store the run-time type. WatSON doesn’t specify anything about the data.

<bytes-ingredient> ::= 'B' <8-bit-size> <marshal-hint> <data>*

Headers are string-based maps that contain information about the file contents. They are optional ingredients, used to document requirements for parsing the rest of the file. An example would be the character encoding for strings, although I am leaning towards UTF-8 being mandatory. They can also be used to document schema information or metadata, like the program that generated the file, etc.

<header-ingredient> ::= 'H' <size> <header-data>*
<header-data> ::= <c-string> <ingredient>

I like how the format is coming together. I have my incomplete reference implementation foundation, rough as it is, checked in on GitHub.

Thursday, February 26, 2015

WatSON Data Types.

In my last post, I mentioned what I was calling the type marker:

<Type-marker> ::= <size-type> <data-type>

Size Type | Data Type
b7 b6     | b5 b4 b3 b2 b1 b0

That post was dedicated to the highest 2 bits of the type marker: the two bits that represent the size type. This post is dedicated to the lower 6 bits: the data type. An MP4 atom uses 4 bytes to represent the data type (the atom name). That size makes sense given the large, dynamic ecosystem the specification is trying to support. Interestingly, the convention is not to describe atom names as 4-byte integers; they are almost always referred to by their ASCII representation. For example, the "Movie" atom is "\x6D\x6F\x6F\x76", which, if treated as a character string, is "moov" (pronounced "Moo-V").

I like the idea of numeric identifiers having useful printed representations, so I copied that concept into the type markers for WatSON. For example, the single-byte types of null, false, and true will be represented as follows:

<empty-false-type> ::= '0' ;; st == 00, dt == 110000
<empty-true-type> ::= '1' ;; st == 00, dt == 110001
<empty-null-type> ::= '?' ;; st == 00, dt == 111111

This manages to combine the size type and the data type into a single-character type marker that matches the convention for the type. This only holds true for the most common representation. Someone could create a short false type with an 8-bit length, similar to the following.

<short-false-type> ::= 'p' ;; st == 01, dt == 110000

The character 'p' doesn't represent false to me, but I see no reason to create rules preventing that ingredient. Using 2 bytes to store false is a waste of space, but it should not break parsing.
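A tiny sketch of how the two pieces combine, using only the size-type and data-type values shown above; it shows why an 8-bit-length false comes out as 'p'.

package main

import "fmt"

// marker combines a 2-bit size type and a 6-bit data type into a single
// type-marker byte, as described above.
func marker(sizeType, dataType byte) byte {
	return sizeType<<6 | dataType&0x3F
}

func main() {
	fmt.Printf("%c\n", marker(0, 0x30)) // empty false: '0'
	fmt.Printf("%c\n", marker(0, 0x31)) // empty true: '1'
	fmt.Printf("%c\n", marker(0, 0x3F)) // empty null: '?'
	fmt.Printf("%c\n", marker(1, 0x30)) // short false with an 8-bit length: 'p'
}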

Going with this expectation about common sizing, I selected the following values for the lower six bits:

<simple-data-type> ::= 0x30 ;; False type
  | 0x31 ;; True type
  | 0x3F ;; Null type
  | 0x24 ;; Float type
  | 0x29 ;; 32-bit signed integer type
  | 0x2C ;; 64-bit signed integer type
  | 0x35 ;; 64-bit unsigned integer type
  | 0x22 ;; Bit-flags type
  | 0x33 ;; String type
  | 0x08 ;; Header type
  | 0x0C ;; Library type
  | 0x03 ;; Container type
  | 0x1A ;; Compressed container.
  | 0x0D ;; Map type
  | 0x02 ;; User defined binary type

I break them into two categories: empty and short. I’ll start by repeating the empty ingredient types from above:

<false-ingredient> ::= '0'
<true-ingredient> ::= '1'
<null-ingredient> ::= '?'

Then I have the simple short ingredients:

<double-ingredient> ::= 'f' '\x0A' <8-bytes-data>
<32-bit-int-ingredient> ::= 'i' '\x06' <4-bytes-data>
<64-bit-int-ingredient> ::= 'l' '\x0A' <8-bytes-data>
<64-bit-uint-ingredient> ::= 'u' '\x0A' <8-bytes-data>
<bit-flags-ingredient> ::= 'b' <8-bit-size> <data>*
<string-ingredient> ::= 's' <8-bit-size> <data>*

Last, I have the composite ingredients. I went with short sizes on these, mostly because of the lack of printable characters above 127. My hand-crafted WatSON documents are all less than 256 bytes, so the short sizing may be biased or flawed.

<header-ingredient> ::= 'H' <8-bit-size> <data>*
<library-ingredient> ::= 'L' <8-bit-size> <data>*
<container-ingredient> ::= 'C' <8-bit-size> <data>*
<compressed-ingredient> ::= 'Z' <8-bit-size> <data>*
<map-ingredient> ::= 'M' <8-bit-size> <data>*
<bytes-ingredient> ::= 'B' <8-bit-size> <data>*

I hope no one ever has to hand-craft a file or read the raw characters, but I like that they make sense in smaller documents. For larger documents, you are probably going to need a tool or program to keep track of the structure, so the letters on the composite ingredients are less important.

I think the next post will be focused on the composite ingredients. Specifically the containers.

Friday, February 20, 2015

WatSON Size and Type Specification

In trying to understand where I am going with the WatSON specification, it is useful to have some background on the MP4 file specification. Atomic Parsley provides a mostly easy-to-digest background on MP4 atoms.

In the first 8 bytes of every atom, you have enough context to either skip over the atom, or dive deeper into the atom. I wanted to create a specification that allowed the same type of flexibility. A specification where the fundamental component of the file format is simple, but easily extensible. I am trying to keep the size of the format down as well, so I wanted to come up with a model that lets me represent types like "true" and "false" in a single byte.

Note, the names used from here on are just placeholders. I have been more interested in the format concept than in naming at this point:

<Ingredient> ::= <Type-Marker> [<Size> <data>*]

Every ingredient starts with a Type-Marker. Type Markers are a single byte with two components. The 6 lowest bits determine the data-type. This would be similar to the atom name in MP4 files. It basically tells the parser what to expect inside the data section.

The highest 2 bits of the Type-Marker represent the size-type. The size type describes how large the Size value will be. MP4 doesn't have a similar concept. Sizes are always 4 bytes long, and special sizes are used to communicate non-standard sizes.

Size Type | Data Type
b7 b6     | b5 b4 b3 b2 b1 b0

The size type is basically a way to help reduce the overhead of smaller ingredients. Smaller types, like numbers, use 8-bit sizes, while larger types, like long strings and big containers, use a 64-bit size. Here is an example of how a string ingredient could use the different size types.

<empty-string> ::= '\x33' ; st bits == 00, dt bits == 110011
<short-string> ::= 's' <8-bit-size> <data>* ; st == 01, dt == 110011
<med-string> ::= '\xB3' <16-bit-size> <data>* ; st == 10, dt == 110011
<long-string> ::= '\xF3' <64-bit-size> <data>* ; st == 11, dt == 110011

The overhead for storing the different types is 1 byte, 2 bytes, 3 bytes, and 9 bytes, respectively. An empty string is represented by a single byte, with no size data following. The other string types have a required size component of varying length. String data in WatSON will not be null terminated.
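Here is a hedged sketch of how a parser might split the type marker and read the size that follows. The helper name is mine, big-endian byte order is an assumption the draft hasn't pinned down, and the code doesn't take a position on exactly what the size counts.

package main

import (
	"encoding/binary"
	"fmt"
)

// readHeader splits a type marker into its size type and data type, then reads
// the size field that follows. Byte order is assumed big-endian here.
func readHeader(buf []byte) (dataType byte, size uint64, rest []byte) {
	marker := buf[0]
	sizeType := marker >> 6
	dataType = marker & 0x3F
	buf = buf[1:]

	switch sizeType {
	case 0: // empty: no size bytes follow
		return dataType, 0, buf
	case 1: // short: 8-bit size
		return dataType, uint64(buf[0]), buf[1:]
	case 2: // medium: 16-bit size
		return dataType, uint64(binary.BigEndian.Uint16(buf)), buf[2:]
	default: // long: 64-bit size
		return dataType, binary.BigEndian.Uint64(buf), buf[8:]
	}
}

func main() {
	// A hand-crafted short string header: marker 's', one size byte, then data.
	dt, size, rest := readHeader([]byte{'s', 0x08, 'W', 'a', 't', 'S', 'O', 'N'})
	fmt.Printf("data type %#x, size %d, %d bytes follow\n", dt, size, len(rest))
}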

For the most common cases, this uses less space than storing a string in BSON format, which has a fixed 6-byte overhead (1 for type, 4 for size, 1 for the null terminator). For strings longer than 65k, it has a larger 9-byte overhead, but it can also store strings significantly larger than 4 gigabytes.

I haven't decided how I want to flag compression requirements on 64-bit sizes. I was thinking of maybe having the 64-bit size be signed (negative being compressed), or maybe reserving the highest bits for special flags like encryption and compression. Another idea I am toying around with is a "compressed container", such that Ingredients themselves aren't compressed, but they exist in a container that is compressed.

All of this is still at the draft-idea stage, but I am looking for some feedback.