Biology in the Coming Years

If I had to compare the development of the synthetic biology/biotech stack to that of the computer, I would say we’re still pretty early. In biology, we’re in the big mainframe era, before the development of the transistor and integrated circuit.


Here's my thinking:


  • Long dev. cycle times / sharing resources
    • Biology today: Waiting for western blots and gels to run… Waiting for cultures to grow. Few hours to a few days.
    • Mainframe era: Trying to get mainframe time to run programs. Few hours to a few days.
  • Low debugging
    • Biology today: No idea if an organism works until actually produced (no in silico modeling)
    • Mainframe era: Punch cards!!! and no compiler
  • Low reusability/reliability of parts
    • Biology today: Genes often don’t work outside of their original organism
    • Mainframe era: Vacuum tubes get moths stuck in them
  • Fragmented community
    • Biology today: Limited hackers, mostly stuck within universities
    • Mainframe era: Limited hackers, mostly stuck within universities
  • Low abstraction
    • Biology today: Individual gene sequences
    • Mainframe era: Punch cards/machine code
  • Low complexity of programs
    • Biology today: Yeast that makes beer and a scent (future: designer cows??)
    • Mainframe era: Computing missile trajectories (today: Google)
Moreover, right now, Ph.D. students and undergrads are oftentimes just manual labor.

These students, while as credentialed as ever, don’t touch the interesting problems like experimental design, nor do they have much of a say in what projects they work on. I can personally attest to this. For the few short months that I worked in a cancer lab, I was bored to tears. I spent the first week excited to be learning different protocols. The next few months were spent doing the same thing day in and day out: moving a small amount of liquid from point A to point B. The automation of this labor will bring huge headwinds.

It’s not all bad news. Just as the mainframe era evolved into the computer revolution, the bench-work era in biology will give way to a cloud-based, automated version of biology. This is great news for the general public and a great business opportunity. Here are the startups that are bringing a CS approach to biology.


  • The “App” Layer -> Machine learning applied to discovery: These companies are using large data sets and deep learning techniques to make biological products to sell.
    • Existing drugs: Mine drug databases to find new combinations that will work as treatments for different diseases. This is a huge growth area, and it makes a lot of sense for a deep learning company to enter the market. Since drug combinations don't have to go through Phase 3 clinical trials again, and only have to prove that the combination is safe, this can be a capital-efficient way to produce cures.
    • Molecular: Companies that are making small molecules to treat disease. Atomwise is the most successful company in this space. This also seems like a type of data that deep learning techniques are able to represent more easily than the complex biological circuits. http://arxiv.org/pdf/1510.02855.pdf
    • Genomics/Biologics: These companies are using ML/DL techniques to create useful DNA Sequences and Antibodies. 
    • Organisms: These companies create functional microbes that do different things. End users buy the products that these microbes produce: fragrances for perfumes, oil, and therapeutics. Although these companies might use machine learning, this process is more about trial and error and iterative design, compared to the more automated process of small-molecule drug discovery.
  • The “Backend” -> Biological Data Analysis Software: Companies here either sell analysis software or offer specific recommendations based on their proprietary algorithms to clinicians, end consumers, or researchers. I’m not sure who will win in this space, as I don’t think it’s clear that having large datasets is very defensible, mostly because the cost of data acquisition is decaying exponentially. I think this may be the reverse of consumer internet companies: here, data is easy to get, but the algorithms are the important things. See Craig Venter’s failed attempt at monetizing the very first full human genome sequence. Is the timing right now?
    • -Omics: Besides our genes, there are RNA, small molecules (like lipids), and the proteins that make up our cells, each with its own “-omics”: transcriptomics, metabolomics, and proteomics, respectively (and don’t forget the microbiome). HLI and iCarbonX are the two largest companies trying to make sense of all this stuff.
    • Genomics: Genetic analysis software for researchers and clinicians that helps drive better decisions.
    • Consumer: Recommendations are given to end consumers. It’s interesting to see that a large consumer player is transitioning from making money on selling tests/data to developing drugs. Will other players follow?
    • Imaging and Misc: More biological data such as image data, ultrasound, or public health. There are a lot of interesting things that can happen here. Using MRI data to help doctors diagnose PTSD and other neurological conditions is one big thing that comes to mind.
  • Protocol Layer -> Distribution of existing datasets: These companies provide what data there is, along with ways to share it and compute on it.
    • -Omics: Public organizations provide data sets. Companies like Google Cloud Platform allow you to store large data sets and analyze them to a certain extent.
    • Genetic Variation: Companies here are mapping out the variation within genes.
    • Circuits: These companies build off the popular iGEM competition and the synthetic bio movement to provide a reusable set of genes to build with. These are usually free to the public; however, organism discovery companies usually have proprietary genes and circuits that they use.
  • The Internet -> Collaboration Software for People: These are more traditional software products—content platforms, data sharing, and design tools.
    • Literature and the Research Network: There are many attempts at making journal articles easy to find and researchers more accessible.
    • Protocols: These are attempts to make biology more reproducible through the creation of standardized languages to describe experiments in discrete, repeatable steps.
    • Gene Design Tools: The IDEs for biology. Software here is trying to make genes and organisms easy to build with WYSIWYG and visual interfaces. A lot of these products are put out by DNA synthesis companies that want to make the designs scientists produce… for a profit.
  • Creating a Functioning Lab: Funding and bench work are broken. Moving towards a fully automated lab.
    • Funding/Equity Models: Everyone knows that basic research funding is broken. Both the number and average size of grants are decreasing. There are many crowdfunding competitors here. There’s an interesting attempt at creating “equity” with the blockchain.
    • Machine Automation in the Lab: Companies here are looking at the hardware in the lab. Different approaches include an Uber for Lab Experiments, an AWS for experiments, and creating remote access for your own lab.
    • Automating Assays: Taking care of the mixing and matching of assays/reactions within a lab.
    • Lab Management Software: Traditional software that is trying to get a lab functioning better.
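
To make the "Protocols" item above a bit more concrete, here is a minimal Python sketch of what an experiment written as discrete, repeatable, machine-readable steps might look like. The `Step` and `Protocol` classes, the action names, and all the values are hypothetical and invented for illustration; this is not the format of any real protocol language.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Step:
    """One discrete, repeatable action in a protocol (hypothetical schema)."""
    action: str                                  # e.g. "transfer", "incubate"
    params: dict = field(default_factory=dict)   # free-form parameters for the action


@dataclass
class Protocol:
    name: str
    steps: list = field(default_factory=list)

    def add(self, action, **params):
        """Append a step and return self so calls can be chained."""
        self.steps.append(Step(action, params))
        return self

    def to_json(self):
        """Serialize the protocol so a machine or another lab could replay it."""
        return json.dumps(asdict(self), indent=2)


# Illustrative values only; not a real assay.
protocol = (
    Protocol("example_liquid_transfer")
    .add("transfer", source="tube_A", dest="tube_B", volume_ul=50)
    .add("incubate", target="tube_B", temp_c=37, minutes=30)
)
print(protocol.to_json())
```

The point of the design is that the same description can be read by a person, checked by software, or handed to an automated lab without being rewritten.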


My initial thoughts on investment themes:
  • The AWS for lab automation as well as computation will be huge. Automation frees up more than man-hours: the lower cost of science will allow scientists to conduct ever more research. Biology has historically been a pretty good adopter of computer techniques to model/simulate/discover organisms. However, the three things necessary for machine learning (data, computational capacity, and algorithms) have historically not been up to modeling biological systems. All three areas are now changing. In the past, 1 petaflop would have cost infinite money; now it only costs $400 on AWS. By 2020, we’ll be producing more genomic data than is uploaded to YouTube. All this data will need to be stored safely and computed on. Deep learning in discovery is only going to become more interesting as those algorithms continue to develop.
  • Continuing machine learning’s march into basic research/medicine. There are lots of attempts at making sure research is read, and that people can collaborate, but is that the right approach? Even now, there's not enough time for a biologist to stay on top of current literature. Although early, there are attempts at extracting structured data from the literature and pushing it through Watson to synthesize findings. After synthesis, researchers or clinicians can use the data to create new experiments/make more informed decisions. This will only quicken as adoption spreads of a high-level, machine-readable language for describing experiments.
  • How to share data is an open problem: There haven’t been many businesses trying to build large-scale open sharing of genetic info/data sets. Although both HLI and iCarbonX endeavor to aggregate huge data sets to (in the long term) create medicines that extend human lifespan, their short-term plan is to sell sequenced consumer data to drug companies through B2B licensing agreements. This places the valuable data outside the hands of smaller researchers and gives patient data to large companies. I’d be interested in seeing how bitcoin (and especially 21) plays into the development of open sharing in biology. With projects like https://github.com/joepickrell/genome-server-21 and https://github.com/joepickrell/phenopredict21 happening, bitcoin shows its flexibility. Although this was a proof of concept, I think this kind of data analysis has the potential to put personal health data sharing in the hands of the people rather than doctors and companies.
  • Developing direct relationships between patients and drug companies. Many companies are adopting a very new model for finding patients: they develop relationships directly with the patients/users of their drugs. Instead of partnering with hospitals and large health care networks to find study candidates, they can find them at lower cost over the internet. 23andMe is a shining example.
  • Bio is becoming a lot cheaper.  Look at the Perlstein Lab. They're able to do drug and mouse studies on software startup run rates.

Work being done by these companies to bring biology up to software speed is incredible. But what does it really mean for end consumers? What kind of products will we see? Here are my predictions for what we'll see by the end of 2020:

DisneyWorld and Tech Habits

I had the good fortune of spending a lot of time with my extended family this past holiday season. A group of twelve with ages ranging from four to eighty-plus were shuttled down to DisneyWorld. 

"Are we having fun yet?"


It was endearing to see my youngest cousin's eyes light up as we explored the amusement park in between her bouts of crying. However, my next youngest cousin, age thirteen, did not share this same sense of wonderment. Instead, he was obsessed with maximizing the number of likes on his Instagram photos. The eldest among us, the young Baby Boomers, were also stuck on their phones browsing WeChat. Although the sample size is small, each generation had a different relationship to its phones, but used them no less than any other group.

Generation Z kids were virtually born with their smartphones in their hands. They think Facebook is too confusing, but as they enter high school, they'll be forced to use it. Sorry kids. Facebook is the New LinkedIn (which is the New Email). After getting off of Aladdin's Magic Carpet ride, we went to cool off by getting Dole Whip, a pineapple-flavored ice cream. As soon as we got the Dole Whip into our hands, my twelve-year-old cousin was taking pictures to post to Instagram. He continued to edit, filter, and post Instagram photos ASAP. I quizzed him on his strategies to garner more likes on Instagram, and he talked about how specific times during the day were better and worse, how he had multiple accounts to drive traffic (read: SPAM), how he'd use Instagram Direct to organize group chats, and how he would add hashtags on hashtags to each photo. While older folks might share that they ate Dole Whip in casual conversation around the water cooler, he wanted to share in real time. Just goes to show that the internet really is everywhere.

Millennials grew up on a desktop computer. We might be able to put our phones down while waiting in line, but probably not. We talk with friends mostly through group chats and Snapchat. On my own phone, I kept up with college friends in several different GroupMes, sometimes simultaneously sending chats back and forth with the same friends in different GroupMe groups. To a certain extent, we're caught in the middle: conscious of when we use our phones, but still trying to share things on Snapchat in the moment. We browse Facebook as a last resort and mostly while at home. We're the only ones who think it's a good idea to carry around a DSLR; the other groups stick to using their phones. We're still trying to outgrow our hipster phase.

(Yung) Baby Boomers. These folks came to the internet and mobile phones late in their lives, and as a result of that unfortunate occurrence, their thumbs aren't as fully developed as the younger generations'. Because of that, Baby Boomers are forced to poke at their screens with their pointer fingers. Although this trait makes me laugh, it is actually an advantage while browsing their app of choice, WeChat. WeChat employs a heavy text interface, with several layers of menus and lists that need to be carefully navigated to post pictures and chat in group chats. A fat thumb is just not up to this task. These Baby Boomers also favor voice conversations when trying to make the smallest of small talk. They treat their text messages as an email inbox, allowing unread messages to pile up. While I'd be compelled to tap at each blue dot, my mom has no problem letting hundreds of messages go unopened.

While these groups may have the same apps downloaded, their habits across apps vary greatly. The metaphors they bring from their previous experiences with tech inform how they'll use their phones. For me, the best moments of our trip were the times when we put our phones down and shared cringeworthy family jokes.

---

Thanks to Josh Lee for reading a draft of this.

The Startup Game

A few weeks ago, the Guesstimate beta came out. It's pretty cool; it’s like Excel with Crystal Ball built right in. You can input a single number or a range of values and build models with it. Guesstimate’s release and the holiday season gave me the perfect chance to explore an idea about the startup industry. I had been meaning to build a model to understand the formation and development of a startup through to its eventual failure or exit.

This is one in a long line of attempts to quantify an often opaque industry. Two prominent examples of data-driven approaches to venture financing are Aileen Lee’s TechCrunch article that popularized the term ‘unicorn’ and a recent Cambridge Associates research report on venture returns becoming less concentrated. While both of these reports are good attempts to understand one aspect of startup formation and funding, it’s often hard to understand how a startup is moved along through this process if you are new to the industry.

 
The startup industry model in Guesstimate takes inspiration from Sam Gerstenzang's Open Source Venture Model and Bryan Johnson’s OSF Playbook. Like any model, my startup model is an attempt to make assumptions and beliefs about the world explicit so that they can be tested. It allows you to change values to see how the elements push and pull on each other. It takes one cohort of companies started in a given year and follows them through their life cycle. It assumes a set amount of capital available at each stage that is always spent on financing that set of companies. You can play with the model here. Right now, you can make changes to the model on Guesstimate, but no changes are saved once you leave the page. Varying the “exit multiple” and the number of deals participated in by VCs has the most dramatic effect on the model.
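
As a rough illustration of the kind of cohort model described above, here is a minimal Python sketch. It is not the Guesstimate model itself: the stage names, capital pools, and round sizes are placeholder assumptions, and only the structure (one yearly cohort, a fixed pool of capital per stage that is always fully spent) follows the description.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    capital: float      # total dollars available at this stage (assumed fixed)
    round_size: float   # average dollars per financing (assumed)


def run_cohort(n_started, stages):
    """Follow one yearly cohort of companies through each funding stage.

    The whole capital pool is spent at every stage, so the number of
    companies that advance is capped at capital / round_size.
    """
    alive = n_started
    history = []
    for stage in stages:
        fundable = int(stage.capital // stage.round_size)
        alive = min(alive, fundable)
        history.append((stage.name, alive))
    return history


# Placeholder numbers for illustration only.
stages = [
    Stage("angel/seed", capital=2e9, round_size=5e5),
    Stage("series A", capital=5e9, round_size=5e6),
    Stage("series B", capital=6e9, round_size=1.5e7),
]
for name, survivors in run_cohort(n_started=10_000, stages=stages):
    print(f"{name}: {survivors} companies funded")
```

Changing a stage's `capital` or `round_size` shows how the funnel narrows or widens, which is the same kind of push and pull the Guesstimate version lets you explore.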

Some key things learned and reinforced in the course of building the model:
  • There’s a huge amount of disagreement about just how many startups are started every year. The Kauffman Foundation says that ~6,000,000 new businesses are created, without stating how many are high-growth startups. Marc Andreessen says that 4,000 startups are created. In addition, people still don’t agree on the definition of a startup.
  • It’s really hard to build a startup. So, so many fail. The vast majority of new businesses fail to attract any angel or VC funds at all.
  • Power law distributions are still not internalized by people (and not well represented by this model). The magnitude and difference of returns that one company can generate is just astounding. WhatsApp raised a total of $60 million while exiting at a total valuation of $19 billion, roughly a 317x return on invested capital. 50% of startups will fail to return anything, and the next 40% of startups above that will hopefully return the total invested capital of investors. It is the WhatsApps of the world, the top 1%, that bring home meaningful returns.
  • Angel investors make up a huge, not-as-often-recognized pool of capital for startups. $20 billion is invested per year by angels into startups. Their importance is hard to overstate at the earliest stages, where they enter 50 to 70k deals per year. This prominence has grown since the 2000s due to the low cost of doing a startup provided by AWS and other related services. Since the costs of starting a software startup have dropped so low, VCs aren’t able to deploy so little capital in one deal. Their model does not work like that. Angel investors do in fact generate a nice return, in line with VC returns.
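
The return arithmetic in the power-law bullet above can be checked with a short sketch. The WhatsApp figures and the 50%/40%/1% split come from the text; the 3x multiple for the remaining 9% of companies and the equal check sizes are assumptions made purely for illustration.

```python
def multiple(exit_value, invested):
    """Gross multiple on invested capital."""
    return exit_value / invested


def portfolio_multiple(buckets):
    """Dollars returned per dollar invested across the whole portfolio.

    buckets: list of (share_of_companies, return_multiple); equal check
    sizes are assumed, so each share is also a share of dollars invested.
    """
    return sum(share * mult for share, mult in buckets)


whatsapp = multiple(19e9, 60e6)  # figures quoted in the text

buckets = [
    (0.50, 0.0),        # fail to return anything
    (0.40, 1.0),        # return the invested capital
    (0.09, 3.0),        # ASSUMPTION: multiple not specified in the text
    (0.01, whatsapp),   # the top 1%
]
with_outlier = portfolio_multiple(buckets)
# Same portfolio if the top 1% had returned nothing at all.
without_outlier = portfolio_multiple(buckets[:-1])

print(f"WhatsApp multiple: {whatsapp:.0f}x")
print(f"Portfolio multiple with the outlier: {with_outlier:.2f}x")
print(f"Portfolio multiple without it: {without_outlier:.2f}x")
```

Under these assumptions, a single outlier accounts for most of the portfolio's return, which is the intuition the power-law point is getting at.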
Guesstimate has only two distributions, normal and uniform, and so isn’t able to capture much of the statistical reality of startups. While a normal distribution may be a good way to model the likelihood of a startup moving on to the next stage of funding, it’s not a very good way to model the return generated at exit. Right now, the model is merely descriptive (and barely so). In the future, I’d like to move towards a prescriptive model to answer the question: “how can we change the current system to create more impactful innovation in the world?” Questions such as "Do we need a more diverse group of VCs to allocate capital to different startups?" or "Is the most effective way to create innovation to pump more money towards VCs or to lower the cost of starting startups?" may be more easily answered with this model.
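
To illustrate why a normal distribution is a poor fit for exit returns, the sketch below compares how much of the total return the top 1% of companies capture under a normal distribution and under a heavy-tailed Pareto distribution. The parameters (mean 3, standard deviation 1, Pareto shape 1.2) are arbitrary assumptions, and Pareto is just one possible power-law stand-in; none of this comes from the Guesstimate model.

```python
import random


def top_share(returns, top_fraction=0.01):
    """Fraction of total returns captured by the top `top_fraction` of companies."""
    ordered = sorted(returns, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(0)  # fixed seed so runs are repeatable
n = 10_000

# Normal returns, clipped at zero since a return multiple cannot be negative.
normal_returns = [max(0.0, rng.gauss(3.0, 1.0)) for _ in range(n)]
# Heavy-tailed returns: Pareto with an assumed shape parameter of 1.2.
pareto_returns = [rng.paretovariate(1.2) for _ in range(n)]

print(f"Top 1% share, normal: {top_share(normal_returns):.1%}")
print(f"Top 1% share, Pareto: {top_share(pareto_returns):.1%}")
```

With normally distributed returns the top 1% of companies hold only a small slice of the total, while with a heavy tail they hold a far larger one, which is closer to the WhatsApp-style outcomes described above.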

With that said, here are a few directions I’d like to explore:
  • Exploring how broader macroeconomic trends influence the startup industry. At the midpoint of 2015, China was on pace to invest $30 billion through venture capital. How will China in 2016 influence funding this year, and how will this impact the startup ecosystem 5 to 10 years down the line? (Thanks Daniel)
  • How the industry (and cost of doing a startup) affects the rate of formation. While we’ve seen a veritable boom in the formation of software startups, the same can’t be said for life science startups, where the number of initial financings by VCs has remained unchanged. As the cost of doing startups comes down, we should see a pattern of more hardware and biology startups being funded at the early stages. PCH International and Transcriptic are working to do their part to lower costs in their respective industries.
  • Making this model more of a simulation to see how the ecosystem evolves over time. I would like to see how exits by the large companies are able to seed the next generation of angel investors and provide landing grounds for acquisitions. Silicon Valley wasn't built overnight. The dynamic process of companies exiting and investors passing on advice to the next generation is important to creating huge companies and innovative ecosystems.
  • Add more data! I’d like to see how individual firms, investors, and entrepreneurs are able to influence the growth of a startup instead of aggregated statistics provided by reports.
Thanks for reading! Drop a note on Twitter if you found this interesting!


Thanks to Daniel Kao, Jonathan Zong, and Reed Rosenbluth for reading a draft.

Observations on Company Culture

I recently visited around fifteen companies in SF, ranging from small startups just past Series A to 20-year-old internet companies. Without dropping any names, here are some observations.

Authentic belief in a company’s mission (that one’s work is actually important) is different from the normal lip service that companies pay when talking about “changing the world”. Culture isn’t just letting dogs in your office, or nice couches, or wearing Hawaiian shirts. You can literally smell the culture. It’s in the air, written on people’s faces, in how they speak and act. It’s imbued from the top down through founding stories and values, as well as from the bottom up through the interactions between co-workers and visitors. Everyone’s attitude influenced the overall culture, positively or negatively. We all know that communication is 85% body language; culture is communicated non-verbally as well.

It seems very, very hard to keep missionary cultures as companies grow. Finding engineers is hard enough, but finding engineers is harder still when they need to believe in the mission. Finding engineers is triply hard when a company is also quadrupling in size. Everywhere we went had smart people; that was clear. However, challenging them to do great work and getting them to believe is hard. The “craziness” of the mission (not a scientific measure) seemed directly correlated with the quality of people.

A focus on metrics and product direction lent a sense of urgency to everyday activity. We visited a company where the hockey stick was prominently featured in the center of the office. It’s a visual reminder of where the company is, where the company has been, and how the company is doing. Without a view of the metrics, they could kid themselves into believing that they were doing well. There was a huge difference between the companies that talked a big game of growth and those that could actually show outsiders their growth.

With all that said, here are a couple of my suggested ingredients for what makes a great culture: founder myths — the trials and tribulations of what the founders had to do to create change in the world (i.e. hero’s journey), missionary people — people who believe they are doing something for others, heaps of trust, a focus towards continual improvement, and luck.

Getting the culture right seems really, really hard, but seems vital to getting real work done.