If I had to compare the development of the synthetic biology/biotech stack to that of the computer, I would say we’re still pretty early. In biology, we’re in the big mainframe era, before the development of the transistor and integrated circuit.
| | Biology Today | Mainframe Era |
|---|---|---|
| Long Dev. Cycle Times/Sharing resources | Waiting for western blots and gels to run… Waiting for cultures to grow. Few hours to a few days. | Trying to get mainframe time to run programs. Few hours to a few days. |
| Low Debugging | No idea if an organism works until actually produced (no in silico modeling) | Punch Cards!!! and No compiler |
| Low reusability/reliability of parts | Genes often don’t work outside of their original organism | Vacuum tubes get moths stuck in them |
| Fragmented community | Limited hackers, mostly stuck within universities | Limited hackers, mostly stuck within universities |
| Low Abstraction | Individual Gene Sequences | Punch Cards/Machine Code |
| Low Complexity of Programs | Today: Yeast that makes beer and a scent. Future: Designer cows?? | Then: Computing missile trajectories |
The “App” Layer -> Machine learning applied to discovery: These companies are using large data sets and deep learning techniques to make biological products to sell.
- Existing drugs: Mine drug databases to find new combinations that will work as treatments for different diseases. This is a huge growth area, and it makes a lot of sense for a deep learning firm to enter the market. Since drug combinations don't have to go through Phase 3 clinical trials again, and only have to prove that the combination is safe, this can be a capital-efficient way to produce cures.
- Molecular: Companies that are making small molecules to treat disease. Atomwise is the most successful company in this space. This also seems like a type of data that deep learning techniques are able to represent more easily than the complex biological circuits. http://arxiv.org/pdf/1510.02855.pdf
- Genomics/Biologics: These companies are using ML/DL techniques to create useful DNA Sequences and Antibodies.
- Organisms: These companies create functional microbes that do different things. End users buy the products these microbes produce: fragrances for perfumes, oil, and therapeutics. Although these companies might use machine learning, this process is more about trial and error and iterative design, compared to the more automated process of small-molecule drug discovery.
The “Backend” -> Biological Data Analysis Software: Companies here either sell analysis software or offer specific recommendations based on their proprietary algorithms to clinicians, end consumers, or researchers. I’m not sure who will win in this space, as I don’t think it’s clear that having large datasets is very defensible. I think this mostly because the cost of data acquisition is decaying exponentially. This may be the reverse of the situation at consumer internet companies: here the data is easy to get, and the algorithms are the important thing. See Craig Venter’s failed attempt at monetizing the very first full human genome sequence. Is the timing right now?
- -Omics: Besides our genes, there are the RNA, small molecules (like lipids), and proteins that make up our cells, each with its own “-omics”: transcriptomics, metabolomics, and proteomics respectively (and don’t forget the microbiome). HLI and iCarbonX are the two largest companies trying to make sense of all this stuff.
- Genomics: Genetic analysis software for researchers and clinicians that helps drive better decisions.
- Consumer: Recommendations are given to end consumers. It’s interesting to see that a large consumer player is transitioning from making money on selling tests/data to developing drugs. Will other players follow?
- Imaging and Misc: More biological data such as image data, ultrasound, or public health. There are a lot of interesting things that can happen here. Using MRI data to help doctors diagnose PTSD and other neurological conditions is one big thing that comes to mind.
Protocol Layer -> Distribution of existing datasets: These companies determine what data is available, how it is shared, and how it can be computed on.
- -Omics: Public organizations provide data sets. Companies like Google Cloud Platform allow you to store large data sets and analyze them to a certain extent.
- Genetic Variation: Companies here are mapping out the variation within genes.
- Circuits: These companies build off the popular iGEM competition and the synthetic bio movement to provide a reusable set of genes to build with. These are usually free to the public; however, organism discovery companies usually have proprietary genes and circuits that they use.
The Internet -> Collaboration Software for People: These are more traditional software products—content platforms, data sharing, and design tools.
- Literature and the Research Network: There are many attempts at making journal articles easy to find and researchers more accessible.
- Protocols: These are attempts to make biology more reproducible through the creation of standardized languages to describe experiments in discrete, repeatable steps.
- Gene Design Tools: The IDEs for biology. Software here is trying to make genes and organisms easy to build with WYSIWYG and visual interfaces. A lot of these products are put out by DNA synthesis companies that want to make the designs scientists produce… for a profit.
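To make the "Protocols" idea above concrete, here is a minimal sketch of what describing an experiment as discrete, repeatable, machine-readable steps could look like. It is not the format of any particular product: the `Step` and `Protocol` classes, the action names, and all parameter values are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Step:
    """One discrete lab action plus the parameters needed to repeat it."""
    action: str                      # hypothetical action name, e.g. "transfer"
    params: dict = field(default_factory=dict)


@dataclass
class Protocol:
    """An ordered list of steps that can be serialized and shared."""
    name: str
    steps: list

    def to_json(self) -> str:
        # asdict() recurses into the list of Step dataclasses.
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(text: str) -> "Protocol":
        raw = json.loads(text)
        return Protocol(raw["name"], [Step(**s) for s in raw["steps"]])


# Illustrative values only; not a validated lab procedure.
example = Protocol(
    name="toy-incubation",
    steps=[
        Step("transfer", {"volume_ul": 50, "source": "tube_A", "dest": "plate_1:A1"}),
        Step("incubate", {"temperature_c": 37, "duration_min": 30}),
        Step("centrifuge", {"speed_rcf": 2000, "duration_min": 5}),
    ],
)

serialized = example.to_json()
restored = Protocol.from_json(serialized)
```

Because every step is explicit data rather than free-form prose, the same description could in principle be read by a person, checked by software, or handed to an automated lab, which is the reproducibility argument these tools are making.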
Creating a Functioning Lab: Funding and bench work are broken. Moving towards a fully automated lab.
- Funding/Equity Models: Everyone knows that basic research funding is broken. Both the number and average size of grants are decreasing. There are many crowdfunding competitors here. There’s an interesting attempt at creating “equity” with the blockchain.
- Machine Automation in the Lab: Companies here are looking at the hardware in the lab. Different approaches include an Uber for Lab Experiments, an AWS for experiments, and creating remote access for your own lab.
- Automating Assays: Taking care of the mixing and matching of assays/reactions within a lab.
- Lab Management Software: Traditional software that is trying to get a lab functioning better.
- The AWS for lab automation as well as computation will be huge. Automation frees up more than man-hours: the lower cost of science will allow scientists to conduct ever more research. Biology has historically been a pretty good adopter of computer techniques to model/simulate/discover organisms. However, the three things necessary for machine learning (data, computational capacity, and algorithms) have historically not been able to handle the modeling of biological systems. All three areas are now changing. In the past, 1 petaflop would have cost infinite money; now it costs only $400 on AWS. By 2020, we’ll be producing more genomic data than is uploaded to YouTube. All this data will need to be stored safely and computed on. Deep learning in discovery is only going to become more interesting as those algorithms continue to develop.
- Continuing machine learning’s march into basic research/medicine. There are lots of attempts at making sure research is read and that people can collaborate, but is that the right approach? Even now, there's not enough time for a biologist to stay on top of current literature. Although early, there are attempts at extracting structured data from literature and pushing it through Watson to synthesize findings. After synthesis, researchers or clinicians can use the data to create new experiments and make more informed decisions. This will only quicken as high-level, machine-readable languages for describing experiments are more widely adopted.
- How to share data is an open problem: There haven’t been many businesses trying to build large-scale open sharing of genetic info/data sets. Although both HLI and iCarbonX endeavor to aggregate huge data sets to (in the long term) create medicines that extend human lifespan, their short-term plan is to sell sequenced consumer data to drug companies through B2B licensing agreements. This places the valuable data out of the hands of smaller researchers and gives patient data to large companies. I’d be interested in seeing how bitcoin (and especially 21) plays into the development of open sharing in biology. With projects like https://github.com/joepickrell/genome-server-21 and https://github.com/joepickrell/phenopredict21 happening, bitcoin shows its flexibility. Although this was a proof of concept, I think this kind of data analysis has the potential to put personal health data sharing in the hands of the people rather than doctors and companies.
- Developing direct relationships between patients and drug companies. Many companies are taking a very new approach to finding patients: they develop relationships directly with the patients/users of their drugs. Instead of partnering with hospitals and large health care networks to find study candidates, they can do so at a lower cost over the internet. 23andMe is a shining example.
- Bio is becoming a lot cheaper. Look at the Perlstein Lab. They're able to do drug and mouse studies on software startup run rates.
- Vaccines for new diseases/viruses being produced within a month or two of an apparent threat. Using deep learning and ubiquitous sequencing, you won't have to worry about Zika virus.
- Low cost, lab-grown meat. Already we've produced burger patties in the lab. However, the cost of producing a burger has been too high. Ongoing development in this area is focused on creating the robotics to synthesize and grow these cells at scale. Industrial farming, and cows specifically, produce 15% of all greenhouse gas. Much of this comes from cow farts. With lab-grown meat, we can stop worrying about those greenhouse emissions, free up farmland, and prevent animal cruelty.
- Bioprinting of tissues and organs. Although it's mostly been contained to university labs and a few startups, we're already seeing 3D bioprinting of biological tissues. Currently, companies like Organovo are working on creating organs-on-a-chip on their way to printing whole organs. In addition, with companies like Biobots developing cutting edge 3D bioprinters, it looks like this ecosystem is developing nicely.
- A quantitative approach to mental health. PTSD is diagnosed through behavioral tests; however, misdiagnosis costs the US $18 billion per year. We've recently discovered patterns of brain activity that correspond to this disorder. By analyzing fMRI data, people will be screened for PTSD and other mental health conditions and receive the proper treatment, saving lives and money.
- Continuous monitoring of vital signs and blood analytes. Everyone will be monitored for signs of heart attack, stress, and other things. We'll live happier, healthier lives.