Blockchain Based Control and Safety of Artificial Intelligence

Buzzwordy title alert.

Although many individuals were worried about recursively self-improving AI, the alarm wasn’t really sounded until Nick Bostrom wrote Superintelligence. Readers unfamiliar with why superintelligent AIs, AGIs for short, might be scary can look at my notes or this post here. Long story short, an AI that is vastly more intelligent than us and isn’t aligned with our interests may decide to do things that aren’t in our best interest.

The oft-quoted example of AGI, aka superintelligent AI, gone awry is the paperclip maximizer. While this example doesn’t capture all the nuance, it conveys the gist of the problem. An AGI is created whose sole goal is to make as many paperclips as possible; because it’s so good at its job, it ends up killing all humans and turning all matter into paperclips. A more “human” example of an AGI gone awry is a corporation, such as Enron or any oil company. Cash flow and profit, the internal metrics of success (its objective functions), become divorced from the original purpose of creating something good for society. Bitcoin and other cryptocurrency networks also resemble a recursively improving organism with no clear point at which to disconnect it, which has some individuals worrying about blockchains and AI. An AGI gone awry would be the principal-agent problem on steroids. You could well argue that Bitcoin, or cryptocurrencies generally, are a version of this paperclip maximizer, especially the Proof-of-Work variants.

The basic assumption that researchers in the field make is that AGI is going to happen someday: if not within 15 years, then within 100. On the timescale of the universe, 100 years is nothing. Therefore, solving the problem of defining an objective function, or guardrails, for an AGI is of the utmost importance. Sadly, this work isn’t well incentivized today. However, the work that has been done can be summed up as follows:

  • Alignment: making sure its objective function doesn’t kill us. The work I’m most familiar with is coherent extrapolated volition and approval-directed agents.
  • Capability restraint: for example, an AI that is air-gapped from the internet and can only give yes-or-no answers, essentially becoming an oracle.

However, Bostrom presents another idea on AI control that I think doesn’t get enough coverage. In short: "make the objective function tied to the acquisition of some cryptographic token." While this seems counterintuitive at first, it is akin to us trying to earn money, or dogs doing tricks for doggie treats. In the original proposal, Bostrom uses a centralized cryptographic token managed by scientists. Superintelligence was published before the current hype cycle, and before theoretical work on new cryptographic primitives had begun. Since then, there’s been a fair bit of fervor over how blockchains can increase the capabilities of artificially intelligent systems, for example Computable providing more datasets, but not much has been written about the safety side. (No surprise there.) Here are some specific high-level proposals that can be stacked on top of each other to control and align agents.
  1. Use a decentralized cryptocurrency as the reward function. This one is straightforward enough. Using centralized cryptographic tokens as the goal suffers from the same problem that kept centralized cryptocurrencies from taking off: a single point of failure. If a scientist is somehow held at gunpoint by an AGI, he or she will probably hand over some tokens. It’s much harder to hold a network of miners and anonymous token holders at gunpoint.
  2. Instantiate an AGI as a DAO. This lets the entity operate trustlessly, which is a double-edged sword. The AGI can sustain itself and operate with or without supervision, but there is also an auditable trail of where and when reward tokens were sent to its address.
  3. Define the reward function as a smart contract to be executed trustlessly. This is where it gets a little harder to conceptualize. We can state a goal in plain English, but a reinforcement learning agent needs it defined precisely. In StarCraft or Go, the objective function is simply to win the game. However, we may want to check up on the AI’s operation and update its objective function as we go, without letting the agent itself change any part of that objective function. To do so, use a widely distributed governance token so that pseudonymous holders can vote on changes to the objective function. Keep identities private so that the agent can’t harass or bribe them. Monitor past voting behavior by keeping a “reputation” trail for voters to check for bribery; this can also be done on-chain.
  4. Use curve bonded tokens to prevent “takeover attacks.” Curve bonded tokens have programmatically defined prices for minting and for redeeming (then burning) tokens. To pursue any goal, the agent will probably have a lot of cash on hand. What if it tries to buy up the supply of governance tokens? That would be bad, since it could then change its own objective function. To prevent this, we can set the purchase curve so that the price becomes absurdly high as more tokens are minted. Correspondingly, we can set an extremely low sell price to discourage any sales.
  5. Use TCRs (or other game-theoretically sound ranked lists) to tokenize “human values” and direct an AGI to optimize for that set of “human values.” The previous examples defined a goal in terms of ETH held. That would be easy to calculate if the agent’s goal were to maximize the NAV of its investment portfolio. However, as we know today, defining something purely in terms of money can lead to perverse outcomes. If money, a means, becomes the end, agents are pushed toward greedy, short-term actions.
  • Instead, we might want to optimize for human well-being. How do we define this on-chain so that the measure can’t be gamed by an autonomous agent? We use decentralized, stake-based rating games, namely TCRs (token-curated registries) with a curve bonded token for staking. You can read a little more about TCRs here.
  • Back to representing human well-being on-chain. First, we have to decide how it is measured in the real world. Various NGOs and ratings organizations track things like the HDI (Human Development Index), the Human Happiness Index, and GDP per capita. These are top-line objectives that countries may aim for through actions that make individual citizens happy. Of course, countries are free to ignore these ratings. Autonomous agents won’t be, if their objective functions are locked down.
  • So how does that tie into the blockchain? These indexes have a large self-reported component right now, and TCRs are good at encoding intangible, subjective information in hard economic terms. By creating a list that might contain “happiness,” “wealth for humanity,” and “sugar, spice, and everything nice,” we might get the agent to take off-chain actions that benefit humanity.
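Item 4’s defense can be made concrete with a little arithmetic. The sketch below is a minimal illustration under my own assumptions, not any particular bonding-curve contract: a hypothetical power-law buy price, base_price * (s / reference_supply) ** exponent, where s is the outstanding supply, plus a sell price that pays back a fixed fraction (sell_ratio) of the buy curve. All names and parameter values are illustrative. A newcomer holding no tokens needs more than the current supply to outvote everyone else, so the takeover cost is the area under the buy curve from the current supply to double it.

```python
from dataclasses import dataclass


@dataclass
class AsymmetricBondingCurve:
    """Hypothetical governance-token bonding curve (illustrative only).

    Buy (mint) price per token at supply s:  base_price * (s / reference_supply) ** exponent
    Sell (burn) price per token at supply s: sell_ratio * the buy price at s
    """
    supply: float                          # tokens outstanding, assumed held by human voters
    base_price: float = 1.0                # assumed price per token at reference_supply
    reference_supply: float = 1_000_000.0  # assumed normalizing supply
    exponent: float = 10.0                 # assumed steepness of the buy curve
    sell_ratio: float = 0.01               # assumed: burning pays back 1% of the buy curve

    def _area(self, s0: float, s1: float) -> float:
        """Integral of the buy price from supply s0 to s1 (total cost of that slice)."""
        k = self.exponent + 1
        r = self.reference_supply
        return self.base_price * r * ((s1 / r) ** k - (s0 / r) ** k) / k

    def mint_cost(self, amount: float) -> float:
        """Reserve currency needed to mint `amount` new tokens at the current supply."""
        return self._area(self.supply, self.supply + amount)

    def burn_refund(self, amount: float) -> float:
        """Reserve currency returned for burning `amount` tokens at the current supply."""
        if amount > self.supply:
            raise ValueError("cannot burn more than the outstanding supply")
        return self.sell_ratio * self._area(self.supply - amount, self.supply)


def takeover_cost(curve: AsymmetricBondingCurve) -> float:
    """Cost for a holder of zero tokens to reach a voting majority.

    Holding m of (supply + m) tokens is a majority only if m > supply,
    so minting `supply` new tokens is the threshold.
    """
    return curve.mint_cost(curve.supply)


linear = AsymmetricBondingCurve(supply=1_000_000, exponent=1.0)
steep = AsymmetricBondingCurve(supply=1_000_000)
print(f"{takeover_cost(linear):,.0f}")  # 1,500,000
print(f"{takeover_cost(steep):,.0f}")   # 186,090,909

# After the agent mints 1,000,000 tokens, supply is 2,000,000; dumping them returns 1%.
after_attack = AsymmetricBondingCurve(supply=2_000_000)
print(f"{after_attack.burn_refund(1_000_000):,.0f}")  # 1,860,909
```

With these arbitrary numbers, raising the exponent from 1 to 10 makes a majority roughly 124 times as expensive, and selling back pays only 1% of what minting cost, so buying voting power and then dumping it is a heavy loss.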
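Item 5 leans on TCRs. The sketch below is a heavily simplified, hypothetical registry of “human values,” written in plain Python rather than as a real smart contract; the class, method names, and settlement rules are my assumptions and are spelled out in the docstring. What it illustrates is the last step of the proposal: the agent’s reward only counts progress on values that token holders have kept on the list, so an agent that “succeeds” at an unlisted goal earns nothing for it.

```python
from dataclasses import dataclass, field


@dataclass
class ValuesRegistry:
    """Minimal, hypothetical token-curated registry (TCR) of "human values".

    Simplifying assumptions (not any real contract's API): an entry is listed as
    soon as it is applied for, a challenger stakes a deposit equal to the entry's,
    each challenge is settled at once by a stake-weighted vote, and the losing
    side's deposit goes entirely to the winning side (real TCR designs also pay
    the voters who sided with the majority).
    """
    min_deposit: float
    entries: dict[str, tuple[str, float]] = field(default_factory=dict)  # value -> (owner, deposit)
    winnings: dict[str, float] = field(default_factory=dict)

    def apply(self, value: str, owner: str, deposit: float) -> None:
        if deposit < self.min_deposit:
            raise ValueError("deposit below the registry minimum")
        if value in self.entries:
            raise ValueError("value is already listed")
        self.entries[value] = (owner, deposit)

    def challenge(self, value: str, challenger: str,
                  votes: dict[str, tuple[float, bool]]) -> bool:
        """Settle a challenge. `votes` maps voter -> (stake, keep). Returns True if kept."""
        owner, deposit = self.entries[value]
        keep_stake = sum(stake for stake, keep in votes.values() if keep)
        remove_stake = sum(stake for stake, keep in votes.values() if not keep)
        if keep_stake >= remove_stake:  # ties favour the status quo
            self.winnings[owner] = self.winnings.get(owner, 0.0) + deposit
            return True
        del self.entries[value]
        self.winnings[challenger] = self.winnings.get(challenger, 0.0) + deposit
        return False


def reward(registry: ValuesRegistry, progress: dict[str, float]) -> float:
    """Hypothetical objective: credit measured progress only on currently listed values."""
    return sum(score for value, score in progress.items() if value in registry.entries)


registry = ValuesRegistry(min_deposit=100.0)
registry.apply("human development", owner="alice", deposit=100.0)
registry.apply("paperclips", owner="agent", deposit=100.0)
kept = registry.challenge("paperclips", challenger="bob",
                          votes={"carol": (300.0, False), "dave": (50.0, True)})
print(kept, sorted(registry.entries))  # False ['human development']
print(reward(registry, {"human development": 0.7, "paperclips": 1e9}))  # 0.7
```

The progress scores passed to `reward` are a stand-in: where those numbers come from (an oracle reporting the HDI, say) is the genuinely hard, hackable part, and this sketch does not address it.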

The largest points of failure would seem to be the voters, especially if their identities are revealed. Perhaps we could have less intelligent agents vote on issues for the most intelligent agent, each with its own objective function that may need to be modified. With any organization or incentive structure, there needs to be a balance between being able to change things and not letting the wrong actors change them. I think this game is especially fun to play when thinking through an actor that is vastly more intelligent than I am.

Early Adopters of Crypto

Attention is the scarcest thing in the world. On a macro level, the world is awash in capital; interest rates in some countries are below zero. Within our daily lives, however, thousands of things are always competing for our attention. A question I like to think through is: where are the early adopters focusing their limited attention? Chris Dixon says it’s people messing around in garages building something. A revised question along those same lines is:

Which nation/market is an early adopter of technology? How do their market dynamics predict what might happen in another geography?

First, a little theory. The world is a connected graph of people, and word of mouth is what really gets people to adopt products. Facebook shrank the six degrees of separation down to around 4.5. However, connections are not evenly distributed among people. When we think of information flow, it’s more of a directed graph: person A can influence person B, but usually not vice versa.

When I think of how information spreads, I think of a spark landing on dry tinder. Not every spark catches, but when one does, there’s the potential for a cascade of "catching fire." Within a network, there are early adopters and late adopters. They are differentiated by personality traits, sources of information, and connectedness both upstream (where they get their information) and downstream (whom they pass it to). I usually split the adoption curve into three sets of people:

  • 1) people who do things because it is novel or cool. This is an intrinsic motivator. These are early adopters.
  • 2) people who do things because there’s an economic need. This is an extrinsic motivator. These are middle adopters.
  • 3) people who do things because everyone else is doing something. These are late adopters.
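The spark-and-tinder picture above maps fairly naturally onto the independent cascade model from the network-diffusion literature. The post doesn’t commit to any formal model, so treat the sketch below as my assumption of one way to make it concrete; the toy graph, the activation probability `p`, and the function names are all hypothetical. Edges point from influencer to influenced, which captures the one-way flow: seeding a well-connected upstream node reaches far more people than seeding a leaf, even though both sit in the same network.

```python
import random


def independent_cascade(graph: dict[str, list[str]], seeds: set[str],
                        p: float, rng: random.Random) -> set[str]:
    """Run one independent-cascade simulation on a directed influence graph.

    `graph` maps each person to the people they influence (edges point
    downstream). Every newly convinced person gets exactly one chance to
    convince each downstream neighbour, succeeding with probability `p`.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for person in frontier:
            for follower in graph.get(person, []):
                if follower not in active and rng.random() < p:
                    active.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return active


def expected_reach(graph: dict[str, list[str]], seeds: set[str],
                   p: float, trials: int = 1000, seed: int = 0) -> float:
    """Average number of people convinced over many simulated cascades."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(trials))
    return total / trials


# Hypothetical toy network: one well-connected upstream "hub" and a few followers.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["d"],
    "b": ["d", "e"],
    "c": [], "d": [], "e": [],
}
print(expected_reach(graph, {"hub"}, p=0.3))
print(expected_reach(graph, {"e"}, p=0.3))  # 1.0: influence doesn't flow back upstream
```

Most sparks seeded at the hub still fizzle after a step or two at `p=0.3`, but the occasional run that catches is what pushes its average reach well above the leaf’s.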

So now that we have that out of the way, this is my current mental model for crypto adoption.

I am increasingly looking toward Asia for technology, and more specifically toward Korea for cryptocurrencies. Because of the shape of its social graph, Korea has interesting winner-take-all dynamics and tends to adopt early. Information spreads quickly because of the connectedness and centrality of that graph. The whole nation uses KakaoTalk, has high-speed internet access, has a high appetite for novelty and coolness, has very tight-knit business communities, and has historically been an early adopter of new technologies. In the early 2000s, before the States got around to these things in Web 1.0, Korea was already on top of camera phones, MMORPGs, and over-the-top streaming (think Netflix).

Bill Gurley and his associates caught onto this trend and planned a trip to Korea to see what might be gleaned from the market. The result was a sharpened thesis around Social, Local, Mobile. When the iPhone landed in everyone’s hands in 2008, we had the confluence of the internet, GPS, and a camera in every pocket. The rest is history: that Benchmark fund invested in a plethora of internet hits, most notably Uber and Snapchat.

The current environment in Korea is telling. 30% of South Koreans own or hold some sort of crypto, past the tipping point for widespread social adoption. When regulators tried to shut exchanges down, HODLers raised their voices. I’m excited to see how individuals interact with token-powered protocols as usability and scalability allow us to move down the Marginal Benefit Curve of cryptocurrencies. While we’re still stuck in the store-of-value, speculative era of cryptocurrencies, that should change soon.

Even now, as staking protocols begin to proliferate, crypto holders are looking for an edge in earning incremental tokens. We should start to see Vest and Compound.Finance gain adoption as the friction of using these protocols begins to drop.

I’m personally less bullish on developing countries as a leading indicator of early adoption. As weird as it sounds, they need cryptocurrencies too much. In my mental model, early adopters are the ones who like toys: the weirdos, the rich people, and others willing to accept the flaws in a product. There’s something about intrinsic, as opposed to extrinsic, motivation that drives the stickiness and retention of a product or technology. I would much rather look toward high-risk-tolerance ICO investors than toward traditional businesses and crypto “enterprise alliances.”