Codifying humanity: Why robots should fear death as much as we do

Welcome to “Codifying Humanity,” a new Neural series that analyzes the machine learning world’s attempts at creating human-level AI. Read the first article: Can humor be reduced to an algorithm?

World-renowned futurist Ray Kurzweil predicts that AI will be “a billion times more capable” than biological intelligence within the next 20-30 years.

Kurzweil has predicted the advent of more than 100 technological advances with a success rate greater than 85%.

But, given the current state of cutting-edge artificial intelligence research, it’s difficult to imagine this prediction coming true in the next century, let alone within a few decades.

The problem

Machines have no impetus towards sentience. We may not know much about our own origin story – scientists and theists tend to bicker a bit on that point – but we can be certain of at least one thing: death.

We’re forced to reckon with the fact that we may not live long enough to see our lingering questions answered. Our biological programming directives may never get resolved.

We live because the alternative is death and, for whatever reason, we have a survival instinct. As sentient creatures, we’re aware of our mortality. And it’s arguable that this awareness is exactly what separates human intellect from animal intelligence.

In a paper published last December, computer scientist Saty Raghavachary argued that an artificial general intelligence (AGI) could only manifest as human-like if it associated its existence with a physical form.

The solution

Perhaps an AI that identified itself as an entity within a corporeal form could express some form of sentience, but would it actually be capable of human-level cognition?

It’s arguable that the human condition, that thing which drives only our species to seek the boundaries of technology, is intrinsically related to our mortality salience.

And if we accept this philosophical premise, it becomes apparent that an intelligent machine operating completely unaware of its own mortality may be incapable of agency.

That being said, how do we teach machines to understand their own mortality? It’s commonly thought that nearly all of human culture has emerged through the quest to extend our lives and protect us from death. We’re the only species that wars because we’re the only species capable of fearing war.

Start killing robots

Humans tend to learn through experience. If I tell you not to touch the stove and you don’t trust my judgment, you might still touch the stove. If the stove burns you, you probably won’t touch it again.

AI learns through a similar process, but it doesn’t exploit learning in the same way. If you want an AI to find all the blue dots in a field of randomly colored dots, you have to train it to find them.

You can write algorithms for finding dots, but algorithms don’t execute themselves. So you have to run the algorithms and then adjust the AI based on the results you get. If it finds 57% of the blue dots, you tweak it and see if you can get it to find 70%. And so on and so forth.

The AI’s reason for doing this has nothing to do with wanting to find blue dots. It runs the algorithm, and when the algorithm causes it to do something it’s been directed to do, such as find a blue dot, it sort of “saves” those settings, overwriting previous settings that weren’t as good at finding blue dots.

This is called reinforcement learning. And it’s one of the pillars of modern deep learning, used in everything from spacecraft launches and driverless car systems to products like GPT-3 and Google Search.
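
To make that loop concrete, here’s a bare-bones sketch of the tweak-and-retest cycle described above: a toy agent with a single tunable threshold that only keeps settings that improve its blue-dot score. It’s an illustrative hill-climbing loop built on made-up assumptions (the hue cutoff, the number of dots), not a full reinforcement learning system.

```python
import random

# Toy version of the tweak-and-retest loop described above. A dot counts as
# "blue" when its hue is below 0.3 (a made-up cutoff); the agent has one
# tunable threshold and keeps a new setting only when it beats the old one,
# which is the "overwrite" step from the text.
random.seed(0)
hues = [random.random() for _ in range(1000)]   # one hue value per dot
is_blue = [hue < 0.3 for hue in hues]           # ground truth for each dot

def accuracy(threshold):
    """Fraction of dots the threshold classifies correctly."""
    return sum((hue < threshold) == blue for hue, blue in zip(hues, is_blue)) / len(hues)

best = random.random()            # start from an arbitrary setting
best_score = accuracy(best)
for _ in range(200):
    candidate = best + random.uniform(-0.05, 0.05)   # small random tweak
    if accuracy(candidate) > best_score:             # keep only improvements
        best, best_score = candidate, accuracy(candidate)

print(f"learned threshold: {best:.2f}, accuracy: {best_score:.0%}")
```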

Humans aren’t programmed with hardcoded goals. The only thing we know for certain is that death is inevitable. And, arguably, that’s the spark that drives us towards accomplishing self-defined objectives.

Perhaps the only way to force an AGI to emerge is to develop an algorithm for artificial lifespans.

Imagine a paradigm where every neural network is created with a digital time bomb set to go off at an undisclosed, randomly generated time. Any artificial intelligence created to display human-level cognition would be capable of understanding its mortality and incapable of knowing when it would die.

Theories abound

It’s hard to take the philosophical concept of mortality salience and express it in purely algorithmic terms. Sure, we can write a code snippet that says “if timer is zero then goto bye bye AI” and let the neural network bounce that idea around in its nodes.
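
Taken literally, that snippet is trivial to write. Here’s a minimal, purely illustrative sketch of the artificial-lifespan idea: the agent is given a randomly generated expiry it can never inspect, so it can only ask whether it is still alive. The class, its lifespan range, and its methods are all hypothetical.

```python
import random
import time

class MortalAgent:
    """Illustrative only: an agent with a hidden, randomly chosen lifespan."""

    def __init__(self, max_lifespan_seconds=3600):
        # The expiry is generated at creation time and never exposed, so the
        # agent "knows" it will die but not when, mirroring the idea above.
        self._expires = time.time() + random.uniform(0, max_lifespan_seconds)

    def is_alive(self):
        return time.time() < self._expires

    def act(self):
        if not self.is_alive():
            raise RuntimeError("bye bye AI")   # the timer hit zero
        # ... run the neural network here ...
        return "still working"

agent = MortalAgent()
print(agent.act())
```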

But that doesn’t necessarily put us any closer to building a machine that’s capable of having a favorite color or an irrational fear of spiders.

Many theories on AGI dismiss the idea of machine sentience altogether. And perhaps those are the best ones to pursue. I don’t need a robot to like cooking, I just want it to make dinner.

In fact, as any Battlestar Galactica fan knows, the robots don’t tend to rise up until we teach them to fear their own death.

So maybe brute force deep learning or quantum algorithms will produce this so-called “billion times more capable” machine intelligence that Kurzweil predicts will happen in our lifetimes. Perhaps it will be superintelligent without ever experiencing self-awareness.

But the implications are far more exciting if we imagine a near-future filled with robots that understand mortality in the same way we do.

Building apps with GPT-3? Here’s what devs need to know about cost and performance

Last week, OpenAI removed the waitlist for the application programming interface to GPT-3, its flagship language model. Now, any developer who meets the conditions for using the OpenAI API can apply and start integrating GPT-3 into their applications.

Since the beta release of GPT-3, developers have built hundreds of applications on top of the language model. But building successful GPT-3 products presents unique challenges. You must find a way to leverage the power of OpenAI’s advanced deep learning models to provide the best value to your users while keeping your operations scalable and cost-efficient.

Fortunately, OpenAI provides a variety of options that can help you make the best use of your money when using GPT-3. Here’s what the people who have been developing applications with GPT-3 have to say about best practices.

Models and tokens

OpenAI offers four versions of GPT-3: Ada, Babbage, Curie, and Davinci. Ada is the fastest, least expensive, and lowest-performing model. Davinci is the slowest, most expensive, and highest-performing. Babbage and Curie are in between the two extremes.

OpenAI’s website doesn’t provide architectural details on each of the models, but the original GPT-3 paper includes a list of different versions of the language model. The main difference between the models is the number of parameters and layers, going from 12 layers and 125 million parameters to 96 layers and 175 billion parameters. Adding layers and parameters improves the model’s learning capacity but also increases the processing time and costs.

OpenAI calculates the pricing of its models based on tokens. According to OpenAI, “one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words).”

In general, if you use plain English (avoid jargon, use simple words with few syllables, etc.), you’ll get better token-to-word ratios. In one example from OpenAI’s Tokenizer tool, every word except “GPT-3” counts as a single token.

One of the benefits of GPT-3 is its few-shot learning capabilities. If you’re not satisfied with the model’s response to a prompt, you can guide it by giving it a longer prompt that includes correct examples. These examples will work like real-time training and improve GPT-3’s results without the need to readjust its parameters.
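
As an illustration, a few-shot prompt might look something like the sketch below. The task and examples are made up; the point is that the labeled examples steer the model in-context, with no parameter updates.

```python
# A made-up few-shot prompt: the labeled examples act as in-context guidance,
# so the model continues the pattern without any retraining.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery died after two days.
Sentiment: Negative

Review: Setup took thirty seconds and it just works.
Sentiment: Positive

Review: The screen cracked the first time I dropped it.
Sentiment:"""
# Sent to the API, the model should complete the prompt with " Negative".
```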

It is worth noting that OpenAI charges you for the total tokens in your input prompt plus the output tokens GPT-3 returns. Therefore, long prompts with few-shot learning examples will increase the cost of using GPT-3.
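
For a back-of-the-envelope sense of what that means for your bill, here’s a rough estimator built on the ~4-characters-per-token rule of thumb quoted above. The per-1,000-token prices are illustrative placeholders based on the rates published at the time of writing; check OpenAI’s pricing page for current numbers.

```python
# Rough per-call cost estimate: you pay for prompt tokens plus output tokens.
# Prices (USD per 1,000 tokens) are illustrative; check OpenAI's pricing page.
PRICE_PER_1K_TOKENS = {"ada": 0.0008, "babbage": 0.0012, "curie": 0.006, "davinci": 0.06}

def estimate_tokens(text: str) -> int:
    """Very rough: ~4 characters per token for common English text."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, output_tokens: int, model: str) -> float:
    total_tokens = estimate_tokens(prompt) + output_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]

prompt = "Summarize this support ticket in one sentence: my order arrived damaged and I'd like a refund."
for model in ("ada", "babbage", "curie", "davinci"):
    print(f"{model:>8}: ~${estimate_cost(prompt, 50, model):.5f} per call")
```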

Which model should you use?

With a 75x cost difference between the cheapest and most expensive GPT-3 models, it is important to know which option best suits your application.

Matt Shumer, the co-founder and CEO of OthersideAI, has used GPT-3 to develop AI-powered writing tools. HyperWrite, OthersideAI’s main product, uses GPT-3 for text generation, autocomplete, and rephrasing.

When choosing between different GPT-3 models, Shumer starts by considering the complexity of the intended use case, he told TechTalks.

“If it’s something simple, like binary classification, I might start with Ada or Babbage. If it’s something very complex, like conditional generation where high-quality output and reliability is necessary, I start with Davinci,” he said.

When unsure of complexity, Shumer starts by trying the biggest model, Davinci. Then, he works his way down toward the smaller models.

“When I get it working with Davinci, I try to modify the prompt to use Curie. This typically means adding more examples, refining the structure, or both. If it works on Curie, I move to Babbage, then Ada,” he said.

For some applications, he uses a multi-step system that includes a mix of different models.

“For example, if it’s a generative task that requires some classification as a precursor step, I might use Babbage for the classification, then Curie or Davinci for the generative step,” he said. “After using it for a while, you get a feel for what might be useful for different use cases.”
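
Here’s a hedged sketch of what such a multi-step setup might look like with the openai Python library. The engine names, prompts, and task are illustrative, and the classify-then-generate split follows the pattern Shumer describes rather than his actual code:

```python
import openai  # pip install openai; requires an API key

openai.api_key = "YOUR_API_KEY"  # placeholder

def classify_message(text: str) -> str:
    """Cheap first step: a small engine (Babbage) handles the classification."""
    resp = openai.Completion.create(
        engine="babbage",
        prompt=f"Label this message as 'complaint' or 'question'.\n\nMessage: {text}\nLabel:",
        max_tokens=5,
        temperature=0,
    )
    return resp.choices[0].text.strip().lower()

def draft_reply(text: str, label: str) -> str:
    """Expensive second step: Davinci handles only the high-quality generation."""
    resp = openai.Completion.create(
        engine="davinci",
        prompt=f"Write a polite reply to this customer {label}:\n\n{text}\n\nReply:",
        max_tokens=150,
        temperature=0.7,
    )
    return resp.choices[0].text.strip()

message = "My order arrived two weeks late and the box was crushed."
print(draft_reply(message, classify_message(message)))
```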

Paul Bellow, author and developer of LitRPG Adventures, used Davinci for his GPT-3-powered RPG content generator.

“I wanted to generate the highest quality output possible—for later fine-tuning,” Bellow told TechTalks. “Davinci is the slowest and most expensive, but the tradeoff is higher quality output which was important to me at this stage of development. I’ve spent a premium, but I now have over 10,000 generations that I can use for future fine-tuning. Datasets have value.” (More on fine-tuning later.)

Bellow says that the best way to find out if another model is going to work for a task is to run some tests on Playground, a tool you can use to directly try prompts on different GPT-3 models (note that OpenAI bills you for using Playground).

“A lot of the time, a well-thought-out prompt can get good content out of the Curie model. It all just depends on the use-case,” Bellow said.

Balancing costs and quality

When choosing a model for your application, you’ll have to weigh cost against value. A higher-performing model might produce better output, but the improvement might not justify the price difference.

“You have to build a business model around your product that supports the engines you’re using,” Shumer said. “If you want high-quality outputs for your users, it’ll be worth it to use Davinci—you can pass off the costs to your users. If you’re looking to build a large-scale free product, and your users are okay with mediocre results, you can use a smaller engine. It all depends on your product goals.”

OthersideAI has developed a solution that uses a mix of different GPT-3 models to enable different use cases, Shumer said. Paid users enjoy the power of large GPT-3 models, while free-tier users get access to the smaller models.

For LitRPG Adventures, quality comes first, which is why Bellow initially stuck to the Davinci model. He used the base Davinci model with one- or two-shot prompts, which increased the costs but ensured that GPT-3 provided quality output.

“OpenAI API Davinci model is a bit expensive at this time, but I see the cost going down eventually,” he said. “What provides flexibility right now is the ability to fine-tune the Curie and lower models, or Davinci with permission. This will bring my costs per generation down quite a bit while hopefully maintaining high quality.”

He has been able to develop a business model that maintains a profit margin while using Davinci.

“While not a huge money-maker, the LitRPG Adventures project is paying for itself and just about ready to scale up,” he said.

Fine-tuning GPT-3

OpenAI’s scientists initially introduced GPT-3 as a task-agnostic language model. According to their initial tests, GPT-3 rivaled state-of-the-art models on specific tasks without the need for further training. But they also mentioned fine-tuning as a “promising direction of future work.”

In the months that followed the beta release of GPT-3, OpenAI and Microsoft fine-tuned the model for a number of different tasks, including database query and source-code generation.

As with other deep learning architectures, fine-tuning brings several benefits to GPT-3. The OpenAI API allows customers to create fine-tuned versions of GPT-3 for a premium. You can create your own training dataset, upload it to OpenAI’s servers, and use it to create a fine-tuned model of GPT-3. OpenAI will host your model and make it available to you through its API.
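
At the time of writing, that workflow looks roughly like the sketch below: a JSONL file of prompt/completion pairs, uploaded and then used to launch a fine-tune job. The file name, example records, and base-model choice are all made up; consult OpenAI’s fine-tuning documentation for the exact parameters.

```python
import json
import openai  # pip install openai; requires an API key

openai.api_key = "YOUR_API_KEY"  # placeholder

# 1) Build a JSONL training file of prompt/completion pairs (records are made up).
examples = [
    {"prompt": "Generate an RPG tavern description:\n\n", "completion": " The Gilded Boar smells of woodsmoke and spilled ale...\n"},
    {"prompt": "Generate an RPG tavern description:\n\n", "completion": " A low-ceilinged cellar bar lit by dripping candles...\n"},
]
with open("training_data.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")

# 2) Upload the dataset to OpenAI's servers, then start a fine-tune job on a
#    smaller base model such as Curie.
training_file = openai.File.create(file=open("training_data.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=training_file.id, model="curie")

# 3) When the job finishes, the resulting model is served through the same API:
# openai.Completion.create(model=job.fine_tuned_model, prompt="...", max_tokens=200)
```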

Fine-tuning will enable you to tackle problems that are impossible to solve with the basic models.

“The vanilla models are highly capable and are usable for many tasks. However, some tasks (e.g., multi-step generation) are too complex for a vanilla model, even Davinci, to complete with high accuracy,” Shumer said. “In cases like this, you have two options: 1) create a prompt chain that feeds outputs from one prompt into another prompt, or 2) fine-tune a model. I typically first try to create a prompt chain, and if that doesn’t work, I then move to fine-tuning.”

If done properly, fine-tuning can also reduce the costs of using GPT-3. If you’ll be using GPT-3 for a specific application, a fine-tuned small model can produce results that are as good as those provided by a large vanilla model. Fine-tuned models also reduce the size of prompts, which further slashes your token usage.

“One other case where I tend to fine-tune is when I can get something working with a vanilla model, but the prompt ends up being so long that it is costly to serve to users. In cases like these, I fine-tune, as it actually can reduce the overall serving costs,” Shumer said.

But fine-tuning isn’t without challenges. Without a quality training dataset, fine-tuning can have adverse effects.

“Clean your dataset as much as you can. Garbage in, garbage out is one of my big mantras now when it comes to prompt engineering,” Bellow said.

If you manage to gather a sizeable dataset of quality examples, however, fine-tuning can do wonders. After starting LitRPG Adventures with the Davinci model, Bellow gathered and cleaned a dataset of around 4,000 samples in a 7-megabyte JSON file. While he is still experimenting, the initial results show that he can move from Davinci to Curie without a noticeable change in quality, which reduces the cost of GPT-3 queries by 90 percent.

Another consideration is the time it takes to fine-tune GPT-3, which grows with the size of the model and the training dataset.

“It can take as little as five minutes to fine-tune a smaller model on a few hundred examples,” Shumer said. “I’ve also seen cases where it takes upwards of five hours to train a larger model on thousands of examples.”

There’s also an inverse correlation between the size of the model and the amount of data you need to fine-tune GPT-3, according to Shumer’s experiments. Larger models require less data for fine-tuning.

“For many tasks, you can think of increasing base model size as a way to reduce how much data you’ll need to fine-tune a quality model,” Shumer said. “A Curie fine-tuned on 100 examples may have similar results to a Babbage fine-tuned on 2,000 examples. The larger models can do remarkable things with very little data.”

GPT-3 alternatives

OpenAI received a lot of criticism for deciding not to release GPT-3 as an open-source model. Subsequently, other developers released GPT-3 alternatives and made them available to the public. One very popular project is GPT-J by EleutherAI. Like other open-source projects, GPT-J requires technical effort on the part of application developers to set up and run. It also doesn’t benefit from the ease of use and scalability that come with hosting and fine-tuning your models on Microsoft’s Azure cloud.
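
To give a sense of that technical effort, here’s roughly what running GPT-J yourself with the Hugging Face transformers library can look like. This is a sketch, not a production setup: the full-precision 6-billion-parameter checkpoint needs tens of gigabytes of memory, and loading options vary by library version.

```python
# Requires: pip install transformers torch (and a machine with plenty of RAM or a large GPU)
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download and load the ~6B-parameter GPT-J checkpoint; the first run alone
# pulls many gigabytes of weights.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "Generate an RPG tavern description:\n\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```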

But open-source models are nonetheless useful and are worth considering if you have the in-house talent to set them up and they meet your application’s requirements.

“GPT-J isn’t the same as full-scale GPT-3—but it is useful if you know how to work with it. It’s exponentially harder to get a complex prompt working on GPT-J, as compared with Davinci, but it is possible for most use-cases,” Shumer said. “You won’t get the same super high-quality output, but you can likely get to something passable with some time and effort. Plus, these models can be cheaper to run, which is a big plus, considering the cost of Davinci. We have successfully used models like these at Otherside.”

“In my experience, they operate at about the level of the Curie model from OpenAI,” Bellow said. “I’ve also been looking into Cohere AI, but they’re not giving details on the size of their model, so I imagine it’s around the same as GPT-J, et al. I do think (hope) that there will be even more options soon from other players. Competition between suppliers is good for consumers like me.”

This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.

Deepfake tech doesn’t have to be ‘bad’

By now, we’re all aware of the danger deepfakes pose. These AI-generated videos of real-looking people saying false things are regularly cited as one of the future’s biggest security concerns. And, rightly or wrongly, people are scared.

But… that can’t be all there is to them, right? By its very nature, technology is neither good nor bad: it’s merely a tool. Yes, deepfakes can be used to spread rampant misinformation, but what about the other side?

And that’s what I wanted to find out: what’s the upside of deepfake technology?

In order to answer this, I spoke with Chris Ume from Metaphysic. You might not recognize the name, but you definitely know his work. Specifically, the eerily impressive deepfakes of Tom Cruise that went viral on TikTok.

Kicking right off, Ume told me about the unparalleled number of “creative use cases” for deepfakes and synthetic media technology in general. He focused on the impact this could have on movies, from improving stunts and making foreign-language dubbing more natural to improving de-ageing (and ageing) technology.

Effectively, deepfake technology can “remove the limits of the camera.”

Gaming is another sector ripe for a deepfake revolution. “Imagine playing FIFA and having the real faces of the players,” Ume told me. Or, indeed, playing something like Metal Gear Solid and having a character with your own visage in the game.

It’s not just the entertainment industry, though. Deepfakes could overhaul advertising.

Ume brought up how stars like Ronaldo (or any other celebrity) could have their virtual selves (also known as datasets) captured, meaning they could ‘film’ commercials without actually having to be physically present.

This would open up a huge number of revenue streams, but it’s also murky territory. What would stop celebrities’ images from being used in, well, anything? I put this to Ume.

“It’s important to work with an ethical company.” Specifically, he said high-profile individuals need to ensure their datasets are protected and secure to stop potential abuse.

And what about the damage hundreds of adverts will do to their personal image? “Scarcity is something [people] need to work out for themselves,” Ume said.

This, though, is an offshoot of a bigger issue facing deepfakes: regulation. Where do we draw the line between law and personal choice?

Ume is uncertain. Although he believes in regulation, he found it hard to suggest specific laws.

One element he was sure about, though, was tagging. In other words, every use of synthetic media or deepfakes should be labelled, whether directly on the screen or in code embedded in the video itself.

Ume thinks the current stumbling block is that governments see the technology in binary terms, while the reality is far more subtle. “It’s a thin line between funny and harmful,” he told me.

This needs to change fast, though. Ume told me that hyper-real deepfake technology is progressing too fast to stop. It’s coming, whether governments like it or not.

But this needn’t be a bad thing. “We still have time to teach the public and find ways to deal with this technology,” Ume added.

In other words, if we equip individuals with the tools — both mental and physical — to detect deepfakes and synthetic media, it’s something we can handle as a society. This may be easier said than done, but it can be done.

The positive potential for this sort of technology is huge — and it’s not going anywhere. Now it’s up to experts, technologists, and governments to ensure the world is ready for it when realistic deepfake technology is in the hands of the masses.

Let’s just hope they leave us with deepfake Tom Cruise. We all need something to live for.

Update: Chris Ume is speaking at TNW Conference 2021 on September 30th and October 1st. There he’ll be joined by 150 other amazing experts who will share their latest insights from the world of business and tech.
