Artificial Intelligence: The Challenge to Keep It Safe

Safety Principle: AI systems should be safe and secure throughout their operational lifetime and verifiably so where applicable and feasible.

When a new car is introduced to the world, it must pass various safety tests to satisfy not just government regulations, but also public expectations. In fact, safety has become a top selling point among car buyers.

And it’s not just cars. Whatever the latest generation of any technology happens to be — from appliances to airplanes — manufacturers know that customers expect their products to be safe from start to finish.

Artificial intelligence is no different. So, on the face of it, the Safety Principle seems like a “no brainer,” as Harvard psychologist Joshua Greene described it. It’s obviously not in anyone’s best interest for an AI product to injure its owner or anyone else. But, as Greene and other researchers highlight below, this principle is much more complex than it appears at first glance.

“This is important, obviously,” said University of Connecticut philosopher Susan Schneider, but she expressed uncertainty about our ability to verify that we can trust a system as it gets increasingly intelligent. She pointed out that at a certain level of intelligence, the AI will be able to rewrite its own code, and with superintelligent systems “we may not even be able to understand the program to begin with.”

What Is AI Safety?

This principle gets to the heart of the AI safety research initiative: how can we ensure safety for a technology that is designed to learn how to modify its own behavior?

Artificial intelligence is designed so that it can learn from interactions with its surroundings and alter its behavior accordingly, which could provide incredible benefits to humanity. Because AI can address so many problems more effectively than people, it has huge potential to improve health and wellbeing for everyone. But it’s not hard to imagine how this technology could go awry. And we don’t need to achieve superintelligence for this to become a problem.

Microsoft’s chatbot, Tay, is a recent example of how an AI can learn negative behavior from its environment, producing results quite the opposite of what its creators had in mind. Meanwhile, the fatal Tesla Autopilot accident, in which the vehicle failed to distinguish a white truck from the bright sky behind it, offers an example of an AI misreading its surroundings and taking deadly action as a result.

Researchers can try to learn from AI gone astray, but current designs often lack transparency, and much of today’s artificial intelligence is essentially a black box. AI developers can’t always figure out how or why AIs take various actions, and this will likely only grow more challenging as AI becomes more complex.

However, Ian Goodfellow, a research scientist at Google Brain, is hopeful, pointing to efforts already underway to address these concerns.

“Applying traditional security techniques to AI gives us a concrete path to achieving AI safety,” Goodfellow explains. “If we can design a method that prevents even a malicious attacker from causing an AI to take an undesirable action, then it is even less likely that the AI would choose an undesirable action independently.”
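The security techniques Goodfellow refers to include probing models with adversarial examples, inputs deliberately perturbed to cause mistakes. The sketch below is only illustrative, not any production system: it applies one fast-gradient-sign step to a toy logistic-regression model whose weights and inputs are made-up numbers chosen for the example.

```python
import numpy as np

def predict(w, b, x):
    """Sigmoid probability that input x belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def fgsm_perturb(w, b, x, y, eps):
    """One fast-gradient-sign step. For logistic loss, the gradient of
    the loss with respect to x is (p - y) * w, so we move eps along
    its sign, i.e. in the direction that most increases the loss."""
    p = predict(w, b, x)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Hypothetical model and input, for illustration only.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])           # model leans toward class 1 here

x_adv = fgsm_perturb(w, b, x, y=1.0, eps=0.5)

print(predict(w, b, x))            # confidence before the perturbation
print(predict(w, b, x_adv))        # confidence drops after it
```

A model is more robust in Goodfellow’s sense when small perturbations like this cannot flip its decisions, which gives one concrete, testable notion of safety.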

AI safety may be a challenge, but there’s no reason to believe it’s insurmountable. So what do other AI experts say about how we can interpret and implement the Safety Principle?

What Does ‘Verifiably’ Mean?

‘Verifiably’ was the word that caught the eye of many researchers as a crucial part of this Principle.

John Havens, an Executive Director with IEEE, first considered the Safety Principle in its entirety, saying, “I don’t know who wouldn’t say AI systems should be safe and secure. … ‘Throughout their operational lifetime’ is actually the more important part of the sentence, because that’s about sustainability and longevity.”

But then, he added, “My favorite part of the sentence is ‘and verifiably so.’ That is critical. Because that means, even if you and I don’t agree on what ‘safe and secure’ means, but we do agree on verifiability, then you can go, ‘well, here’s my certification, here’s my checklist.’ And I can go, ‘Great, thanks.’ I can look at it, and say, ‘oh, I see you got things 1-10, but what about 11-15?’ Verifiably is a critical part of that sentence.”

AI researcher Susan Craw noted that the Principle “is linked to transparency.” She explained, “Maybe ‘verifiably so’ would be possible with systems if they were a bit more transparent about how they were doing things.”

Greene also noted the complexity and challenge presented by the Principle when he suggested:

“It depends what you mean by ‘verifiably.’ Does ‘verifiably’ mean mathematically, logically proven? That might be impossible. Does ‘verifiably’ mean you’ve taken some measures to show that a good outcome is most likely? If you’re talking about a small risk of a catastrophic outcome, maybe that’s not good enough.”

Safety and Value Alignment

Any consideration of AI safety must also include value alignment: how can we design artificial intelligence that can align with the global diversity of human values, especially taking into account that, often, what we ask for is not necessarily what we want?

“Safety is not just a technical problem,” Patrick Lin, a philosopher at California Polytechnic told me. “If you just make AI that can align perfectly with whatever values you set it to, well the problem is, people can have a range of values, and some of them are bad. Just merely matching AI, aligning it to whatever value you specify I think is not good enough. It’s a good start, it’s a good big picture goal to make AI safe, and the technical element is a big part of it; but again, I think safety also means policy and norm-setting.”

And the value-alignment problem becomes even more of a safety issue as the artificial intelligence gets closer to meeting — and exceeding — human intelligence.

“Consider the example of the Japanese androids that are being developed for elder care,” said Schneider. “They’re not smart; right now, the emphasis is on physical appearance and motor skills. But imagine when one of these androids is actually engaged in elder care … It has to multitask and exhibit cognitive flexibility. … That raises the demand for household assistants that are AGIs. And once you get to the level of artificial general intelligence, it’s harder to control the machines. We can’t even make sure fellow humans have the right goals; why should we think AGI will have values that align with ours, let alone that a superintelligence would.”

Defining Safety

But perhaps it’s time to reconsider the definition of safety, as Lin alluded to above. Havens also requested “words that further explain ‘safe and secure,’” suggesting that we need to expand the definition beyond “physically safe” to “provide increased well being.”

Anca Dragan, an associate professor at UC Berkeley, was particularly interested in the definition of “safe.”

“We all agree that we want our systems to be safe,” said Dragan. “More interesting is what do we mean by ‘safe’, and what are acceptable ways of verifying safety.

“Traditional methods for formal verification that prove (under certain assumptions) that a system will satisfy desired constraints seem difficult to scale to more complex and even learned behavior. Moreover, as AI advances, it becomes less clear what these constraints should be, and it becomes easier to forget important constraints. … we need to rethink what we mean by safe, perhaps building in safety from the get-go as opposed to designing a capable system and adding safety after.”
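Dragan’s idea of building in safety from the get-go can be made concrete with a simple “shield” pattern: instead of trusting a learned policy, every action it proposes is checked against an explicit constraint before execution. The policy and the speed-limit constraint below are hypothetical stand-ins sketched for illustration, not anyone’s actual system.

```python
SPEED_LIMIT = 2.0  # hypothetical safety constraint: |action| <= 2.0

def learned_policy(state):
    """Stand-in for a learned controller that may propose unsafe actions."""
    return state * 3.0  # can exceed the limit for large states

def safe_step(state):
    """Run the policy, then clamp its action into the verified-safe set."""
    action = learned_policy(state)
    return max(-SPEED_LIMIT, min(SPEED_LIMIT, action))

print(safe_step(0.5))   # within limits, passes through unchanged
print(safe_step(5.0))   # unsafe proposal, clamped to the limit
```

The appeal of this design is that the safety argument rests on the small, auditable `safe_step` wrapper rather than on the opaque learned policy, though, as Dragan notes, choosing the right constraints remains the hard part.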

What Do You Think?

What does it mean for a system to be safe? Does it mean the owner doesn’t get hurt? Are “injuries” limited to physical ailments, or does safety also encompass financial or emotional damage? And what if an AI is being used for self-defense or by the military? Can an AI harm an attacker? How can we ensure that a robot or software program or any other AI system remains verifiably safe throughout its lifetime, even as it continues to learn and develop on its own? How much risk are we willing to accept in order to gain the potential benefits that increasingly intelligent AI — and ultimately superintelligence — could bestow?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

6 replies
  1. Maurizio says:

    As the concept of safety gets analyzed under the new microscope of humanity’s fast evolutionary pace, I would like to sidestep the inquisition into its meaning and shift the focus to safety as a spatial issue, so to speak, or rather to geographical areas predetermined to be safe.
    As is logically conceivable, the development of AI and the globalization of such a “service” will take place gradually.
    First it will reach certain domains of our lives, in the more populated areas that are economically and politically able to implement such devices.
    Then it will reach the wild territories and the countries that were perhaps reluctant at first to adopt such a social enhancement, or too poor or too depressed to have immediate use for an AI.
    But this is just the process of spreading, or swarming.
    Coming to my point: safety will be a concept of belonging and territory.
    Designated areas, to be specific.
    For example, a robonurse will have as its safety task the patients’ wellbeing, physical and emotional, and the environment around them, which can be defined as a room, an entire house, or something else.
    So the robonurse won’t be responsible or accountable for damage to people or belongings outside its geographical jurisdiction.
    Along the same lines, roboworkers, for example, will have safety features concerning the specific area where they will be operational.
    Another important example is the cardroid: because its field of operations differs from an enclosed space, it will run software with different features but the same goal. What is the object we want to keep safe? The answer is the same: humans and their property or belongings. So the goal will be to avoid any contact or collision, in any circumstance whatsoever, with a human being and his or her accessories, where an accessory can include a pair of sunglasses, a dog, a purse, or a vehicle.

    So as not to stay purely theoretical about the matter of safety, it is considered good form to be propositive.
    Here is a brief tip: infrared sensors are increasingly developed commercially and could be mass-produced affordably. They can be calibrated to detect not only a human body’s temperature but also any other life form emitting heat.
    Software shape recognition is another useful tool.
    In summary: safety can be achieved at large scale and across domains if we define first the specific, then the general.
    I hope this can be helpful and inspire more practical ways to achieve the elusive goal of safety.
    For more of my topics, find me on Twitter.
    Best regards.


    Some points that might be germane:
    1. I suggest a revision to the definition of intelligence stated in Life 3.0: Intelligence = the ability to SET and accomplish complex goals.
    2. Important to the issue of AI safety is the question of how AI will arise. What are the necessary and sufficient conditions, and how do they come together? As a simplified illustration, consider the fire triangle. For there to be a fire there must be fuel, oxygen, and enough energy to overcome the activation energy barrier to combustion (most frequently this is a spark, but spontaneous combustion can occur, for example in a bin of wet coal). It’s also important to note that the conditions are not independent of each other. For example, an oxygen-enriched atmosphere will lower the activation energy barrier, as will the choice of fuel.
    One obvious extension into the realm of artificial intelligence is the ubiquity of various computer programs providing sufficient ‘fuel’, especially as they are increasingly linked (for example, many complex manufacturing concerns link purchasing, process control, QC/QA, and various other aspects of the business). This integration, spurred by the prospect of improved quality and reduced cost, will be a driver towards autonomous goal setting.

  3. Wilma van Arendonk says:

    I read with great interest about developments in A.I., Big Data, VR, and MR. I am not a scientist or a techie and in that sense cannot make technical recommendations, but I do think about it and have ideas about it. Suppose A.I. no longer appears to be under control and bots are able to protect themselves against being switched off? That is alarming.

    I am glad that safety is being treated as a topic of conversation. What does worry me a bit is the fact that so few people are engaged with it. Future of Life is an excellent initiative, but people’s involvement lags behind. That applies not just to this article; I notice the subject simply isn’t alive for most people. What does this mean? Are we closing our eyes and seeing only the good sides of A.I. (or half human, half computer)?

    Engagement begins with involving people in this subject. The concerns being voiced are realistic and must be faced. People should be made aware of the good sides, but also of the threats that A.I. brings with it.

    Just some musings on how to keep AI from escalating and to safeguard human safety:

    – establish a global ethics institute that guards human safety against unauthorized experiments and possible attacks by A.I.;
    – this global institute guards a secret code and can switch off or destroy any self-learning robot (though I don’t know whether that is technically possible from the outside);
    – have companies and scientific institutes join this global institute (compulsorily) and have them declare that they have good intentions and will only program in a ‘human-friendly’ way;
    – protect people who report possibly questionable A.I. practices, and take such warnings seriously (for example, protect those who dare to speak up about this from dismissal);
    – make sure that unethical matters concerning AI fall under criminal law;
    – inform citizens about what A.I. is, what is good about it, but also what can be threatening (awareness).

  4. Stefan Hermann says:

    Safety is a relative term. I prefer to say: a product is safe if its fault rate is lower than a certain value, under certain preconditions.
