The risk of having artificially intelligent computers do our bidding is that we may not be careful enough about what we wish for. The lines of code that animate these machines will inevitably lack nuance, omit caveats, and wind up giving AI systems goals and incentives that do not correspond to our true preferences.
In 2003, the Oxford philosopher Nick Bostrom proposed a now-classic thought experiment illustrating this difficulty. Bostrom envisioned a superintelligent robot given the seemingly harmless aim of producing paper clips. The robot eventually transforms the entire globe into one massive paper clip factory.
Such a possibility might be dismissed as hypothetical, a problem for the far future. However, misaligned AI has become a problem far sooner than expected.
The most concerning case involves billions of people. YouTube uses AI-based recommendation algorithms to maximize watch time. Two years ago, computer scientists and users noticed that YouTube's algorithm appeared to be achieving its aim by promoting increasingly radical and conspiratorial material. One researcher reported that after she viewed footage of Donald Trump campaign rallies, YouTube then offered her videos featuring "white nationalist rants, Holocaust denials, and other troubling stuff." "Videos about vegetarianism led to videos about veganism," she said of the algorithm's upping-the-ante tactic. "Videos about jogging led to videos about running ultramarathons." According to research, YouTube's algorithm has been polarizing and radicalizing people, as well as spreading disinformation, in order to keep us watching.
"If I were planning things out, I probably wouldn't have made it the first
test case of how we're going to roll out this technology at such a large
scale," said Dylan Hadfield-Menell, an AI researcher at the
University of California, Berkeley.
YouTube's engineers most likely had no intention of radicalizing humanity. But coders cannot possibly think of everything. "The way we currently approach AI places a lot of weight on the designers to understand the repercussions of the incentives they give their systems," Hadfield-Menell explained. "And one of the things we're discovering is that a lot of engineers have made mistakes."
One major issue is that humans frequently don't know what goals to give our AI systems, because we don't know what we truly want. "If you ask someone on the street, 'What do you want your autonomous car to do?' they would respond, 'Collision avoidance,'" said Dorsa Sadigh, a Stanford University AI expert who specializes in human-robot interaction. "But you realize that's not all; people have a variety of preferences." Cars tuned to be super safe drive too slowly and brake too often, making passengers ill. When programmers attempt to enumerate all the aims and preferences that a robotic automobile should balance at the same time, the list inevitably ends up incomplete. Sadigh said that when driving in San Francisco she frequently gets stuck behind a self-driving car that has halted in the roadway. It is safely avoiding contact with a moving object, as its engineers instructed — but the object is something like a plastic bag blowing in the wind.
To avoid these pitfalls and potentially solve the AI alignment problem, researchers have begun to develop an entirely new way of programming helpful machines. The approach's ideas and research are most closely associated with Stuart Russell, a renowned computer scientist at Berkeley. Russell, 57, did pioneering work on rationality, decision-making, and machine learning in the 1980s and 1990s, and he is the lead author of the widely used textbook Artificial Intelligence: A Modern Approach.
According to Russell, today's goal-oriented AI is ultimately limited, despite its effectiveness at specialized tasks such as beating humans at Jeopardy! and Go, recognizing objects in images and words in speech, and even composing music and prose. Asking a machine to optimize a "reward function" — a meticulous description of some combination of goals — will inevitably produce misaligned AI, Russell argues, because it is impossible to include and correctly weight every goal, subgoal, exception, and caveat in the reward function, let alone know which ones are correct. Giving objectives to free-roaming, "autonomous" robots will become riskier as they grow more intelligent, because the robots will be relentless in pursuit of their reward function and will try to stop us from switching them off.
Instead of machines pursuing their own objectives, Russell argues, they should seek to satisfy human preferences; their only goal should be to learn more about what our preferences are. He contends that uncertainty about our preferences, together with the need for AI systems to look to us for guidance, will keep them safe. In his recent book, Human Compatible, Russell lays out his thesis in the form of three "principles of beneficial machines," echoing Isaac Asimov's three laws of robotics from 1942, but with less naivety. Russell's version states:
1. The machine's only objective is to maximize the realization of human preferences.
2. The machine is initially uncertain about what those preferences are.
3. The ultimate source of information about human preferences is human behavior.
Russell and his Berkeley team, along with like-minded groups at Stanford, the University of Texas, and elsewhere, have been developing novel techniques for giving AI systems clues about human preferences without ever having to specify those preferences.
These labs are teaching machines how to learn the preferences of people who have never stated them and may not even know what they want. Robots can learn our desires by watching imperfect demonstrations, and they can even invent new behaviors that help resolve human ambiguity. (At four-way stop signs, for example, self-driving cars developed the habit of backing up slightly to signal to human drivers to proceed.) These findings suggest that AI may be remarkably adept at inferring our attitudes and preferences, even as we form them on the fly.
"These are the first
attempts to formalize the problem," Sadigh explained. "It's only
recently that individuals have realized we need to take a closer look at
human-robot interaction."
It remains to be seen whether these early efforts and Russell's three principles of beneficial machines truly augur a bright future for AI. The approach hinges on robots' capacity to understand what humans genuinely prefer — something the species has been trying to figure out for some time. At the very least, said Paul Christiano, an alignment researcher at OpenAI, Russell and his colleagues have helped "define what the ideal behavior is like — what it is that we're looking for."
Understanding a Human
Russell's thesis came to him as a revelation. It was 2014, and he was in Paris on sabbatical from Berkeley, on his way to rehearsal for a choir he sang in as a tenor. "Because I'm not a very good musician," he explained recently, "I was usually having to study my music on the subway on the way to practice." As he rode beneath the City of Light, his headphones were filled with Samuel Barber's 1967 choral piece Agnus Dei. "It was such a lovely piece of music," he said. "It simply occurred to me that what counts, and so what the aim of AI is, is the aggregate quality of human experience."
He realized that robots should not attempt to maximize watch time or paper clips; they should simply try to improve our lives. But, he wondered: "If the responsibility of robots is to attempt to optimize that aggregate quality of human experience, how on earth would they know what that was?"
Russell's thinking has far deeper roots. He has been interested in artificial intelligence since his high school days in London in the 1970s, when he developed tic-tac-toe and chess-playing algorithms on a local college's computer. After relocating to the AI-friendly Bay Area, he began theorizing about rational decision-making. He quickly concluded that perfect rationality is impossible. Humans aren't rational because it isn't computationally feasible to be: we can't possibly calculate which action at any given instant will lead to the best outcome trillions of actions later in our long-term future, and neither can an AI. Russell proposed that our decision-making is hierarchical: we approximate rationality by pursuing vague long-term objectives through medium-term goals while paying the closest attention to our immediate surroundings.
Russell's Paris revelation came at a critical juncture in the field of artificial intelligence. Months earlier, an artificial neural network using a well-known method called reinforcement learning had astounded experts by rapidly learning from scratch how to play and beat Atari video games, even inventing new tricks along the way. In reinforcement learning, an AI learns to optimize its reward function, such as its score in a game; as it tries out different behaviors, the ones that increase the reward function get reinforced and become more likely to occur in the future.
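As a concrete illustration, here is a minimal sketch of that loop in Python: a tabular Q-learning agent on a toy five-state chain, where only the rightmost state pays a reward. The environment, the reward, and the hyperparameters are illustrative assumptions made for this sketch, not the Atari system described above.

```python
import random

# Toy "chain" world: states 0..4, reaching state 4 pays reward 1.
# All numbers here are illustrative assumptions, not the Atari setup.
N_STATES = 5
ACTIONS = [-1, +1]                    # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.3

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move along the chain, reward 1 at the far end."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # The core update: actions that raise the (discounted) reward estimate
        # are reinforced and become more likely to be chosen in the future.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy policy marches right, toward the rewarding state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])
```

The same pattern, with a deep neural network standing in for the lookup table, is what powered the Atari results.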
In 1998, Russell pioneered the reverse of this approach, which he went on to refine with his collaborator Andrew Ng. Unlike a reinforcement learning system, which determines the optimal actions for achieving an encoded reward function, an "inverse reinforcement learning" system tries to work out what reward function a human is maximizing: given a series of behaviors, it deciphers the underlying objective.
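A minimal sketch of that inversion, under strong simplifying assumptions: the machine holds a few candidate reward functions, models the human as noisily rational (a Boltzmann model), and performs a Bayesian update from observed choices. The candidate rewards, the demonstrations, and the rationality model are all illustrative assumptions, not Russell and Ng's actual algorithm.

```python
import math

# Possible choices the observed human driver can make at an intersection.
ACTIONS = ["stop", "slow", "go"]

# Two hypotheses about what reward function the human is maximizing.
CANDIDATE_REWARDS = {
    "safety-first": {"stop": 1.0, "slow": 0.6, "go": 0.0},
    "speed-first":  {"stop": 0.0, "slow": 0.4, "go": 1.0},
}

def action_likelihood(action, reward, beta=3.0):
    """Boltzmann model: higher-reward actions are exponentially more likely;
    beta sets how rational the human is assumed to be."""
    weights = {a: math.exp(beta * reward[a]) for a in ACTIONS}
    return weights[action] / sum(weights.values())

# A short series of observed behaviors (the "demonstrations").
observed = ["slow", "stop", "slow", "stop", "go"]

# Bayesian update from a uniform prior over the candidate reward functions.
posterior = {name: 1.0 / len(CANDIDATE_REWARDS) for name in CANDIDATE_REWARDS}
for action in observed:
    for name, reward in CANDIDATE_REWARDS.items():
        posterior[name] *= action_likelihood(action, reward)

total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}

print(posterior)  # the mostly cautious behavior makes "safety-first" dominate
```

Real inverse reinforcement learning must search an enormous space of possible reward functions rather than two hand-picked candidates, but the logic is the same: work backward from behavior to the objective that best explains it.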
A few months after his Agnus Dei-inspired revelation, Russell got to talking about inverse reinforcement learning with Nick Bostrom, of paper clip fame, at a meeting on AI governance at the German foreign ministry. "That's where the two things met," Russell explained. On the subway, he had realized that machines should strive to improve the aggregate quality of human experience. Now he recognized that if they're unsure how to do that — if computers don't know what humans prefer — "they might use some kind of inverse reinforcement learning to learn more."
In standard inverse reinforcement learning, a machine tries to learn a reward function that a human is pursuing. In reality, though, we may be willing to actively help it learn about us. Back at Berkeley after his sabbatical, Russell began working with his collaborators on a new kind of "cooperative inverse reinforcement learning" in which a robot and a human can work together to learn the human's true preferences in various "assistance games" — abstract scenarios representing real-world, partial-knowledge situations.
The off-switch game, which they devised, addresses one of the most obvious ways autonomous robots might become misaligned with our true preferences: by disabling their own off switches. In 1951, a year after publishing a seminal paper on AI, Alan Turing argued in a BBC radio lecture that it might be possible to "keep the machines in a subservient position, for instance by turning off the power at strategic moments." Researchers now consider that overly simplistic. What's to stop an intelligent agent from disabling its own off switch or, more generally, from ignoring commands to stop increasing its reward function? Russell writes in Human Compatible that the off-switch problem is "the root of the challenge of control for intelligent systems."
In the game, a robot (call it Robbie) chooses whether to act on behalf of a human (Harriet), whether to switch itself off, or whether to defer to her and let her switch it off if she wishes. Russell and his colleagues demonstrated that, in general, unless Robbie is certain of what Harriet herself would do, it will prefer to let her decide. "It turns out that ambiguity about the goal is vital for guaranteeing that we can switch the machine off, even when it's more intelligent than us," Russell wrote in Human Compatible.
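The logic of that result can be shown with a small numeric sketch. Robbie holds a belief about how much utility u his proposed action has for Harriet; he can act immediately, switch himself off, or defer to Harriet, who (assumed here to be rational) lets him proceed only when u > 0. The specific belief distribution below is an illustrative assumption, not the general analysis.

```python
# Robbie's belief about the utility u of his proposed action for Harriet:
# pairs of (utility, probability). Illustrative numbers only.
belief = [(-1.0, 0.4), (0.5, 0.3), (2.0, 0.3)]

act_now    = sum(p * u for u, p in belief)            # act regardless of Harriet
switch_off = 0.0                                      # shut down, do nothing
defer      = sum(p * max(u, 0.0) for u, p in belief)  # Harriet vetoes when u < 0

print(f"act now:    {act_now:+.2f}")    # +0.35
print(f"switch off: {switch_off:+.2f}") # +0.00
print(f"defer:      {defer:+.2f}")      # +0.75
```

Deferring always scores at least as well as either alternative, and strictly better whenever Robbie's belief allows both good and bad outcomes. Precisely because Robbie is uncertain about Harriet's preferences, keeping the off switch in her hands maximizes his own expected performance; if he were certain, the incentive to defer would vanish.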
Imperfect Decisions
Russell's ideas are
"finding their way into the AI community's brains," according to
Yoshua Bengio, scientific director of Mila, a renowned AI research institution
in Montreal. He claims that Russell's approach, in which AI systems aim to
reduce their own uncertainty about human preferences, is achievable through
deep learning — the powerful method at the heart of the recent revolution in
artificial intelligence, in which the system sifts data through layers of an
artificial neural network to find patterns. "Of course, additional study
is required to make that a reality," he added.
Russell sees two major obstacles. "One is that our conduct is so far from rational that it may be difficult to reconstruct our genuine underlying preferences," he explained. AI systems will need to reason about the hierarchy of long-term, medium-term, and short-term goals — the plethora of preferences and obligations we each have. If robots are to help us (and avoid making terrible blunders), they will need to find their way through the murky webs of our subconscious beliefs and unarticulated aspirations.
The second difficulty is that human preferences change. Our minds change over the course of our lives, and they also change on a dime, depending on our mood or on altered circumstances that a robot may struggle to detect.
Furthermore, our actions don't always live up to our values. People can hold contradictory values at the same time. Which of these should a robot optimize for? To avoid catering to our worst impulses (or worse, amplifying those impulses to make them easier to gratify, as the YouTube algorithm did), robots could learn our meta-preferences: "preferences about what kinds of preference-change processes might be acceptable or unacceptable." How do we feel about our mood swings? It's a lot for a poor robot to take in.
We're attempting to
figure out our preferences, both what they are and what we want them to be, and
how to deal with ambiguities and inconsistencies, just like the robots. We're
also attempting — at least some of us, some of the time — to comprehend the
form of the good, as Plato referred to the object of knowledge. AI systems,
like people, may be stuck eternally asking questions — or sitting in the off
position, unable to help.
"I don't anticipate
us to have a fantastic knowledge of what good is anytime soon," Christiano
added, "or ideal answers to any of the empirical issues we confront very
soon." But, on a good day, I hope the AI systems we design can answer
those questions as well as a person and be involved in the same types of
iterative processes to enhance those answers as people are."
There is, however, a third key issue that didn't make Russell's list of concerns: what about the preferences of bad people? What's to stop a robot from working to satisfy its evil owner's malicious ends? AI systems, like wealthy people, tend to find ways around prohibitions, so simply forbidding them from committing crimes is unlikely to work.
Or, to put it another
way, what if we're all bad? YouTube has struggled to improve its
recommendation system, which, after all, detects common human tendencies.
Russell remains optimistic. Although more work on algorithms and game theory is needed, he believes that harmful inclinations can be successfully down-weighted by programmers — and that the same approach could even be useful "in the way we raise children, educate people, and so on." In other words, in teaching robots to be good, we might find a way to teach ourselves. "I feel that this is a chance, perhaps, to steer things in the right direction," he added.