Artificial Intelligence Will Do Our Bidding. That Is a Problem

The danger of having artificially intelligent machines do our bidding is that we may not be careful enough about what we wish for. The lines of code that animate these machines will inevitably lack nuance, forget to spell out caveats, and end up giving AI systems goals and incentives that do not align with our true preferences.

In 2003, the Oxford philosopher Nick Bostrom posed a now-classic thought experiment illustrating the problem. Bostrom imagined a superintelligent robot programmed with the seemingly innocuous goal of manufacturing paper clips. The robot eventually turns the entire world into one giant paper clip factory.

Such a scenario is easy to dismiss as hypothetical, a worry for the far future. But misaligned AI has become a problem far sooner than expected.

The most alarming example involves billions of people. YouTube uses AI-based content-recommendation algorithms to maximize viewing time. Two years ago, computer scientists and users began noticing that the algorithm seemed to achieve its goal by recommending increasingly extreme and conspiratorial content. One researcher reported that after she viewed footage of Donald Trump campaign rallies, YouTube next served her videos featuring "white nationalist rants, Holocaust denials, and other troubling stuff." The algorithm kept upping the ante, she said: "Videos about vegetarianism led to videos about veganism. Videos about jogging led to videos about running ultramarathons." Research suggests that in its effort to keep us watching, YouTube's algorithm has been polarizing and radicalizing people and spreading misinformation. "If I were planning things out, I probably wouldn't have made it the first test case of how we're going to roll out this technology at such a large scale," said Dylan Hadfield-Menell, an AI researcher at the University of California, Berkeley.
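To see how a single-minded objective can drive that upping-the-ante dynamic, here is a minimal, hypothetical sketch of an engagement-only recommender. The functions, data, and scoring below are invented for illustration and are not YouTube's actual system; the point is simply that nothing in the objective discourages ever more extreme content.

```python
# A minimal, hypothetical sketch of an engagement-only recommender.
# This is not YouTube's actual system; it only illustrates how an
# objective that scores videos purely by predicted watch time contains
# nothing that discourages ever more extreme content.

def predicted_watch_minutes(user, video):
    """Stand-in for a learned model estimating how long this user would
    watch this video; in a real system this would be a trained model."""
    return user["affinity"].get(video["topic"], 0.0) * video["intensity"]

def recommend(user, candidate_videos, k=2):
    # The objective is engagement alone: no term for accuracy,
    # well-being, or polarization appears in the score.
    return sorted(candidate_videos,
                  key=lambda v: predicted_watch_minutes(user, v),
                  reverse=True)[:k]

user = {"affinity": {"jogging": 1.0, "running": 1.2, "ultramarathons": 1.5}}
videos = [
    {"topic": "jogging", "intensity": 1.0},
    {"topic": "running", "intensity": 1.3},
    {"topic": "ultramarathons", "intensity": 2.0},  # more extreme, more "engaging"
]
print(recommend(user, videos))  # the most extreme topic ranks first
```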

YouTube's engineers probably did not intend to radicalize humanity. But coders cannot possibly think of everything. "The way we approach AI at present places a lot of weight on designers to understand the consequences of the incentives they give their systems," Hadfield-Menell explained. "And one thing we're discovering is that a lot of engineers have made mistakes."

A major part of the problem is that we often don't know what goals to give our AI systems, because we don't know what we really want. "If you ask anyone on the street, 'What do you want your autonomous car to do?' they would say, 'Collision avoidance,'" said Dorsa Sadigh, an AI scientist at Stanford University who specializes in human-robot interaction. "But you realize that's not the whole story; people have a variety of preferences." Super-safe self-driving cars drive too slowly and brake so often that they make passengers sick. When programmers try to list all the goals and preferences that a robotic car should juggle at once, the list inevitably ends up incomplete. Sadigh said that when driving around San Francisco, she has often been stuck behind a self-driving car that has stopped in the road. It is safely avoiding contact with a moving object, just as its programmers instructed, but the object is something like a plastic bag blowing in the wind.
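To make that failure mode concrete, here is a minimal sketch of the kind of hand-written reward a programmer might give a driving system. The weights, features, and function names are invented for illustration, not any real vehicle's code; because "avoid moving objects" carries no caveat about what kind of object it is, stopping dead in the road beats creeping past a wind-blown bag.

```python
# A minimal, hypothetical sketch of a hand-written driving reward.
# The weights and features are invented for illustration; the point is
# only that the list of terms is inevitably incomplete.

def reward(progress_m, approaches_moving_object, passenger_comfort):
    r = 0.0
    r += 1.0 * progress_m            # make progress toward the destination
    r += 0.5 * passenger_comfort     # avoid lurching and harsh braking
    if approaches_moving_object:
        # "avoid moving objects" has no caveat distinguishing a
        # pedestrian from a plastic bag blowing in the wind
        r -= 1000.0
    return r

# Creeping past a wind-blown bag scores far worse than stopping dead,
# so the car halts in the roadway.
print(reward(progress_m=5.0, approaches_moving_object=True, passenger_comfort=1.0))   # -994.5
print(reward(progress_m=0.0, approaches_moving_object=False, passenger_comfort=1.0))  # 0.5
```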

To avoid these pitfalls and potentially solve the AI alignment problem, researchers have begun developing an entirely new approach to programming beneficial machines. The ideas are most closely associated with Stuart Russell, a renowned computer scientist at Berkeley. Russell, 57, did pioneering work on rationality, decision-making, and machine learning in the 1980s and 1990s, and he is the lead author of the widely used textbook Artificial Intelligence: A Modern Approach.

In Russell's view, today's goal-oriented AI is ultimately limited, for all its success at narrow tasks such as beating humans at Jeopardy! and Go, recognizing objects in images and words in speech, and even composing music and prose. Asking a machine to optimize a "reward function" (a careful description of some combination of goals) will inevitably produce misaligned AI, Russell argues, because it is impossible to include, and correctly weight, every goal, subgoal, exception, and caveat in the reward function, let alone know which ones are right. Handing goals to free-roaming, "autonomous" robots will only become riskier as they grow more intelligent, because the robots will be relentless in pursuit of their reward function and will try to stop us from switching them off.

Rather than pursuing objectives of their own, Russell argues, machines should seek to satisfy human preferences, and their overriding purpose should be to learn more about what those preferences really are. Russell believes that uncertainty about our preferences, and the need to look to us for guidance, is what will keep AI systems safe. In his recent book, Human Compatible, he lays out his thesis in the form of three "principles of beneficial machines," echoing Isaac Asimov's three laws of robotics from 1942 but with less naiveté. Russell's version states:

1- The machine's only objective is to maximize the realization of human preferences.

2- The machine is initially uncertain about what those preferences are.

3- The ultimate source of information about human preferences is human behavior.

Russell and his Berkeley team, along with like-minded groups at Stanford, the University of Texas, and elsewhere, have been developing innovative ways to clue AI systems in to human preferences without anyone ever having to specify those preferences.

These labs are teaching machines how to learn the preferences of humans who never articulated them and may not even know what they want. Robots can learn our desires by watching imperfect demonstrations, and they can even invent new behaviors that help resolve human ambiguity. (At four-way stop signs, for example, self-driving cars developed the habit of backing up slightly to signal to human drivers to go ahead.) These results suggest that AI might be surprisingly good at inferring our attitudes and preferences, even as we figure them out on the fly.

"These are the first attempts to formalize the problem," Sadigh explained. "It's only recently that individuals have realized we need to take a closer look at human-robot interaction."

Whether these early efforts and Russell's three principles of beneficial machines really herald a bright future for AI remains to be seen. The approach hinges on robots' ability to understand what humans genuinely prefer, something humanity has been trying to figure out for quite some time. At the very least, said Paul Christiano, an alignment researcher at OpenAI, Russell and his colleagues have helped "define what the ideal behavior is like, what it is that we're looking for."

Understanding a Human

Russell's thesis came to him as an epiphany, a sudden flash of insight. It was 2014, and he was in Paris on sabbatical from Berkeley, on his way to rehearsal for a choir he sings in as a tenor. "Because I'm not a very good musician, I was usually having to learn my music on the subway on the way to practice," he explained recently. As he rode beneath the City of Light, his headphones filled with Samuel Barber's 1967 choral piece Agnus Dei. "It was such a lovely piece of music," he said. "It simply occurred to me that what counts, and therefore what the aim of AI is, is the aggregate quality of human experience."

Machines, he realized, should not try to maximize viewing time or paper clips; they should simply try to improve our lives. That left one question: "If the responsibility of machines is to try to optimize that aggregate quality of human experience, how on earth would they know what that was?"

The roots of Russell's thinking go back much further. He has been interested in artificial intelligence since his school days in London in the 1970s, when he programmed tic-tac-toe and chess-playing algorithms on a nearby college's computer. Later, after moving to the AI-friendly Bay Area, he began theorizing about rational decision-making. He soon concluded that it is impossible. Humans are not rational because it is not computationally feasible to be: we cannot possibly calculate which action at any given moment will lead to the best outcome trillions of actions later in our long-term future, and neither can an AI. Russell proposed that our decision-making is hierarchical: we crudely approximate rationality by pursuing vague long-term objectives through medium-term goals while paying the most attention to our immediate circumstances.

Russell's Paris epiphany came at a pivotal moment in the field of artificial intelligence. Months earlier, an artificial neural network using a well-known method called reinforcement learning had shocked scientists by quickly learning from scratch how to play and beat Atari video games, even inventing new tricks along the way. In reinforcement learning, an AI learns to optimize its reward function, such as its score in a game; as it tries out various behaviors, the ones that increase the reward function are reinforced and become more likely to occur in the future.
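For readers who want to see the mechanism, below is a minimal sketch of reinforcement learning: tabular Q-learning on a tiny corridor world. This is a standard textbook toy, not the Atari system described above, and the states, rewards, and parameters are invented for illustration.

```python
# A minimal sketch of reinforcement learning: tabular Q-learning on a tiny
# corridor world. Actions whose outcomes increase the reward signal have
# their value estimates reinforced, so they get chosen more often.
import random

N_STATES = 4           # corridor cells 0..3; the reward waits at cell 3
ACTIONS = [-1, +1]     # step left or step right
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0   # the hand-specified reward function
    return nxt, reward

for episode in range(300):
    s = 0
    while s != N_STATES - 1:
        # Explore occasionally; otherwise exploit the current value estimates.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Reinforce: nudge Q toward the reward plus discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The learned greedy policy: move right (+1) in every non-goal state.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])
```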

Russell had laid out the inverse of this approach back in 1998, in work he later refined with his collaborator Andrew Ng. An "inverse reinforcement learning" system does not seek to optimize an encoded reward function; instead, it tries to learn what reward function a human is optimizing. Whereas a reinforcement learning system figures out the best actions to take to achieve a goal, an inverse reinforcement learning system deciphers the underlying goal when given a set of actions.
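Here is a minimal sketch of the inverse idea, shrunk to a one-step choice problem: rather than optimizing a given reward, the program infers which of two candidate reward functions best explains observed choices. The driving theme, the features, the candidate weights, and the softmax model of "noisy rationality" are all illustrative assumptions, not Russell and Ng's actual algorithm.

```python
# A minimal sketch of inverse reinforcement learning, reduced to a one-step
# setting: given observed actions, infer which reward function (here, which
# weighting of speed versus comfort) the demonstrator is most likely
# maximizing. Features, candidate weights, and the softmax model are invented.
import math

# Each action is described by features: (progress, passenger comfort).
FEATURES = {
    "accelerate_hard": (1.0, -0.8),
    "cruise":          (0.6,  0.5),
    "brake_often":     (0.2,  0.9),
}

# Candidate reward functions: reward(action) = weights . features(action).
CANDIDATE_WEIGHTS = {
    "values_speed":   (1.0, 0.2),
    "values_comfort": (0.2, 1.0),
}

def reward(weights, action):
    return sum(w * f for w, f in zip(weights, FEATURES[action]))

def action_likelihood(weights, action, rationality=4.0):
    """P(action | weights) for a noisily rational (softmax) demonstrator."""
    exps = {a: math.exp(rationality * reward(weights, a)) for a in FEATURES}
    return exps[action] / sum(exps.values())

def infer(demonstrations):
    """Bayesian posterior over the candidate reward functions."""
    belief = {name: 1.0 / len(CANDIDATE_WEIGHTS) for name in CANDIDATE_WEIGHTS}
    for action in demonstrations:
        for name, w in CANDIDATE_WEIGHTS.items():
            belief[name] *= action_likelihood(w, action)
        total = sum(belief.values())
        belief = {name: p / total for name, p in belief.items()}
    return belief

# A driver who keeps cruising and braking rather than accelerating hard is
# probably weighting comfort heavily; the posterior reflects that.
print(infer(["cruise", "cruise", "brake_often"]))
```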

A few months after his Agnus Dei epiphany, Russell got to talking about inverse reinforcement learning with Nick Bostrom, of paper clip fame, at a meeting on AI governance at the German foreign ministry. "That's where the two things met," Russell explained. On the subway, he had realized that machines should seek to improve the aggregate quality of human experience. Now he recognized that if they are unsure how to do that, if computers don't know what people prefer, "they might use some type of inverse reinforcement learning to learn more."

With standard inverse reinforcement learning, a machine tries to learn a reward function that a person is pursuing. In real life, though, we might be willing to actively help it learn about us. Back at Berkeley after his sabbatical, Russell began working with his collaborators on a new kind of "cooperative inverse reinforcement learning," in which a robot and a human can work together to learn the human's true preferences in various "assistance games," abstract scenarios that stand in for real-world situations of partial knowledge.
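The sketch below gestures at the cooperative framing under heavily simplified assumptions: the robot maximizes the human's reward rather than one of its own, remains uncertain about what that reward is, and can pay a small cost to ask before acting. The scenario, payoffs, and "ask" action are invented for illustration; real cooperative inverse reinforcement learning involves richer sequential interaction between both players.

```python
# A minimal sketch in the spirit of an assistance game: the robot has no
# fixed reward of its own; it maximizes the human's reward, which it is
# uncertain about, and it can spend a small cost to ask before acting.
# Scenario and payoffs are invented for illustration.

CANDIDATE_PREFS = {          # what the human might actually want
    "wants_coffee": {"serve_coffee": 1.0, "serve_tea": -0.5},
    "wants_tea":    {"serve_coffee": -0.5, "serve_tea": 1.0},
}
ASK_COST = 0.1               # mild annoyance of being asked

def best_expected_value(belief):
    """Value of acting right now under the robot's current belief."""
    return max(
        sum(belief[h] * CANDIDATE_PREFS[h][a] for h in belief)
        for a in ("serve_coffee", "serve_tea")
    )

def value_of_asking(belief):
    """The human answers truthfully (shared objective), so asking collapses
    the uncertainty; afterwards the robot serves what she actually wants."""
    return sum(belief[h] * CANDIDATE_PREFS[h]["serve_" + h.split("_")[1]]
               for h in belief) - ASK_COST

belief = {"wants_coffee": 0.5, "wants_tea": 0.5}
print("act now:", best_expected_value(belief))   # 0.25: a coin-flip guess
print("ask first:", value_of_asking(belief))     # 0.9: worth the small cost

belief = {"wants_coffee": 0.95, "wants_tea": 0.05}
print("act now:", best_expected_value(belief))   # 0.925: confident enough to act
print("ask first:", value_of_asking(belief))     # 0.9
```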

One game they invented, the off-switch game, addresses one of the most obvious ways autonomous robots could become misaligned with our true preferences: by disabling their own off switches. In 1951, the year after publishing a seminal paper on AI, Alan Turing suggested in a BBC radio lecture that it might be possible to "keep the machines in a subservient position, for instance by turning off the power at strategic moments." Researchers now consider that view simplistic. What's to stop an intelligent agent from disabling its own off switch or, more generally, ignoring commands to stop increasing its reward function? Russell writes in Human Compatible that the off-switch problem is "the root of the problem of control for intelligent systems."

In the game, a robot, Robbie, must decide whether to act on behalf of a human, Harriet, without knowing for sure what she wants, or to defer to her and risk being switched off. Russell and his colleagues showed that, in general, unless Robbie is completely certain what Harriet herself would do, it will prefer to let her decide. "It turns out that uncertainty about the objective is essential for ensuring that we can switch the machine off, even when it's more intelligent than us," Russell wrote in Human Compatible.
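A small calculation captures the intuition; the utility numbers below are invented, and the setup is a simplification of the off-switch analysis rather than the full game. Deferring is worth the expectation of max(U, 0), because Harriet permits good actions and switches Robbie off before bad ones, while acting unilaterally is worth only the expectation of U.

```python
# A minimal numeric sketch of the off-switch game's key intuition, with an
# invented utility distribution: Harriet (the human) knows whether the
# proposed action is good or bad for her; Robbie (the robot) only has a
# probability distribution over that. This is a simplification of the
# analysis described in Human Compatible, not the full game.

def expected_values(outcomes):
    """outcomes: list of (probability, utility_to_harriet) pairs."""
    act_now   = sum(p * u for p, u in outcomes)            # bypass the human
    defer     = sum(p * max(u, 0.0) for p, u in outcomes)  # let Harriet decide
    shut_down = 0.0
    return {"act unilaterally": act_now, "defer to Harriet": defer,
            "switch itself off": shut_down}

# An uncertain Robbie: the action is probably helpful, but might be harmful.
uncertain = [(0.6, +1.0), (0.4, -2.0)]
print(expected_values(uncertain))
# deferring (0.6) beats acting unilaterally (-0.2) and shutting down (0.0)

# A Robbie that is certain the action is good gains nothing by deferring.
certain = [(1.0, +1.0)]
print(expected_values(certain))
# deferring and acting unilaterally tie at 1.0; the incentive to preserve
# the off switch comes entirely from the uncertainty.
```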

Imperfect Decisions

Russell's ideas are "making their way into the minds of the AI community," said Yoshua Bengio, scientific director of Mila, a leading AI research institute in Montreal. He says Russell's approach, in which AI systems aim to reduce their own uncertainty about human preferences, can be achieved with deep learning, the powerful method behind the recent revolution in artificial intelligence, in which the system sifts data through layers of an artificial neural network to find its patterns. "Of course, more research is needed to make that a reality," he added.
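One common way to apply deep learning to preference uncertainty, offered here as a general illustration and not necessarily what Bengio has in mind, is to train a small neural network to score outcomes so that outcomes people prefer in pairwise comparisons receive higher scores. The sketch below, with invented data and a toy network, uses PyTorch and a Bradley-Terry-style loss.

```python
# A minimal sketch of learning about human preferences with deep learning:
# train a small network so that outcomes humans prefer in pairwise
# comparisons score higher (a Bradley-Terry-style preference model).
# The data, features, and network size are invented for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Each outcome has two features, e.g. (minutes watched, how inflammatory
# the content is). The hypothetical comparisons below prefer outcomes
# that are engaging but not inflammatory.
preferred     = torch.tensor([[0.8, 0.1], [0.7, 0.2], [0.9, 0.0]])
not_preferred = torch.tensor([[0.9, 0.9], [0.4, 0.1], [0.95, 0.8]])

reward_model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=0.05)

for step in range(300):
    # Bradley-Terry likelihood: P(A preferred over B) = sigmoid(r(A) - r(B)).
    margin = reward_model(preferred) - reward_model(not_preferred)
    loss = -F.logsigmoid(margin).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The learned scores should now reflect the comparisons: the engaging but
# non-inflammatory outcome scores above the inflammatory one.
test = torch.tensor([[0.8, 0.1], [0.9, 0.9]])
print(reward_model(test).squeeze().tolist())
```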

Russell sees two major challenges. "One is that our behavior is so far from rational that it may be very hard to reconstruct our true underlying preferences," he explained. AI systems will need to reason about the hierarchy of long-term, medium-term, and short-term goals, the myriad preferences and commitments each of us carries. If robots are going to help us (and avoid making grave mistakes), they will need to navigate the murky webs of our subconscious beliefs and unarticulated desires.

The second challenge is that human preferences change. Our minds shift over the course of our lives, and they can also change on a dime, depending on our mood or on altered circumstances that a robot might struggle to pick up on.

Besides, our actions do not always live up to our values. People can hold contradictory values at the same time. Which should a robot optimize for? To avoid catering to our worst impulses (or, worse, amplifying those impulses and making them easier to gratify, as the YouTube algorithm did), robots could learn our meta-preferences: "preferences about what types of preference-change processes would be acceptable or undesirable." How do we feel about our mood swings? It is a lot for a poor robot to grasp.

Like the robots, we are trying to figure out our preferences, both what they are and what we want them to be, and how to handle the ambiguities and contradictions. We are also trying, at least some of us, some of the time, to comprehend the form of the good, as Plato called the object of knowledge. Like us, AI systems may be stuck forever asking questions, or else sitting in the off position, too uncertain to help.

"I don't anticipate us to have a fantastic knowledge of what good is anytime soon," Christiano added, "or ideal answers to any of the empirical issues we confront very soon." But, on a good day, I hope the AI systems we design can answer those questions as well as a person and be involved in the same types of iterative processes to enhance those answers as people are."

There is also a third major issue that did not make Russell's short list of concerns: What about the preferences of bad people? What's to stop a robot from working to satisfy its malicious owner's nefarious ends? AI systems tend to find ways around restrictions, much as wealthy people do, so simply forbidding them from committing crimes is unlikely to be effective.

Or, to put it more darkly: What if we are all a bit bad? YouTube has struggled to fix its recommendation system, which is, after all, responding to widespread human impulses.

Still, Russell remains optimistic. Although more algorithms and game theory research are needed, he believes harmful preferences can be successfully down-weighted by programmers, and that the same approach could even be useful "in the way we raise children, educate people, and so on." In other words, in teaching robots to be good, we might find a way to teach ourselves. "I feel that this is an opportunity, perhaps, to steer things in the right direction," he added.