Machine learning systems have been wiping the floor with their human opponents for over a decade (seriously, Watson’s first Jeopardy win was in 2011), though the types of games they excel at are pretty limited. These are usually competitive board games or video games with a limited playing field, sequential moves, and at least one clearly defined opponent; any game that requires number crunching plays to their advantage. Diplomacy, however, requires very little calculation. Instead, players negotiate directly with their opponents and then all make their plays simultaneously, something modern ML systems are generally not designed to do. But that hasn’t stopped Meta researchers from designing an AI agent that can negotiate global policy positions as well as any UN ambassador.
Diplomacy was first released in 1959 and works like a more refined version of RISK in which between two and seven players each take on the role of a European power and try to win the game by conquering their opponents’ territories. Unlike RISK, where the outcome of conflicts is decided by simply rolling the dice, Diplomacy requires players to first negotiate with each other (forming alliances, backstabbing, all that good stuff) before they all move their pieces simultaneously during the next phase of the game. The ability to read and manipulate opponents, to convince players to form alliances and plan complex strategies, to navigate tricky partnerships, and to know when to switch sides is a big part of the game, and these are all skills that machine learning systems generally lack.
On Wednesday, Meta AI researchers announced that they had overcome those machine learning shortcomings with CICERO, the first AI to show human-level performance in Diplomacy. The team trained Cicero, a model with 2.7 billion parameters, over the course of 50,000 rounds on webDiplomacy.net, an online version of the game, where it finished second (out of 19 entrants) in a 5-game league tournament, all while doubling the average score of its opponents.
The AI agent proved so adept “at using natural language to negotiate with people in Diplomacy that they often preferred working with CICERO over other human participants,” Meta’s team noted in a press release on Wednesday. “Diplomacy is a game of people rather than pieces. If an agent can’t recognize that someone is probably lying or that another player would consider a certain move aggressive, it will quickly lose the game. Likewise, if it doesn’t speak like a real person, showing empathy, building relationships, and speaking knowledgeably about the game, it won’t find other players willing to work with it.”
Essentially, Cicero combines the strategic mindset of Pluribus or AlphaGo with the natural language processing (NLP) abilities of BlenderBot or GPT-3. The agent is even capable of foresight. “Cicero can deduce, for example, that later in the game it will need the support of a particular player, and then strategize to win that person’s favor, and even recognize the risks and opportunities that player sees from their particular point of view,” the research team noted.
The agent does not train through a standard reinforcement learning scheme as similar systems do. The Meta team explains that doing so would lead to suboptimal performance as “relying solely on supervised learning to choose actions based on previous dialogs results in an agent that is relatively weak and highly exploitable.”
Instead, Cicero uses an “iterative planning algorithm that balances dialogue consistency with rationality.” It will first predict its opponents’ moves based on what happened during the negotiation round, as well as the move it thinks its opponents think it will make, before “iteratively improving these predictions by trying to pick new policies that have a higher expected value given the other players’ predicted policies, while trying to keep the new predictions close to the original policy’s predictions.” Easy, right?
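For the curious, here is roughly what that kind of loop can look like in code. This is a toy two-player sketch, not Meta's implementation: the payoff tables, the "anchor" policies (standing in for the dialogue-based predictions), the `lam` weight, and every function name are assumptions made up for illustration. Each pass nudges a player's policy toward higher-value moves given the other player's predicted policy, while a penalty keeps it from drifting far from the original prediction.

```python
import math

def expected_values(payoff, opponent_policy):
    """Expected value of each of our actions against the opponent's predicted mixed policy.

    payoff[i][j] is our (hypothetical) payoff when we play i and the opponent plays j.
    """
    return [sum(p * row[j] for j, p in enumerate(opponent_policy)) for row in payoff]

def regularized_update(anchor, values, lam):
    """Return the policy that maximizes expected value minus lam * KL(policy || anchor).

    That maximizer is proportional to anchor(a) * exp(value(a) / lam).
    A large lam stays close to the anchor; a small lam chases value.
    """
    top = max(values)  # shift by the max so exp() cannot overflow
    weights = [a * math.exp((v - top) / lam) for a, v in zip(anchor, values)]
    total = sum(weights)
    return [w / total for w in weights]

def plan(payoff_a, payoff_b, anchor_a, anchor_b, lam=1.0, iterations=50):
    """Alternately refine both players' predicted policies, starting from the anchors."""
    policy_a, policy_b = list(anchor_a), list(anchor_b)
    for _ in range(iterations):
        values_a = expected_values(payoff_a, policy_b)
        values_b = expected_values(payoff_b, policy_a)
        policy_a = regularized_update(anchor_a, values_a, lam)
        policy_b = regularized_update(anchor_b, values_b, lam)
    return policy_a, policy_b

if __name__ == "__main__":
    # Made-up game: action 0 = cooperate (pays off only if both do), action 1 = play safe.
    payoff = [[3.0, 0.0], [1.0, 1.0]]
    anchor = [0.5, 0.5]  # pretend the dialogue left both players undecided
    print(plan(payoff, payoff, anchor, anchor, lam=1.0))
```

Turning `lam` up keeps the agent's plan consistent with what was said at the table; turning it down makes it more ruthlessly value-seeking, which is the trade-off the quote describes.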
The system is still not foolproof, as sometimes the agent will get too clever and end up playing itself by taking contradictory negotiating positions. Still, its performance in these early trials is superior to that of many human politicians. Meta plans to continue developing the system to “serve as a safe sandbox for advancing research in human-AI interaction.”