The Dilemma of Black Box AI in Cybersecurity

Accurate AI doesn’t translate to safe AI

Abhiram Pulavarthi
Feb 20, 2021
Photo by Markus Winkler on Unsplash

Today, AI models are used in fraud detection, courtrooms, and hospitals throughout the country and are becoming ever more commonplace.

We encounter AI daily, in the search engines we type queries into and the smart home assistants we talk to. But as we give AI more responsibility, we need to consider its risks.

AI is a term that is thrown around quite often, but what exactly does it mean?

AI broadly refers to algorithms capable of learning and making decisions on their own from data. Machine Learning (ML) is a subset of AI: ML algorithms find patterns in data, and AI systems use those patterns to make decisions. Many AI systems are also iterative, meaning they are retrained to improve their accuracy as they encounter new sets of data.

Some prominent examples of AI include Alexa and Siri. They continue to improve their natural language processing abilities to provide more accurate answers to the questions we ask.

AI has taken the world by storm. But since the advent of AI and ML, serious concerns have arisen over how to protect AI from cyberattacks.

This new cybersecurity dilemma stems from the “black box” phenomenon. An AI model is given a set of example inputs and outputs and asked to learn to make decisions from this data. But how the machine learning algorithm finds patterns in this data is not entirely known to the developer. Its decision-making process is hidden from view; hence the phrase “black box.”

The notion of a black box has existed since the beginnings of the tech industry, and it exists in non-technical settings as well. Not every Google employee knows how Google’s search engine works, nor does a cashier at ShopRite know when the next shipment of bread will come in. However, humans can communicate across teams to turn a “black box” into a “glass box.”

But AI cannot communicate the same way humans can.

And this is where AI goes wrong. Often, an AI model will pass developer testing with acceptable accuracy. But the developer may not be aware of hidden associations in the training and test data that the model relies on to make its predictions.

When the model is deployed and the data it encounters in real time does not share those patterns, much of its prior training goes out the window.

From different background colors to inconsistent formatting, factors we may consider inconsequential can radically alter an AI’s outputs.
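To make this concrete, here is a minimal sketch in Python (assuming NumPy) of a model learning a shortcut. The dataset is entirely made up: feature 0 is a noisy but genuine signal, while feature 1 plays the role of something like background color, matching the label perfectly during training and testing but becoming random after deployment. The helpers `make_data`, `train_logistic`, and `accuracy` are illustrative names, not from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shortcut_agrees):
    """Hypothetical dataset: feature 0 is a noisy real signal,
    feature 1 stands in for a 'background color' shortcut."""
    y = rng.integers(0, 2, size=n)
    signal = y + rng.normal(0, 0.8, size=n)           # genuine but noisy
    if shortcut_agrees:
        background = y.astype(float)                   # matches the label exactly
    else:
        background = rng.integers(0, 2, size=n).astype(float)  # shortcut broken
    return np.column_stack([signal, background]), y

def train_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression (bias as last weight)."""
    Xb = np.column_stack([X, np.ones(len(X))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.mean((Xb @ w > 0) == y)

X_train, y_train = make_data(1000, shortcut_agrees=True)
X_test, y_test = make_data(1000, shortcut_agrees=True)        # same hidden pattern
X_deploy, y_deploy = make_data(1000, shortcut_agrees=False)   # pattern gone

w = train_logistic(X_train, y_train)
print("test accuracy:  ", accuracy(w, X_test, y_test))
print("deploy accuracy:", accuracy(w, X_deploy, y_deploy))
print("weights (signal, background, bias):", w)
```

In a run like this, the model typically scores near-perfect accuracy on test data that shares the training pattern, then falls toward coin-flip accuracy once the background shortcut disappears, and its weight on the background feature dwarfs its weight on the real signal.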

And depending on the use case, these inaccuracies can be fatal. In 2016, Joshua Brown used Tesla’s Autopilot feature on a road with cross traffic, even though the feature was designed for use on highways. A truck crossed in front of the car, and instead of braking, the car kept going. Brown was killed.

Autopilot was likely trained to recognize trucks mostly from behind. When it saw the side of the truck, it reportedly mistook it for a road sign and decided it was safe to proceed. And this wasn’t the last time: similar crashes took place in 2019 and 2020.

Tesla uses Convolutional Neural Networks (CNNs), a type of deep learning model, to process the images a car sees. In a CNN, each camera image becomes the model’s input. The network treats the image as a grid of pixel values and slides small filters across it to detect characteristics such as edges and shapes. During training, the model adjusts the weights of these filters so that, layer by layer, it can identify the image in its entirety.

To us, some defining characteristics of a car may be its wheels or body. But the model’s learned weights do not map neatly onto concepts like these, so we do not know which features are most influential in determining its output.
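The filtering step can be sketched in a few lines. This is a simplified illustration assuming NumPy: the hypothetical `conv2d` helper slides one hand-picked 3×3 edge filter over a tiny synthetic image, whereas a real CNN learns many such filters from data and stacks them in layers.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN conv layers compute."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the filter with the patch under it and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny 6x6 grayscale "image": dark on the left, bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-written vertical-edge filter (Sobel-style). In a trained CNN,
# these numbers are learned rather than chosen by hand.
vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=float)

response = conv2d(image, vertical_edge)
print(response)
```

Every row of `response` comes out as `[0, 4, 4, 0]`: the filter fires only where its window straddles the dark-to-bright boundary, which is the kind of edge feature early CNN layers pick up.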

And it’s deep learning that companies are turning to for their AI solutions, because it outperforms simpler models on many tasks. It’s also deep learning that is the least interpretable. This opacity has created a new branch of cybersecurity threats, all aimed at exploiting the shortcomings of AI.

Data poisoning and evasion

Because AI is a black box, hackers do not need access to the machine learning model itself to cause damage: they just need to be able to control the data it sees. Introducing tiny changes to input data, imperceptible to people, can alter a model’s predictions.

Photo by Francois Chollet on The Keras Blog

An image-recognition model might correctly recognize the picture on the left as a panda. However, after a carefully computed layer of noise is added, the model no longer sees a panda: it sees a gibbon. Yet to a human, there is no visible difference between the image on the left and the image on the right. Manipulated inputs like this are known as “adversarial examples”: data that looks normal to us but is crafted to make an AI model fail.
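The panda image is a well-known adversarial example produced with the fast gradient sign method (FGSM) of Goodfellow et al.: every pixel is nudged by a tiny amount ε in whichever direction increases the model’s loss. The sketch below applies the same idea to a stand-in model, a hypothetical logistic-regression classifier over 1,000 “pixels” with random weights rather than a real image network. It assumes NumPy; `predict_proba`, `fgsm`, and all the numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1000                                  # number of input "pixels"
w = rng.normal(size=d)                    # hypothetical trained weights
b = -0.5 * w.sum()                        # bias chosen so a flat gray image scores 0

def predict_proba(x):
    """Model's confidence that x is class 1 (the 'panda')."""
    return 1 / (1 + np.exp(-(x @ w + b)))

def fgsm(x, y, eps):
    """Fast gradient sign method for logistic loss with true label y."""
    p = predict_proba(x)
    grad_x = (p - y) * w                  # gradient of the logistic loss w.r.t. x
    # Step every pixel by eps in the loss-increasing direction, stay in [0, 1].
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# A hypothetical "panda" image the model classifies correctly and confidently.
x = 0.5 + 0.004 * np.sign(w)
x_adv = fgsm(x, y=1, eps=0.008)

print("clean confidence:      ", predict_proba(x))
print("adversarial confidence:", predict_proba(x_adv))
print("largest pixel change:  ", np.max(np.abs(x_adv - x)))
```

Although no pixel moves by more than 0.008 on a 0 to 1 scale, the confidence swings from about 0.96 to about 0.04: the tiny changes all push the score in the same direction, so they add up across 1,000 inputs.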

And it’s not just adversarial interference; a rotated image or a picture taken in different lighting can confuse AI models. Hackers can intentionally create these differences to produce errors for the algorithm.

When used in the context of national security, the implications of this cybersecurity threat are immense.

But it’s not only the data a model is tested on that hackers can manipulate. If a model keeps learning from new data, any systematic change to that data could change the model’s “thought process.” This attack is known as data poisoning.
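A minimal poisoning sketch, assuming NumPy and a deliberately simple nearest-centroid classifier in place of a real model. The data, the helper names, and the attacker’s budget of 300 fake points are all hypothetical: the attacker never touches the model itself, only slips mislabeled points into the data it is retrained on.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    """Hypothetical clean data: class 0 around (-2, 0), class 1 around (2, 0)."""
    X0 = rng.normal([-2.0, 0.0], 1.0, size=(n, 2))
    X1 = rng.normal([2.0, 0.0], 1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

def fit_centroids(X, y):
    """'Training' is just averaging each class."""
    return np.array([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(centroids, X):
    """Assign each point to the nearest class centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

X_train, y_train = sample(500)
X_test, y_test = sample(1000)
clean = fit_centroids(X_train, y_train)

# The attacker injects 300 fabricated points, all labeled class 1,
# into the next retraining batch.
X_poison = np.tile([-8.0, 0.0], (300, 1))
y_poison = np.ones(300, dtype=int)
poisoned = fit_centroids(np.vstack([X_train, X_poison]),
                         np.concatenate([y_train, y_poison]))

clean_acc = np.mean(predict(clean, X_test) == y_test)
poisoned_acc = np.mean(predict(poisoned, X_test) == y_test)
print("clean accuracy:   ", clean_acc)
print("poisoned accuracy:", poisoned_acc)
```

The fake points drag the class-1 centroid toward class 0, so many genuine class-0 points end up on the wrong side of the boundary and test accuracy drops noticeably, even though the test data itself is untouched.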

Model stealing and inversion

In other instances, hackers can even map out an ML model in its entirety. A facial recognition model most likely uses deep learning as its foundational algorithm. By repeatedly querying the model and recording its outputs, a hacker can tune the parameters of their own model, just as the developers did, until it closely replicates the original. Once a replica exists, they can practice their hacking techniques on it.

Another cybersecurity issue is model inversion, where hackers can recreate the training data used to train a model. Hackers can do this in multiple ways.

Many ML models output confidence ratings that tell the user how confident the model is in its results. Take data points A and B: both are extremely similar, yet A produces a significantly higher confidence rating than B. A likely explanation is that A was included in the original training data, because models tend to be most confident on examples they were trained on. From such confidence ratings, hackers can infer which records were in the training data (an attack known as membership inference), which in some instances may be confidential, and breach consumer privacy.
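The confidence gap can be sketched with a deliberately overfit model. Assuming NumPy, the code below trains an unregularized logistic regression on a tiny hypothetical dataset with random labels, so anything the model “knows” about a record comes from memorizing it, and then guesses membership by thresholding the model’s confidence. The 0.9 threshold is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 20, 50

# Hypothetical private training set: few records, many features,
# so the model can memorize them.
X_members = rng.normal(size=(n, d))
y_members = rng.integers(0, 2, size=n)
# Similar-looking records that were NOT used for training.
X_outsiders = rng.normal(size=(n, d))
y_outsiders = rng.integers(0, 2, size=n)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Plain gradient descent with no regularization, which encourages memorization.
w = np.zeros(d)
for _ in range(2000):
    w -= 0.5 * X_members.T @ (sigmoid(X_members @ w) - y_members) / n

def confidence(X, y):
    """Model's confidence in the true label of each record."""
    p = sigmoid(X @ w)
    return np.where(y == 1, p, 1 - p)

conf_in = confidence(X_members, y_members)
conf_out = confidence(X_outsiders, y_outsiders)

# The attack: guess "was in the training set" when confidence exceeds 0.9.
guesses = np.concatenate([conf_in, conf_out]) > 0.9
truth = np.array([True] * n + [False] * n)
print("mean confidence, members:  ", conf_in.mean())
print("mean confidence, outsiders:", conf_out.mean())
print("attack accuracy:", np.mean(guesses == truth))
```

Training records get confidence close to 1, while similar records the model never saw average far lower, so even this crude threshold separates many members from outsiders.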

Hackers can also recreate training data by reverse-engineering a model. Take a facial-recognition model that outputs a name given a face. A hacker can recreate the face associated with a name through trial and error: they feed in images, each with a slight change, and keep the changes that raise the model’s confidence in the desired name until the image matches it.
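The trial-and-error loop can be sketched as a simple hill climb. Assuming NumPy, the hypothetical `confidence_alice` function stands in for the face model: it only returns a score, and the hidden `_alice_template` stands in for what the model absorbed from Alice’s training photos. A real attack faces a far larger search space and a much less cooperative model.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 64                                    # a tiny 8x8 "face", flattened

# Hypothetical black-box model. The template stands in for what it learned
# from Alice's training photos; the attacker never sees it directly.
_alice_template = rng.uniform(0, 1, size=d)

def confidence_alice(image):
    """Model's confidence (0 to 1) that the image shows 'Alice'."""
    return float(np.exp(-10 * np.mean((image - _alice_template) ** 2)))

# Start from a blank gray image. (The error below is for evaluation only;
# the attacker cannot compute it.)
image = np.full(d, 0.5)
start_error = np.mean(np.abs(image - _alice_template))
start_conf = best = confidence_alice(image)

# Trial and error: tweak one random pixel at a time and keep the change
# only if the model's confidence goes up.
for _ in range(5000):
    candidate = image.copy()
    i = rng.integers(d)
    candidate[i] = np.clip(candidate[i] + rng.normal(0, 0.1), 0.0, 1.0)
    score = confidence_alice(candidate)
    if score > best:
        image, best = candidate, score

final_error = np.mean(np.abs(image - _alice_template))
print("confidence before/after:", start_conf, best)
print("mean pixel error before/after:", start_error, final_error)
```

The attacker only ever sees scores, yet the reconstructed image ends up much closer to the hidden template than the blank image it started from.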

Policies like the GDPR, which allows users to have companies delete their data, may not fully protect users where AI is used. Given the nature of these attacks, even “deleted” data could be recreated by hackers from a model that was trained on it.

Future Steps

Today, most models aren’t designed with “interpretability constraints”: the requirement that a developer be able to explain all aspects of the model and easily identify sources of error. To understand sources of vulnerability, we may need to understand what each layer in a CNN does or the weights of certain filters.

If we could peer inside the black box, we could see how a model reaches its conclusions and eliminate ways that adversarial data can manipulate it.
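Even without opening the box, some interpretability techniques can reveal what a model leans on. The sketch below uses permutation feature importance, a standard model-agnostic method swapped in here as one illustration rather than anything this article prescribes: shuffle one feature column at a time and measure how much accuracy drops. It assumes NumPy; the `black_box_predict` model and its data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)

def permutation_importance(predict, X, y, n_repeats=10):
    """Average accuracy drop when each feature column is shuffled.
    Needs only predictions, not access to weights or layers."""
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break feature j
            drops[j] += baseline - np.mean(predict(X_perm) == y)
    return drops / n_repeats

def black_box_predict(X):
    # Hypothetical opaque model: pretend we cannot read this code.
    # It ignores feature 1 and leans almost entirely on feature 2.
    return (0.2 * X[:, 0] + 3.0 * X[:, 2] > 0).astype(int)

X = rng.normal(size=(2000, 3))
# Hypothetical ground-truth labels with a little noise.
y = (0.2 * X[:, 0] + 3.0 * X[:, 2] + rng.normal(0, 0.5, size=2000) > 0).astype(int)

importances = permutation_importance(black_box_predict, X, y)
print(importances)
```

The scores should show feature 2 dominating, feature 0 barely registering, and feature 1 at exactly zero, exposing which inputs drive the model’s decisions without ever inspecting its internals.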

Black-box AI is also the product of a mindset built over decades of work in the industry. Work in the tech sector moves fast; in some instances, it may be easier to deploy a product or service and fix bugs as they arise than to conduct rigorous testing beforehand. After all, Facebook’s motto in its early years was “move fast and break things.”

Perhaps it’s the same mindset that has partially led to the creation of black-box AI.

And there is also a notion that black-box AI is the only way to produce accurate AI, which is simply not true. If anything, interpretable AI makes AI more accurate: if developers can understand a model, they can fix errors in an AI’s decision-making process.

“There is no tradeoff between interpretability and accuracy.” — Cynthia Rudin

And maybe we don’t even need black-box AI. We wouldn’t need to add interpretability constraints if the system were entirely transparent in the first place. The last thing we need is more algorithms making judgment calls involving millions when those same algorithms can’t explain to us why they did what they did.