Can Centuries-Old Math Open the Modern Black Box of AI?
A recent study aims to understand how AI is able to perform complex physics.
Posted March 19, 2023 | Reviewed by Abigail Fagan
“In physics—or wherever natural processes seem unpredictable—apparent randomness may be noise or may arise from deeply complex dynamics.” ―James Gleick
Complexity is the source of opacity, and opacity is the Achilles’ heel of deep learning, the artificial intelligence (AI) technology underlying ChatGPT and many other applications. Because of this inherent complexity, no one can fully explain how artificial neural networks arrive at their conclusions. A new Rice University study combines Fourier analysis with a novel approach to understand how deep neural networks learn to perform tasks that involve complex physics, a step towards demystifying AI’s black box problem.
“There are ever-growing efforts focused on using machine learning (ML), particularly the powerfully expressive deep neural networks (NNs), to improve simulations or predictions of nonlinear, multi-scale, high-dimensional systems,” wrote Rice University mechanical engineering researchers Adam Subel, Pedram Hassanzadeh, Yifei Guan, and Ashesh Chattopadhyay, who conducted the study.
The Brain and Other Non-Linear Complex Systems
Cognition and the human brain are non-linear complex systems; people may spontaneously change their minds and behavior. Just as neuroscientists and psychologists have yet to fully explain how the human brain and cognition work, computer scientists and AI researchers do not know exactly how deep learning reaches its decisions. This becomes especially problematic for researchers seeking to apply AI to complex dynamical systems that evolve over time.
In addition to the human brain, other examples of complex systems include communication systems, ecosystems, organisms, living cells, power grids, transportation systems, integrated manufacturing, weather, and the Earth’s global climate. For this study, the researchers focused on weather and climate.
Fourier Analysis: 200-Year-Old Computational Mathematics Tool
The scientists combined Fourier-based spectral analysis of nonlinear dynamical systems with spectral analysis of convolutional neural networks (CNNs) to show what the deep neural network learns and how that learning connects to the physics of the systems being modeled. Spectral analysis decomposes time-series data into sine wave components.
Fourier analysis is a mathematical method for breaking complex data, such as a time series, into simpler trigonometric functions (sines and cosines). Separating a signal into its frequency components makes it easier to filter out noise and to find verifiable patterns and trends. The method is named after the nineteenth-century French mathematician and physicist Jean Baptiste Joseph Fourier (1768-1830), who is known for his work on harmonics, heat flow, and conduction, and who pioneered the mathematical study of the Earth’s temperature.
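To make the idea of decomposition concrete, here is a minimal sketch in Python. It is not the researchers' code; the signal, its length, and the function names are all made up for illustration. It builds a signal from two sine waves and uses a naive discrete Fourier transform to recover the frequency and amplitude of each one.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: O(n^2), fine for short signals."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def amplitudes(samples):
    """Amplitude of each frequency bin from 0 up to the Nyquist bin."""
    n = len(samples)
    spectrum = dft(samples)
    amps = [abs(c) / n for c in spectrum[: n // 2 + 1]]
    # Every bin except 0 (and the Nyquist bin when n is even) has a mirrored
    # negative-frequency twin, so its amplitude is doubled.
    for k in range(1, len(amps)):
        if not (n % 2 == 0 and k == n // 2):
            amps[k] *= 2
    return amps

# Hypothetical signal: 64 samples containing two sine waves,
# 3 cycles at amplitude 1.0 plus 10 cycles at amplitude 0.5.
N = 64
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 10 * t / N)
    for t in range(N)
]

amps = amplitudes(signal)
# Indices of the two strongest frequency bins
strongest = sorted(range(len(amps)), key=lambda k: amps[k], reverse=True)[:2]
print(strongest)
```

In practice a fast Fourier transform (FFT) library would replace the naive loop, but the result is the same: the mixed signal is separated into the simple waves that make it up.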
Today, Fourier analysis is used for a variety of purposes such as signal processing, acoustics, sonar, optics, forecasting, image processing, algorithmic trading by computers, x-ray crystallography, spectroscopy, and many more scientific uses, especially in physics.
Convolutional Neural Networks
Convolutional neural networks are feedforward neural networks commonly used to process complex input data such as images, voice, and audio signals. CNNs consist of three types of layers: the convolutional layer, which comes first; the pooling layer; and the fully connected (FC) layer, which comes last.
Most of the computation happens in the convolutional layer, which involves input data, a filter, and a feature map. The filter, or kernel, is a feature detector: a two-dimensional array of weights, often a three-by-three matrix, that is applied to one small region of the input data at a time.
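The relationship between input data, filter, and feature map can be sketched in a few lines of Python. The image and the edge-detecting kernel below are hypothetical, and the loop uses a stride of one with no padding, which is only one of several common choices. As in most deep learning libraries, the "convolution" here does not flip the kernel.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is zero where the image is uniform and positive where the kernel overlaps the edge, which is what it means for a filter to detect a feature.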
A convolution is the process of sliding the filter across the data to check whether a feature is present. After each convolution, the network applies a Rectified Linear Unit (ReLU) transformation to the feature map. ReLU is an activation function that introduces non-linearity to the deep learning model and helps mitigate the vanishing gradient problem. Vanishing gradients may occur when training machine learning models with gradient descent, an optimization algorithm for finding a local minimum of a differentiable function.
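A rough illustration of why the choice of activation function matters for gradients is sketched below. It is a deliberate simplification: it multiplies the same activation derivative ten times as a stand-in for backpropagation through ten layers and ignores the weights entirely. The feature map values and the depth of ten are made up for the example.

```python
import math

def relu(x):
    """Rectified Linear Unit: pass positive values, zero out the rest."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

def chained_gradient(grad_fn, pre_activation, depth):
    """Product of `depth` identical activation derivatives (weights ignored)."""
    return grad_fn(pre_activation) ** depth

# Applying ReLU to a hypothetical 2x2 feature map
feature_map = [[-2.0, 0.5], [3.0, -0.1]]
activated = [[relu(v) for v in row] for row in feature_map]

# Gradient signal surviving ten layers, under the simplification above
sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, 10)  # 0.25 ** 10, close to zero
relu_chain = chained_gradient(relu_grad, 1.0, 10)        # stays at 1.0

print(activated, sigmoid_chain, relu_chain)
```

Even at its best, the sigmoid derivative shrinks the signal by a factor of four per layer, while ReLU passes it through unchanged for positive inputs. ReLU's gradient is zero for negative inputs, so it eases the problem rather than eliminating it.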
Pooling layers reduce complexity by reducing the dimensionality of the input and the number of parameters. The pooling operation passes over the input and writes to the output array either the maximum value in each region (max pooling) or the average value (average pooling).
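Both kinds of pooling can be shown with a short Python sketch. The four-by-four feature map and the two-by-two, non-overlapping window are assumptions chosen to keep the arithmetic easy to follow.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride equals size)."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current window
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

Either way, sixteen values become four, which is the reduction in dimensionality described above.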
The final layer, the fully connected layer, classifies data, usually via a softmax activation function, based on the features extracted by the prior layers and their filters. Each node in the output layer connects directly to every node in the prior layer, hence the layer’s name.
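A fully connected layer followed by softmax can be sketched as follows. The feature values, weights, and three output classes are invented for illustration; a trained network would learn its weights from data.

```python
import math

def fully_connected(inputs, weights, biases):
    """Each output node takes a weighted sum over every input node."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical features extracted by earlier layers
features = [0.5, 1.0, -0.5]

# Hypothetical weights: one row per output class, one column per input feature
weights = [
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 1.0],
    [-1.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(scores, probabilities, predicted_class)
```

The class with the largest raw score also receives the largest probability, so softmax changes how confident the answer looks, not which answer is chosen.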
In this study, the researchers used convolutional neural networks with 11 sequential convolutional layers, nine of which are hidden layers, trained using the Adam optimizer and a mean-squared-error (MSE) loss function.
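For readers unfamiliar with the two training ingredients named here, the sketch below shows a mean-squared-error loss and the standard Adam update rule fitting a single weight. It is a toy, not the study's training code: the data, the one-parameter model, the learning rate, and the step count are assumptions, and the beta and epsilon values are simply the commonly used defaults.

```python
import math

def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def fit_weight_with_adam(xs, ys, steps=500, lr=0.05,
                         beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y = w * x by minimizing MSE with the Adam update rule."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Gradient of the MSE with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Hypothetical data generated by y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = fit_weight_with_adam(xs, ys)
final_loss = mse([w * x for x in xs], ys)
print(w, final_loss)
```

The fitted weight should land close to 2, the value used to generate the data. A real CNN applies the same kind of update to millions of weights at once.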
A Unique Approach to AI’s Black Box Problem
The scientists took a unique approach that runs counter to conventional AI research practice. According to the researchers, contemporary approaches to transfer learning typically either retrain all or most layers in an ad-hoc manner or retrain only the deepest layers, those nearest the output of a deep neural network.
Conventional wisdom is that the best layers to retrain for transfer learning of out-of-distribution sets of images are the deepest layers nearest the output layer. The assumption is that the deeper layers learn specific features in the training data versus general features learned in the shallow layers of an artificial neural network. But is this the case for transfer learning for turbulence, weather, and climate modeling?
The researchers challenge the status quo and point to the need for a scientific way to determine which layer of a deep neural network is best to retrain for effective transfer learning, especially in weather and climate modeling. To their knowledge, this is the first investigation of its kind.
“Transfer learning (TL) provides a powerful and flexible framework for improving the out-of-distribution generalization of NNs, and has shown success in various ML applications,” the researchers wrote.
The Rice University researchers developed a general framework that finds the best retraining procedure for a given problem via applied neural network theory and physics.
“This framework will benefit a broad range of applications in areas such as turbulence modeling and weather/climate prediction,” the Rice University researchers wrote.
The researchers discovered that the shallowest convolutional layers are the best to retrain and that this finding is consistent with their physics-guided framework. These pioneering scientists have opened a new window for the pursuit of explainable AI for complex systems in science, engineering, and beyond.
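The difference between the conventional choice and the study's finding comes down to which layers are left free to change during retraining. The sketch below illustrates only that selection step. The function name and the choice to retrain two layers are hypothetical; the count of 11 convolutional layers comes from the study's description.

```python
def mark_layers_for_retraining(num_layers, strategy, count):
    """Return one boolean per layer: True means retrained, False means frozen."""
    if strategy == "shallowest":
        chosen = set(range(count))                            # layers nearest the input
    elif strategy == "deepest":
        chosen = set(range(num_layers - count, num_layers))   # layers nearest the output
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [index in chosen for index in range(num_layers)]

NUM_CONV_LAYERS = 11  # the number of convolutional layers described in the study

# Conventional wisdom: retrain the layers nearest the output (count is an assumption)
conventional = mark_layers_for_retraining(NUM_CONV_LAYERS, "deepest", count=2)

# The study's finding: retrain the layers nearest the input (count is an assumption)
study_finding = mark_layers_for_retraining(NUM_CONV_LAYERS, "shallowest", count=2)

print(conventional)
print(study_finding)
```

In a deep learning library, the frozen layers would simply have their weights excluded from the optimizer's updates while the marked layers continue to learn from the new data.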
Copyright © 2023 Cami Rosso All rights reserved.