Summary: learning “by example” isn’t explainable; learning “by model” is very explainable. ChatGPT learns “by example”.
When you were a kid did you learn to whistle? Or learn to ride a bike? Or maybe some other skill that you figured out that not everyone knows?
When I was little I wanted to whistle. My parents explained how they did it, but it became frustrating for us. Determined, I spent almost an hour alone in my basement (probably so no one could hear my failed attempts!) and kept trying. Eventually I made the correct lip shape and blew air just right. I did it!
I smiled and lost the correct shape. It took me a few minutes of trying again before my second whistle. But now I had it and I could reliably whistle. I learned how to whistle!
This kind of learning is “by example” or “by trial and error.” No one can really teach whistling or riding a bike; these kinds of skills just have to be figured out. Not only can’t they be taught, but no one can really explain how they learned them: “I just kept trying till I figured it out.”
Big Data, deep machine learning, neural networks, and LLMs all work this same way.
Give a neural network a bunch of pictures of cats, and a bunch of pictures of “not cats.” After a sufficient amount of trial and error (“learning”), it will be able to identify pictures of cats. But the network can’t “explain” how it decides the next picture contains a cat, and neither can the data scientists who developed and trained it. They can look under the hood and see that neuron X has a value (“weight”) of 0.748 and that neuron Y has a value of -1.374, but this data doesn’t explain anything! Those weights are simply the result of the training. Together, the neurons give us useful answers, but no explanations.
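To make that concrete, here is a minimal sketch, assuming scikit-learn is available and using made-up numeric data in place of real cat pictures. It trains a small neural network and then prints a few of its learned weights, which are exactly the kind of numbers that answer questions without explaining anything:

```python
# A minimal sketch: train a tiny neural network and "look under the hood."
# Toy numeric data stands in for "cat" / "not cat" pictures.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # 200 fake "pictures", 4 features each
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = "cat", 0 = "not cat"

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)

# Individual weights are just numbers like 0.748 or -1.374.
print(net.coefs_[0][:2])          # a few learned weights: no explanation here
print(net.predict(X[:5]), y[:5])  # useful answers, but no "why"
```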
Approaches that work “by example” or by “trial and error” are not explainable, by either human or machine. The AI term for this is “interpretability,” and it means the same thing: can you describe how you did it?
Figure 1: A neural network pipeline from Google [1]
LLMs like ChatGPT are not explainable, even by the data scientists and companies that create them. Research is working to improve interpretability, but by their nature these technologies are more like whistling: they can’t really be explained.
Other skills are learnable and explainable.
Think of following a recipe to bake a cake, or learning to chop safely with a chef’s knife by tucking your fingers under. These skills can be taught, and they can be explained and understood. They have “a process” that can be passed along, and usually there are “experts” who know good, or even best, ways to do it.
Approaches that follow an “expert process” are very explainable because they have a “model” for how and why things work. A simple process can be a flow chart or a computer program; a complicated one can be a probabilistic Bayesian/causal network. These programs and AI technologies are interpretable: how they arrive at an answer can be entirely explained.
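For a feel of what “explainable by model” looks like, here is a minimal sketch of a tiny two-node Bayesian-style model. The variables and probabilities are made up for illustration; the point is that every number in the answer traces back to an explicit, inspectable assumption:

```python
# A minimal sketch of an "expert process" model: a tiny Bayesian-style
# network (Promotion -> Sale) with made-up expert probabilities.
# The "explanation" is just the arithmetic below, step by step.

P_promotion = 0.30                 # prior: chance a promotion is running
P_sale_given_promotion = 0.60      # expert estimate
P_sale_given_no_promotion = 0.20   # expert estimate

# Law of total probability:
P_sale = (P_sale_given_promotion * P_promotion
          + P_sale_given_no_promotion * (1 - P_promotion))
print(f"P(sale) = 0.60*0.30 + 0.20*0.70 = {P_sale:.2f}")

# Bayes' rule: if a sale happened, how likely was a promotion running?
P_promotion_given_sale = P_sale_given_promotion * P_promotion / P_sale
print(f"P(promotion | sale) = {P_promotion_given_sale:.2f}")
```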
Figure 2: A Bayesian Network published on Forecasting Sales [2]
The two ways “learning” can happen: “by example” (trial and error), or “by model” (an expert process).
Both are useful, but they are different. They can also be combined into a blended system that gets the best of both and is significantly more explainable: an LLM can be used to extract details from documents, and those details can be fed into a model that makes probabilistic, explainable decisions.
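As a rough illustration of that blended approach, the sketch below uses a hypothetical `extract_promotion_flag` function as a stand-in for an LLM extraction step (not a real API), and feeds its output into the explainable model from before:

```python
# A rough sketch of a blended system: an LLM-style extraction step feeds an
# explainable probabilistic model. `extract_promotion_flag` is a hypothetical
# stand-in for a real LLM call; here it is faked with a keyword check.

def extract_promotion_flag(document: str) -> bool:
    # In a real system this would be an LLM prompt such as
    # "Does this document mention an active promotion? Answer yes/no."
    return "promotion" in document.lower()

def p_sale(promotion_running: bool) -> float:
    # The explainable part: explicit, inspectable expert probabilities.
    return 0.60 if promotion_running else 0.20

doc = "Q3 memo: the spring promotion runs through the end of the month."
flag = extract_promotion_flag(doc)
print(f"promotion detected: {flag}, P(sale) = {p_sale(flag):.2f}")
```

The LLM handles the messy, unstructured text; the decision itself is made by a model whose every step can be read and explained.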
References
[2] https://arxiv.org/pdf/2112.08706