Deep Learning for Scientific Discoveries
By Sunil Ale
Image created by Sunil Ale using the AI tool DALL·E
I had the pleasure of attending a seminar titled “Machine Learning for Science” by Dr. Wahid Bhimji. He currently leads a Data and AI Services Team at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, also known as Berkeley Lab. His interdisciplinary experience ranges from research in particle physics to advancements in supercomputing, and his team supports supercomputing and related innovations at the lab.
Dr. Bhimji believes that deep learning (DL) will transform science. Deep learning is a type of machine learning (ML), which is itself a subset of artificial intelligence (AI), the broader effort to replicate human-like behavior in machines. DL uses stacked layers of artificial neural networks, loosely modeled on the layers of neurons in the human brain, with the aim of replicating aspects of human cognition.
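To make the idea of “layers” concrete, here is a minimal sketch in plain Python of data flowing through a two-layer network. It is my own illustration, not something from the seminar: the function names, the ReLU activation, and the weights are all assumptions, and the weights are hand-picked rather than learned. In real deep learning, training sets those numbers from data.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of the
    inputs, adds its bias, and applies ReLU (negative sums become 0)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs


def forward(inputs, layers):
    """Pass the inputs through every layer in order; the output of one
    layer becomes the input of the next."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


# Hypothetical, hand-picked parameters (a trained network would learn these).
HIDDEN_WEIGHTS = [[0.5, 0.25], [1.0, 1.0], [-1.0, 0.25]]  # 2 inputs -> 3 neurons
HIDDEN_BIASES = [0.0, -1.0, 0.0]
OUTPUT_WEIGHTS = [[1.0, 0.5, 2.0]]                         # 3 neurons -> 1 output
OUTPUT_BIASES = [0.5]

LAYERS = [(HIDDEN_WEIGHTS, HIDDEN_BIASES), (OUTPUT_WEIGHTS, OUTPUT_BIASES)]

print(forward([1.0, 2.0], LAYERS))  # prints [2.5]
```

The “deep” in deep learning refers to stacking many such layers, so that later layers can build on the patterns detected by earlier ones.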
How far have we come with deep learning? The answer: far enough that scientists are now beginning to consider the application of deep learning seriously.
Beyond the computer industry, Dr. Bhimji projects increasing adoption of this technology in scientific research. As science and technology advance, new theories and discoveries tend to do one of three things: raise further questions, demand more logical or mathematical precision, or simply equip us with loose scientific lenses through which we can estimate an outcome. DL models have existed since the 1980s, but interest in them reignited over the last decade as they began to answer a growing set of compelling questions. Convolutional neural networks (CNNs), a type of deep learning model, exceeded the accuracy of traditional image recognition models by 2012 and went on to surpass human performance in image recognition by 2015. Having seen these models solve hard computational problems, researchers have now begun to look for scientific problems that can be solved with them.
Advances in computing devices and their widespread availability have made large, curated datasets readily accessible. Modern technologies, including specialized sensing devices that collect images, temperatures, sounds, and more, combined with high-speed connectivity, have made more data available than today’s scientists can organize and make sense of. There have also been algorithmic advances in areas such as representation, optimization, normalization, and regularization. Progress in CNNs has helped computers become better than humans at image recognition, while advances in transformer algorithms, specifically generative pre-trained transformers (GPT), have given computers native-level linguistic and conversational ability. ChatGPT and Bard are just two examples that have dominated recent technology news.
The power that computers have today allows us to collect, process, and produce a huge amount of data and information. In general, more data is better for deep learning training.
ML and DL have the potential to transform science by making the analysis of large scientific datasets easier. A large-scale experiment can produce far more data than humans are able to analyze, and that raw data can be fed directly into machine learning models. When unsupervised learning is used, the model usually does not need to rely on a particular scientific theory, and the patterns it finds have the potential to suggest new ones. In other cases, where we understand the scientific underpinnings very well, we may be able to constrain the deep learning models, for example when using CNNs to analyze images and then predict their contents. In addition, ML can accelerate computationally expensive simulations. Current simulations run on very large supercomputers and require extensive amounts of energy and resources; DL methods can reduce the time they take. Lastly, the control and design of experiments can be automated. At some point in the future, we can imagine labs that run themselves much as self-driving cars do: “self-driving” labs, in the words of Dr. Bhimji.
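The idea of accelerating simulations can be sketched as a “surrogate”: run the costly simulation for a handful of inputs, then answer new queries from a cheap approximation built on those runs. The example below is my own toy illustration under loud assumptions. The “simulation” is a hypothetical stand-in (a slow numerical integral whose exact answer is known), and simple piecewise-linear interpolation takes the place of the neural network a real project would train.

```python
import math


def expensive_simulation(x, steps=20_000):
    """Hypothetical stand-in for a costly simulation: integrates cos(t)
    from 0 to x with many small midpoint steps. The exact answer is sin(x),
    which lets us check how good the surrogate is."""
    dt = x / steps
    return sum(math.cos((i + 0.5) * dt) for i in range(steps)) * dt


def build_surrogate(simulate, lo, hi, n_samples):
    """Run the simulation once on an evenly spaced grid, then answer
    later queries by interpolating between the stored results."""
    step = (hi - lo) / (n_samples - 1)
    xs = [lo + i * step for i in range(n_samples)]
    ys = [simulate(x) for x in xs]  # the only expensive part

    def surrogate(x):
        if not lo <= x <= hi:
            raise ValueError("query outside the sampled range")
        # Find the grid interval containing x and interpolate linearly.
        i = min(int((x - lo) / step), n_samples - 2)
        t = (x - xs[i]) / step
        return ys[i] + t * (ys[i + 1] - ys[i])

    return surrogate


# 21 simulation runs up front; every later query is almost free.
surrogate = build_surrogate(expensive_simulation, 0.0, 2.0, 21)
error = abs(surrogate(1.234) - math.sin(1.234))
print(f"surrogate error at x=1.234: {error:.5f}")
```

The trade-off is the same one a learned surrogate makes: the expensive runs are paid for once, and accuracy depends on how well the sampled inputs cover the questions asked later. That is why the sketch refuses queries outside the range it was built on.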
Today, enormous volumes of images and data are collected from cosmological observations. Particle accelerators are used to try to create new particles and understand the universe, and other experiments explore the prospect of harnessing energy from nuclear fusion and fission reactions. Cosmological observations and particle acceleration may also offer the chance to learn more about dark matter. In energy research, there is a need to create or discover new catalysts to increase the efficiency of batteries. Traditional weather forecasting systems cannot predict the weather more than about ten days ahead. In all these instances, we are collecting vast amounts of data. So much is produced at the Large Hadron Collider that 99% of the data are thrown out in real time. Deep learning might help us make sense of all this data.
Following the trend in the computer industry, National Laboratories in the US are also slowly incorporating DL. The availability of large, curated datasets, the advancements in algorithms, and the increase in computing power have made the incorporation of DL in scientific research more feasible now. So, will DL augment all of these quests for answers? The answer is probably yes, but we have yet to figure out how.
Dr. Bhimji’s Presentation Link: Machine Learning for Science SHI Oct 2023 - Google Slides
Internship Opportunities at National Labs
https://education.lbl.gov/internships/
https://www.nersc.gov/research-and-development/internships/
https://www.anl.gov/education/undergraduate-programs
https://www.anl.gov/education/graduate-programs
https://www.bnl.gov/education/
https://www.lanl.gov/careers/career-options/student-internships/undergraduate/
https://www.lanl.gov/careers/career-options/student-internships/graduate/
https://www.llnl.gov/join-our-team/careers/students
https://education.ornl.gov/undergraduate/
https://education.ornl.gov/graduate/
https://www.pnnl.gov/stem-internships
https://www.sandia.gov/careers/career-possibilities/students-and-postdocs/