
Source: Datawhale

Machine learning is advancing rapidly, but is our theoretical understanding keeping pace? In this article, a data scientist uses a utility matrix to map the relationship between models' experimental results and the underlying theory, and surveys progress across the subfields of machine learning.

Introduction

Know what works, and know why it works.
The field of machine learning has developed very rapidly in recent years, but our understanding of machine learning theory remains limited, and the experimental performance of some models has even outstripped our grasp of the underlying theory.
More and more researchers in the field are starting to pay attention to and reflect on this issue. Recently, a data scientist named Aidan Cooper wrote a blog post mapping out the relationship between models' experimental results and the underlying theory.
Original link: https://www.aidancooper.co.uk/utility-vs-understanding/?continueFlag=b96fa8ed72dfc82b777e51b7e954c7dc

Original blog post

In the field of machine learning, some models are extremely effective, yet we are not entirely sure why. Conversely, some relatively well-understood research areas have limited applicability in practice. This article examines progress in various subfields along two dimensions: empirical utility and theoretical understanding.
"Empirical utility" here is a composite measure that considers a method's breadth of applicability, how easy it is to implement, and, most importantly, how useful it is in the real world. Some methods are not only highly practical but also widely applicable; others, though powerful, are limited to specific domains. Methods that are reliable, predictable, and free of major flaws are considered to have higher utility.
Theoretical understanding considers the interpretability of a method: what the relationship between inputs and outputs is, how the expected results can be obtained, and what the method's internal mechanism is. It also considers the depth and completeness of the literature on the method.
Methods with low theoretical understanding often resort to heuristics or extensive trial and error in their implementation; methods with high theoretical understanding tend to have formulaic implementations, strong theoretical foundations, and predictable results. Simpler methods such as linear regression have a lower theoretical upper bound, while more complex methods such as deep learning have a higher one. When judging the depth and completeness of a field's literature, the field is evaluated against its assumed theoretical upper bound, which relies partly on intuition.
We can divide the utility matrix into four quadrants, with the intersection of the axes representing a hypothetical reference field of average understanding and average utility. This lets us characterize fields qualitatively according to the quadrant they fall in, as shown in the figure below; a field in a given quadrant may exhibit some or all of the characteristics associated with that quadrant.
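As a toy illustration, the quadrant logic described above can be sketched in a few lines of Python. The field names and their (understanding, utility) scores below are invented placeholders for illustration, not values taken from the article's figure; scores are centered on 0, the hypothetical "average" reference field at the axis intersection.

```python
def quadrant(understanding: float, utility: float) -> str:
    """Map a (theoretical understanding, empirical utility) pair to its quadrant."""
    vertical = "high understanding" if understanding > 0 else "low understanding"
    horizontal = "high utility" if utility > 0 else "low utility"
    return f"{vertical}, {horizontal}"

# Illustrative placeholder scores in [-1, 1] (not from the original figure).
fields = {
    "linear regression": (0.9, 0.8),
    "deep learning": (-0.5, 0.9),
    "causal inference": (0.6, -0.4),
    "quantum ML": (-0.8, -0.9),
}

for name, (und, uti) in fields.items():
    print(f"{name}: {quadrant(und, uti)}")
```

Running this prints each hypothetical field alongside the quadrant label the article's scheme would assign it.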
Figure 1: The utility matrix.
In general, we expect utility and understanding to be loosely correlated, with well-understood methods being more useful than poorly understood ones. This means most fields should sit in the lower-left or upper-right quadrant; fields far from the lower-left-to-upper-right diagonal are the exceptions. Practical utility should generally lag theory, since it takes time to translate nascent research into real-world applications, so the diagonal should sit above the origin rather than pass directly through it.

The field of machine learning in 2022

Not every field in the diagram above lies strictly within machine learning (ML), but each is applied in an ML context or closely related to one. Many of the evaluated fields overlap and cannot be cleanly separated: advanced methods in reinforcement learning, federated learning, and graph ML are often built on deep learning. I therefore consider only the non-deep-learning aspects of their theory and practical utility.

Upper right quadrant: high understanding, high utility

Linear regression is a simple, easy-to-understand, and efficient method. Although often underestimated and overlooked, its breadth of use and thorough theoretical underpinnings place it in the upper-right corner of the graph.
Traditional machine learning has developed into a field with both high theoretical understanding and high practicality. Sophisticated ML algorithms, such as gradient-boosted decision trees (GBDTs), have been shown to generally outperform linear regression on complex prediction tasks, especially on big-data problems. Admittedly, there are still gaps in our theoretical understanding of overparameterized models, but implementing machine learning is a careful methodological process, and when done well, models perform reliably in industry.
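The claim that gradient-boosted trees often beat linear regression on complex prediction tasks is easy to reproduce on synthetic data. The sketch below uses scikit-learn's nonlinear Friedman #1 benchmark; it is a minimal illustration, not a reproduction of any experiment from the article.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Friedman #1: a synthetic regression task with nonlinear and interaction terms.
X, y = make_friedman1(n_samples=2000, noise=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LinearRegression().fit(X_tr, y_tr)
gbdt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

print(f"linear R^2: {linear.score(X_te, y_te):.3f}")
print(f"GBDT   R^2: {gbdt.score(X_te, y_te):.3f}")  # typically clearly higher
```

On a purely linear data-generating process the gap would vanish; the advantage shows up precisely when the target depends nonlinearly on the features.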
However, the extra complexity and flexibility do introduce some pitfalls, which is why I place machine learning to the left of linear regression. In general, supervised machine learning is more refined and more impactful than its unsupervised counterpart, but the two approaches effectively address different problem spaces.
Bayesian methods have a cult following of practitioners who tout their superiority over the more popular classical statistical methods. Bayesian models are particularly useful in certain situations: when point estimates alone are not enough and estimates of uncertainty matter; when data is limited or has many missing values; and when you understand the data-generating process and want to encode it explicitly in your model.
The usefulness of Bayesian models is limited by the fact that, for many problems, point estimates are good enough and people simply default to non-Bayesian methods. More importantly, there are ways to quantify uncertainty in traditional ML as well (even if they are rarely used). It is often easier to simply throw an ML algorithm at the data, without having to reason about data-generating mechanisms and priors. Bayesian models are also computationally expensive; they would be more practical if theoretical advances led to better sampling and approximation methods.
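The contrast between a point estimate and a full Bayesian treatment can be made concrete with the simplest possible example: a Beta-Binomial model of a success rate. The observed counts below are invented purely for illustration.

```python
from scipy import stats

# Observed data: 4 successes in 10 trials (invented numbers).
successes, trials = 4, 10

# Frequentist point estimate: just the observed rate.
point_estimate = successes / trials  # 0.4

# Bayesian treatment with a flat Beta(1, 1) prior: the posterior is
# Beta(1 + successes, 1 + failures), a full distribution over the rate.
posterior = stats.beta(1 + successes, 1 + trials - successes)
lo, hi = posterior.interval(0.95)  # 95% credible interval

print(f"point estimate: {point_estimate:.2f}")
print(f"posterior mean: {posterior.mean():.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
```

With only ten trials the credible interval is wide, which is exactly the information a bare point estimate throws away; with more data the posterior concentrates and the two answers converge.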

Lower right quadrant: low understanding, high utility

In contrast to progress in most fields, deep learning has had some astonishing successes, even though theoretical progress has proven fundamentally difficult. Deep learning embodies many characteristics of a poorly understood approach: models are unstable, hard to build reliably, configured on the basis of weak heuristics, and produce unpredictable results. Dubious practices such as "tuning" the random seed are very common, and the mechanics of working models are difficult to explain. Yet deep learning keeps advancing and reaching superhuman performance in fields such as computer vision and natural language processing, unlocking otherwise intractable tasks such as autonomous driving.
Hypothetically, artificial general intelligence would occupy the far bottom-right corner, since by definition superintelligence exceeds human comprehension and could be used to solve any problem. For now, it is included only as a thought experiment.
A qualitative description of each quadrant. Fields may exhibit some or all of the descriptions in their corresponding quadrant.

Upper left quadrant: high understanding, low utility

Most forms of causal inference are not machine learning, though some are, and causality is always of interest to anyone building predictive models. Causal work can be divided into randomized controlled trials (RCTs) and the more sophisticated methods of causal inference, which attempt to measure causal effects from observational data. RCTs are theoretically simple and yield rigorous results, but are often expensive and impractical, if not impossible, to run in the real world, and so have limited utility. Causal inference methods essentially emulate RCTs without having to run them, which makes them far less difficult to carry out, but they come with many limitations and pitfalls that can invalidate the results. Overall, causality remains a frustrating pursuit in which current methods often cannot answer the questions we actually want to ask, unless those questions can be probed by RCTs or happen to fit certain frameworks (e.g., as the fortunate byproduct of a "natural experiment").
Federated learning (FL) is a cool concept that has received little attention, probably because its most high-profile applications require distribution across huge numbers of smartphone devices, so FL is seriously studied by only two players: Apple and Google. Other use cases for FL exist, such as pooling proprietary datasets, but coordinating such initiatives poses political and logistical challenges, limiting their practical utility. Nonetheless, for what sounds like a fancy concept (roughly summed up as "bring the model to the data, not the data to the model"), FL works, and has practical success stories in areas such as keyboard text prediction and personalized news recommendation. The basic theory and techniques behind FL appear to be sufficient for it to be far more widely used than it is.
Reinforcement learning (RL) has reached unprecedented levels of competence in games such as chess, Go, poker, and Dota 2. But outside of games and simulated environments, RL has yet to translate convincingly into real-world applications. Robotics was supposed to be RL's next frontier, but that hasn't materialized: reality has proven more challenging than highly constrained toy environments. That said, RL's achievements so far are inspiring, and someone who truly loves chess might argue its utility should be rated higher. I would like to see RL realize some of its potential practical applications before moving it to the right side of the matrix.

Lower left quadrant: low understanding, low utility

Graph neural networks (GNNs) are now a very hot area of machine learning, with promising results in multiple domains. But for many of those examples, it is unclear whether GNNs actually beat alternatives that pair more traditional structured-data representations with deep learning architectures. Data that is naturally graph-structured, such as molecules in cheminformatics, seems to produce the most compelling GNN results (although even these generally do not surpass non-graph methods). Compared with most fields, there appears to be a large gap between the open-source tooling for training GNNs at scale and the in-house tools used in industry, limiting the viability of large GNNs outside those walled gardens. The complexity and breadth of the field suggest a high theoretical upper bound, so GNNs should have room to mature and convincingly demonstrate advantages on certain tasks, which would increase their utility. GNNs could also benefit from technological advances, since graphs do not currently map naturally onto existing computing hardware.
Interpretable machine learning (IML) is an important and promising field that continues to receive attention. Techniques such as SHAP and LIME have become genuinely useful tools for interrogating ML models. However, the utility of existing methods has not yet been fully realized due to limited adoption: robust best practices and implementation guidelines have yet to be established. The main weakness of IML today, though, is that it does not address the causal questions we are really interested in. IML explains how the model makes predictions, not how the underlying data is causally related to them (although it is often misinterpreted as doing so). Until significant theoretical progress is made, the legitimate uses of IML are mostly limited to model debugging/monitoring and hypothesis generation.
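To give a flavor of what "interrogating a model" looks like in practice, here is a minimal sketch using scikit-learn's permutation importance, a simpler cousin of SHAP/LIME-style attribution (chosen here only because it needs no extra libraries). The synthetic data is invented: only the first feature actually drives the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Synthetic data: only feature 0 drives the target; features 1 and 2 are noise.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(random_state=0).fit(X, y)

# Permutation importance: how much does shuffling each feature hurt the score?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
# Caveat from the article: this explains the *model's* behavior, not the
# causal structure of the underlying data.
```

Feature 0 should dominate the importance ranking, which tells us how the model predicts, but, per the article's caveat, nothing about whether that feature causes the outcome in the real world.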
Quantum machine learning (QML) is well outside my wheelhouse, but currently seems like a hypothetical exercise, patiently waiting for viable quantum computers to become available. Until then, QML sits trivially in the bottom-left corner.

Incremental advancements, technological leaps and paradigm shifts

There are three main mechanisms by which fields traverse the matrix of theoretical understanding and empirical utility (Figure 2).
Figure 2: An illustrative example of how fields can traverse the matrix.
Incremental progress is the slow and steady advance that inches a field up and to the right of the matrix. Supervised machine learning over the past few decades is a good example: increasingly effective prediction algorithms were refined and adopted during that time, giving us the powerful toolbox we enjoy today. Incremental progress is the status quo in all mature fields, punctuated by periods of larger movement driven by technological leaps and paradigm shifts.
Some fields have seen step changes in scientific progress thanks to leaps in technology. Deep learning was not unlocked by its theoretical foundations, which had been laid more than 20 years before the deep-learning boom of the 2010s; its renaissance was instead fueled by the parallel processing power of consumer-grade GPUs. Technological leaps usually appear as jumps to the right along the empirical-utility axis. However, not all technology-led advances are leaps: deep learning today is characterized by incremental progress, achieved by training ever-larger models with more compute and increasingly specialized hardware.
The final mechanism for scientific progress within this framework is the paradigm shift. As Thomas Kuhn argued in The Structure of Scientific Revolutions, paradigm shifts represent fundamental changes in the basic concepts and experimental practices of a scientific discipline. One example is the causal frameworks pioneered by Donald Rubin and Judea Pearl, which elevated the study of causality from randomized controlled trials and traditional statistical analysis to a more powerful mathematical discipline in the form of causal inference. Paradigm shifts often manifest as upward movement in understanding, which may then be followed or accompanied by increased utility.
However, paradigm shifts can traverse the matrix in any direction. When neural networks (and later deep neural networks) established themselves as a paradigm separate from traditional ML, this initially corresponded to a drop in both utility and understanding. Many emerging fields have branched off from more established areas of research in this way.

Predictions, and the scientific revolution of deep learning

To wrap up, here are some speculative predictions for what I think may happen in the future (Table 1). Fields in the upper-right quadrant are omitted, since they are too mature for dramatic progress.

Table 1: Predictions of future progress in several major fields of machine learning.

More important than how any individual field develops, however, is the overarching trend toward empiricism, and a growing willingness to forgo comprehensive theoretical understanding.
Historically, theory (the hypothesis) came first, and methods were then formulated from it. Deep learning has ushered in a new scientific process that turns this on its head: methods are expected to demonstrate state-of-the-art performance long before anyone attends to the theory. Empirical results are king; theory is optional.
This has led to widespread gaming of the system in machine learning research, whereby "state-of-the-art" results are obtained by simply tweaking existing methods and relying on randomness to beat the baselines, rather than by meaningfully advancing the field's theory. But perhaps that is the price we pay for the current wave of machine learning's rapid progress.

Figure 3: Three potential trajectories for the development of deep learning in 2022.

Is deep learning locked into an irreversibly results-driven process that relegates theoretical understanding to an optional extra? 2022 could be a turning point. We should consider the following questions:
  • Will theoretical breakthroughs allow our understanding to catch up with practicality and transform deep learning into a more structured discipline like traditional machine learning?

  • Is existing deep learning theory sufficient for utility to keep increasing indefinitely, simply by scaling ever-larger models?

  • Or will an empirical breakthrough lead us further down the rabbit hole, into a new paradigm of even greater utility that we understand even less?

  • Do any of these routes lead to artificial general intelligence?

Only time will tell.
