AI beyond smoke and mirrors

Information is not a disembodied abstract entity; it is always tied to a physical representation. It is represented by engraving on a stone tablet, a spin, a charge, a hole in a punched card, a mark on paper, or some other equivalent. This ties the handling of information to all the possibilities and restrictions of our real physical world, its laws of physics and its storehouse of available parts.

Landauer, 1996, The physical nature of information, in: Physics Letters A, p. 188

In [Gordon] Teal’s system, creating the material was creating the advanced electronic device…in the junction transistor the material was the device, no more no less.

Christophe Lécuyer and David C. Brock, 2006, The materiality of microelectronics, in: History and Technology, pp. 304 and 307

Because of its very concreteness, people tend to confront technology as an irreducible brute fact…rather than as hardened history, frozen fragments of human and social endeavor.

David F. Noble, 2011, Forces of Production: A Social History of Industrial Automation, p. XIII

Summary

In AI, plenty of effort is spent making smoke and mirrors – and plenty of effort is spent debunking them. Few, however, look beyond to describe what is left once the smoke and mirrors have been cleared away. Yet, ‘what is left’ turns out to be the most interesting part.

Concretely, my goal is to examine innovative directions in AI that could produce notable results. Potential options are assessed on a rolling basis, based on in-depth reading of recent scientific literature, including but not restricted to the various ‘non-von Neumann’ propositions, and aiming to bracket a range of technology readiness levels (TRLs).

The first topics under scrutiny are the structure and sufficiency of data; wetware; and neuromorphic hardware. I also touch briefly on thermal management of CPUs and GPUs, but I leave further exploration of that interesting additional topic to another day.

In each case, I look at the recent history and break down the devices and scientific knowledge upon which they are based. This is a working document, so I update and correct it when I have a moment (at this juncture it is still a rough draft, as I ran out of time).

The structure of data

Our analysis suggests that, if rapid growth in dataset sizes continues, models will utilise the full supply of public human text data at some point between 2026 and 2032, or one or two years earlier if frontier models are overtrained. At this point, the availability of public human text data may become a limiting factor in further scaling of language models.

Villalobos, et al., 2024, Will we run out of data? Limits of LLM scaling based on human-generated data (arXiv)

It has been said that data is the new coal. This metaphor can be read in a number of ways. Data, like coal, is not one thing, but a whole technical world of its own.

Evidently, we need to understand its scale, its qualities, how to get more of it, how much is sufficient, and its applicability to a given problem – any technique that moves us towards these goals has to be worth studying.

Such a perspective reveals a relatively narrow landing zone for commercial applications (while also, of course, implying potential areas for the creation of new kinds of data).

There is no shortage of articles noting the insufficiency of data. A detailed argument was recently made by Villalobos, et al. (2024): AI will run out of training data roughly within the next decade (even once all available data has been ‘cleaned’ and so on). The solutions proposed by these authors are as follows:

  • Hiring 10 million people to write 40 words per minute for 8 hours per day, thereby writing 70T words in one year, at a cost of hundreds of billions of dollars in wages (the arithmetic is sketched after this list).
  • Synthetic data, i.e., using models themselves to generate more data.
  • Multimodality and transfer learning, which involves training language models on other existing datasets (e.g., from different domains).
  • Non-indexed data from social media and instant messaging platforms.
  • Interactions with the real world.
  • Sensory observations or the results of real-world experiments.
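
Those headline numbers are easy to check. Here is a minimal back-of-envelope sketch in Python: the writing-rate figures come from the first bullet above, while the hourly wage is my own illustrative assumption.

```python
# Back-of-envelope check of the "10 million writers" proposal.
# Writing-rate parameters are taken from the bullet above; the hourly wage
# is an illustrative assumption, not a figure from the paper.
writers = 10_000_000            # people
words_per_minute = 40
hours_per_day = 8
days_per_year = 365

words_per_year = writers * words_per_minute * 60 * hours_per_day * days_per_year
print(f"{words_per_year / 1e12:.0f} trillion words per year")   # ≈ 70 trillion

assumed_wage_per_hour = 10      # USD, purely illustrative
wage_bill = writers * hours_per_day * days_per_year * assumed_wage_per_hour
print(f"≈ ${wage_bill / 1e9:.0f} billion per year in wages")    # hundreds of billions
```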

The hunger for data to train AI is evident – as is the volume of capital available to achieve that goal.

The proposal to employ millions of writers has merit and would at least be equitable. Large numbers of people are already paid to create content for the web. But we also observe people writing content for free such as through social media.

There is something interesting about the scale of that data as it changes through time.

Makers of data storage media assure us that demand for their products will rise. However, it would seem that we know less than expected about the scale of data, or the trajectory of that scale.

Logically, as access to the internet increases, so the amount of data increases, while also becoming generally more representative of the global population.

However, we could also envisage phenomena over the longer term such as link rot wiping out gains or even that the volume of data will decline as individuals lose interest in contributing to social media (despite the efforts of the major companies to keep them engaged with it).

Villalobos, et al., argued that ‘the size of Google’s index has remained relatively constant over the past decade, which is a counterintuitive result since new web pages are regularly created.’

The reasons proposed for this unexpected finding:

  • If the rate of growth of the web is similar to the rate of link rot, the two effects might partially cancel each other and lead to a more stable index size.
  • The possibility that Google is keeping the size of their index within a fixed range due to economic or engineering considerations.
  • The data are not representative examples of the global web, but only a part of it. Since most of the recent growth in the number of internet users has been in non-Western countries, it is possible that the estimates are missing this growth.

When we examine the structure of data, we find the majority is of three kinds, namely, astronomical (images of space), genomic and social media.

Therefore, it would seem that commercial proposals for AI outside two use cases, namely, prediction of phenotype from genotype, and online behavior (language, image and video), need to be treated with circumspection – unless accompanied by significant acquisition of new data.

This is not only because there is not enough data to train AI, but because the data that already exists in other areas could probably be analyzed by conventional methods and, indeed, might have been already analyzed.

Not enough data

‘AI methodologies have obtained promising outcomes in processing scientific data, resulting in an almost eight-fold increase in AI publications in biology only since 2000…However, this growth of AI applications has perplexingly not seen a comparable advance in our understanding of the field.’*

AI has been proposed as a means of scientific discovery for many decades, and such long-forgotten tools as Arrowsmith were actually used for discovery purposes (but seemingly never found wide application). Whether DeepMind is the new Arrowsmith, or more, or less, significant, is not yet clear.

In high-resource R&D settings, such as in chemical and pharmaceutical multinationals, robotic generation of training data seems the solution – by analogy to genomics where high-throughput sequencing became the norm through the US Department of Energy-initiated Human Genome Project (although of course genomics was not undertaken with a view to training AI).

This also raises the related problem of designing systems that move through the discovery space – the so-called self-driving lab. Evidently, that is a substantial area of research interest, but perhaps it is still early days.

How to evaluate such research? A well-known study reported a robot that ‘operated autonomously over eight days, performing 688 experiments’.

Perhaps this robot could generate the very large datasets needed to train AI.

Impressive as it sounds, it is useful to compare this performance with a discovery bureaucracy of the kind deployed in a mid-20thC chemical company.

In such a company, hundreds of staff, a global network of field stations, communication by telex, etc., were all deployed, with the capacity to field-test a hundred compounds per year, according to a report from ICI (so-called ‘blunderbuss research’).†

It would seem that 688 experiments could be within the reach of such a bureaucracy – a hint that the robotic lab is not actually extending human capacity as much as claimed.

It could simply prove to be a version of the scientific bureaucracy that had already explored the available discovery space – recalling also that IT is already heavily integrated into drug discovery. Therefore, the marginal gain from such a complex and expensive AI system might be quite limited.

While a capitalist could seek to neutralize labor as a factor in discovery by means of robots, strictly speaking, it might not make commercial sense when the same could be achieved more cheaply with human beings and, indeed, had already been achieved decades before.

Proposals are made for use of natural products data sets as a basis for AI screening for useful medicines. However, these databases contain only thousands of records – within the capacity of a team of people to screen and not obviously in need of AI.

The possible value of AI in these cases would be in low-resource settings, i.e., systems that reduce labor costs to zero and that could thereby generate vital leads for new medicines.

Cloud-based GPU facilities are too expensive and raise significant IP issues. Hacking e-waste springs to mind as one part of a low-cost solution.
*Barberis, et al., 2024, Robustness and reproducibility for AI learning in biomedical sciences: RENOIR, in: Scientific Reports; †Peacock, 1978, Jealott’s Hill: fifty years of agricultural research, 1928-1978, pp. 112-118, 148.

Wetware

If AGI [artificial general intelligence] is ever created, it will come out of a biological laboratory.

Jaeger, 2024, Artificial Intelligence is Algorithmic Mimicry: Why artificial “agents” are not (and won’t be) proper agents (arXiv)

The philosopher Johannes Jaeger makes the point that silicon-based computers cannot deliver true artificial intelligence; at best they will achieve ‘algorithmic mimicry’. The enormous investments being made in hardware to power AI, as currently understood, will therefore fail when judged against that goal. Such ideas lead us back to the ‘biological laboratory’ rather than the chip fab.

Comparison of ANN and brain

| | ANN | Brain |
|---|---|---|
| Processing time | Faster | Slower |
| Refractory period | No | Yes |
| Processing | Serial | Parallel |
| Network architecture | Designed | Evolved |
| Ambiguity of incoming data | Intolerant | Tolerant |
| Activation profile | Sigmoidal | Input strength tuned, slow |
| Energy consumption | 250 W* | 20 W |
| Heat production | 50-80°C | 37°C |
| Scale | 100s-1000s ‘neurons’ | 86 billion neurons (100 trillion synapses) |
| Physical basis | Transistor | Neuron, synapse |
| Learning | Programmed | Autonomous |
| Memory | Bits | Synaptic strength |
| Scale of memory | 175 trillion GB (Internet)† | 5 trillion GB (humanity)‡ |
*GeForce Titan X GPU. †Reinsel, et al., 2018, p. 7. ‡Roughly estimated based on 4.7 bits of information per synapse proposed by Bartol Jr, et al., 2015, 100 trillion synapses per human brain, and 8 billion human brains. Sources: Gebicke-Haerter, 2023, The computational power of the human brain, in: Frontiers in Human Neuroscience; Reinsel, et al., 2018, The Digitization of the World From Edge to Core (IDC); Bartol Jr, et al., 2015, Nanoconnectomic upper bound on the variability of synaptic plasticity, in: eLife

Carver Mead distinguished conventional computer systems, which isolate signals through the silicon/silicon dioxide band gap, from the nervous systems of living beings, in which (in his terminology) that role is fulfilled by differences in the dielectric constant between lipids and water.

Let us, therefore, evaluate one recent example of the aqueous-lipid system that received significant publicity: DishBrain. This system was reported to have learned to play an old Atari game. It was a kind of biological homage to the 2013 DeepMind project, which had similar ambitions, albeit the DeepMind project was delivered with a deep-learning model running on a conventional computer.

The DishBrain concept was a conjunction of several existing devices that can be traced back through the scientific history of the 20thC. It seems this kind of wetware system implies four main generic components.

The four main components of wetware (DishBrain study)

| Component | Description |
|---|---|
| The task | Playing an Atari computer game that once took the arcade by storm |
| Computer-brain interface | Microelectrode array (CMOS MEA)* |
| Control system (software) | DishServer (modified version of proprietary CMOS MEA software) |
| Brain (wetware) | 1×10⁵ NGN2-induced neurons and 2.5×10⁴ primary human astrocytes |
*The complementary metal-oxide-semiconductor microelectrode array (CMOS MEA) is commercially available and builds on older technologies such as MEA, dating from the 1970s and developed at Harvard Medical School, and CMOS MEA, dating from the early 2000s, and associated with ETH Zurich.

The crucial component is the interface between the living neurons and the computer. It was provided by a high-density microelectrode array (HD-MEA) plate manufactured by a Swiss firm, MaxWell Biosystems, together with a tailored control system based on software developed by the same company. The computation was provided by neurons and human astrocytes in each well of the HD-MEA plate.

Experiments are rarely, if ever, independently replicated in contemporary science. It is therefore impossible to evaluate the claims in full. The best that can be achieved is textual exegesis (albeit inconclusive, as the description of the experimental methods is ambiguous). In short, we cannot draw any iron-clad conclusions, but the discussion might be enlightening.

The DishBrain report seems to have generated consternation in the scientific community; two technical criticisms were noted. Regrettably, these were not explored by critics but they warrant a deep dive as regards our goal of assessing the future salience of the technology. Criticisms made by Balci, et al. (2023), were, specifically:

  • Weak results, some of which fail to adequately match control and experimental conditions.
  • Failure to acknowledge previous use of biological neural networks embedded in closed-loop systems that has helped, for example, to assess the potential application of plasticity to drive external artifacts, e.g., robots.

The first point relates to the matching of control and experimental conditions. The control conditions appear to have been (1) media-only controls (presumably, no cells); (2) rest sessions, where cell cultures controlled the paddle but received no sensory feedback (and therefore, one assumes, could not learn how to play the game); and (3) in-silico controls that ‘mimicked all aspects of the gameplay except the paddle were driven by random noise over 399 test sessions’.

At another point in the paper, the authors also seemed to refer to a control of human embryonic kidney (HEK293T) cells, which are not electrically active. They only report, however, on the response of a ‘media-only’ control (as far as I could see). This seems an important point, because if it were revealed that non-electrically active cells such as the kidney cells were also capable of improving game-playing performance, it would cast doubt on the validity of the assay (or on the validity of assumptions about the electrical inactivity of those cells).

It seems to me that the most obviously unequivocal control was therefore the cell culture that did not receive sensory feedback. The authors of the study argued that, on average, the treatments containing cells showed significant improvement in performance and longer average rally length relative to the control groups over time; therefore, learning must have occurred.

Stepping back from a single paper and assessing the state of current knowledge, we have to distinguish between the review and op-ed articles, such as Dixon, et al. (2021), and the much smaller number of scientific advances reported in primary literature that stand up to scrutiny.

Part of the reason for a lack of notable use cases lies with the unfocused rationale for the work – ranging from therapeutics to biosensors for pollution. This is not a single field, but a range of approaches, disciplines and goals that are not always connected and, overall, lack a shared ethos and set of methods. The infrastructure and concentrations of capital are simply less developed than has been the case for silicon chips.

Whether or not AI would provide the thread to stitch these various scientific ideas together is impossible to say. Generally speaking, however, the idea has not been to use neurons as computational devices, which is the crucial proposition made by DishBrain but, indeed, the opposite, to control neurons with a view to achieving a physical task.

The development of so-called cyborg animals such as cockroaches calls to mind the idea of using the learning capacity of cockroach brains (inside living cockroaches) connected in parallel, rather than cells in dishes. This would dispense with the need for cell culture – but currently remains science fiction.

It implies the use of semiconductor-based AI to learn the language of cockroach brains, then feeding that back, thereby deploying a combination of living beings and computers as a unique problem solving tool.

Neuromorphic hardware

Biological information-processing systems operate on completely different principles from those with which most engineers are familiar…[yet] [t]here is nothing that is done in the nervous system that we cannot emulate with electronics if we understand the principles of neural information processing.

Carver Mead, 1990, Neuromorphic electronic systems, in: Proceedings of the IEEE, pp. 1629-1630

Neuromorphic hardware is not a new idea; the term is associated with Carver Mead around 1990. It seeks to mimic some aspects of actual brains.

Its main advantage could be energy-saving potential, at least in theory. This could be attributed to its ability to co-locate processing and memory, as well as to its event-driven and parallel style of computing. Thus far, it has only been implemented in taxpayer-funded initiatives such as TrueNorth, NorthPole and SpiNNaker.

The NorthPole chip, developed by IBM Research under a DARPA contract, offered substantially inferior scale to a human brain, counting only 64 million ‘neurons’.

It was claimed to display two major advantages – in energy and space efficiency – over conventional competitors such as Nvidia. According to the authors, this implied an ability to make a smaller, portable, consumer device capable of machine learning. There is, however, also scepticism about the promise of such chips.

The collocation of processing and memory helps mitigate the von Neumann bottleneck regarding the processor/memory separation, which causes a slowdown in the maximum throughput that can be achieved. In addition, this collocation helps avoid data accesses from main memory, as in conventional computing systems, which consume a considerable amount of energy compared with the compute energy

Schuman, et al., 2022, Opportunities for neuromorphic computing algorithms and applications, in: Nature Computational Science
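
To make the event-driven point concrete, here is a minimal sketch (my own illustration, not drawn from any of the papers above) contrasting a dense, clocked layer update with an event-driven update that only touches the synapses of neurons that actually spiked.

```python
import numpy as np

# Sketch: why event-driven updates can save work when activity is sparse.
rng = np.random.default_rng(0)
n_pre, n_post = 1000, 1000
weights = rng.normal(size=(n_pre, n_post))

# Sparse spike vector: only ~1% of input neurons fire in this time step.
spikes = rng.random(n_pre) < 0.01

# Conventional dense update: every weight is read and multiplied.
dense_input = spikes.astype(float) @ weights            # n_pre * n_post operations

# Event-driven update: only the rows belonging to spiking neurons are touched.
active = np.flatnonzero(spikes)
event_input = weights[active].sum(axis=0)               # len(active) * n_post operations

assert np.allclose(dense_input, event_input)
print(f"event-driven work ≈ {len(active) / n_pre:.1%} of the dense update")
```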

Given the central claim concerning its energy and space-saving potential, we would have to determine if these properties were even in principle sought in the industry. If they were not, that might explain why the technology has never been adopted on a large scale and, indeed, would never be adopted.

Since the invention of electronic computers in the 1940s, the number of computations per kWh has indeed increased, in what is sometimes known as Koomey’s law (by analogy to Moore’s law). Koomey, et al. (2009) calculated that the increase in efficiency, so to speak, was exponential between 1946 and 2009, with an average doubling time of 1.57 years.

However, what is crucial in my view is not the rate but the reason why this increase occurred. Koomey, et al. attributed it to chip miniaturization.

It was not, therefore, as far as we can tell, due to engineering choices made by senior managers in the industry, with the intention of reducing energy use, but rather an outcome of a commitment to Moore’s law (the strategy of ‘More Moore’). The word efficiency can have several meanings and these are not always straightforward.

Indeed, despite heightened discussion of energy efficiency in recent years, this has evidently not translated into faster efficiency gains. A later analysis by Prieto, et al. (2024), covering the years 2008-2023, argued that the doubling time has slowed (to 2.29 years).
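
For a sense of scale, here is a quick calculation (my own arithmetic) of what those doubling times imply, using the periods and rates cited above.

```python
# Improvement in computations per kWh implied by an exponential trend
# with the doubling times cited above.
def improvement_factor(years: float, doubling_time_years: float) -> float:
    return 2 ** (years / doubling_time_years)

koomey = improvement_factor(2009 - 1946, 1.57)   # ≈ 1e12-fold gain, 1946-2009
prieto = improvement_factor(2023 - 2008, 2.29)   # ≈ 94-fold gain, 2008-2023

print(f"1946-2009: ~{koomey:.1e}x more computations per kWh")
print(f"2008-2023: ~{prieto:.0f}x more computations per kWh")
```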

It needs to be pointed out that the theoretical potential for efficiency remains stupendously large, given the thermodynamic bound set by the Landauer limit (or, at least, we are a long way from exhausting the possibilities).
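
As a rough illustration of that headroom, the sketch below computes the Landauer bound at room temperature and compares it with an assumed, order-of-magnitude figure of about 1 fJ per switching event; that figure is my own assumption for illustration, not a measured value for any particular chip.

```python
import math

# Landauer limit: minimum energy to erase one bit is k_B * T * ln(2).
k_B = 1.380649e-23              # Boltzmann constant, J/K
T = 300.0                       # roughly room temperature, K
landauer_per_bit = k_B * T * math.log(2)
print(f"Landauer limit: {landauer_per_bit:.2e} J per bit erased")     # ≈ 2.9e-21 J

# Assumed order-of-magnitude energy for a present-day switching event (~1 fJ);
# an illustrative assumption only.
assumed_switch_energy = 1e-15
print(f"headroom: ~{assumed_switch_energy / landauer_per_bit:.0e}x above the limit")
```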

While the commitment to Moore’s law reduced the amount of energy per unit of computation, cramming transistors into a smaller space generated greater amounts of heat at junctions – a mix of outcomes in what amounts to a complicated overall situation.

Power density

‘Power density of the microprocessors has surpassed that of an ordinary kitchen hot plate. If this trend continues, it will not be long before microprocessors will have power densities comparable to that of nuclear reactors and rocket nozzles.’*

The problem of heat became particularly marked towards the end of the last century as power densities rose above 10 W per square cm.

Although ceramic semiconductors can tolerate extreme heat, adjacent components are vulnerable. It was indeed found that even small increases in temperature reduced the reliability of devices.

The history of R&D on cooling down semiconductors is not a topic I looked into, but it would seem the American military and computer firms such as IBM were (as expected) crucial early players.

The answer lay with gimcrack combinations of fans, heat-pipes and thermal interface materials (e.g., ‘thermal paste’ containing silicone, polyethylene glycol, etc.).

A succinct and well-written article by one of the leading scientists in the field, Prof. D.D.L. Chung, gives a deep dive concerning thermal interface materials and the problems associated with evaluating them scientifically.
*Arman Vassighi and Manoj Sachdev, 2006, Thermal and Power Management of Integrated Circuits, p. 8

Another factor in this story lies with the size and location of computational devices. One history of the industry treats the years 1980-1995 as the age of personal computers, when units were increasingly found inside small boxes in homes and businesses.

In contrast, the ages before and after were dominated by large centralized facilities, once known as mainframes and, more recently, as data centers. The contemporary, networked, personal computer (including, obviously, smart phones and so on) fundamentally relies on data centers in a way that the ‘discrete’ personal computer in, say, 1990 did not (before the rise of the internet).

The tendency is to see the current epoch as one of energy crisis, in which the major concentrations of capital seek new sources of energy to power data centers (until those sources are exhausted), while also pursuing efficiency gains. That reading might be true.

Yet, in a world of large data centers and practically unlimited capital among the major players, it is difficult to see why either efficiency or indeed miniaturization would be paramount. Marginal cuts in opex through neuromorphic hardware would perhaps be overwhelmed by high capex, compared with the inexpensive graphics cards that currently fulfill their needs.

Physical space might be a limiting factor within the current footprint of data centers. Yet the hold of the major IT companies over governments means planning and energy policies would become subordinate and, therefore, it would surely be possible to build more data centres.

The problem is that it is difficult to make solid judgements about the nature of current chips and, furthermore, how much efficiency matters to senior managers among the range of other metrics that drive their decisions. It could be the case that computational requirements are adequately met by large numbers of otherwise outdated components, noting the importance in machine learning of graphics cards that emerged from video gaming.

Certainly, discussion of a Google chip program did not seem to prioritize energy efficiency; rather, decisions to upgrade equipment were said to be made in light of expected processing demand. In my view, if a major company with almost unlimited capital is not prioritizing a metric we need to take that evidence seriously when it comes to assessing the salience of proposed innovations.

Without doubt, this demand is good for makers of graphics cards and associated components like cooling fans (innovation in CPU and GPU cooling will likely be put under the microscope later in this project).

But it is much less clear if the hunger for such cards speaks to the full gamut of possible future developments. Therefore, investors must analyze the probable opportunity costs of acquiring large amounts of GPU equipment that will later come to be seen as obsolete.

Energy and labor – provisional notes

Steam, Babbage, Marx:

AI is often compared to steam engines in reference to the industrial revolution. The metaphor opens up interesting lines of thought.

The fact that Manchester cotton – the output of the steam-powered automatic looms – was connected to slavery in the Deep South is evidently central to the historical case but not acknowledged in the metaphor.

Karl Marx attributed the success of the steam engine to its ability to save on wage costs (following the shortening of the working day through legislation). Other analysts, to the contrary, emphasized the importance of an energy crisis in which capitalists adopted steam power as a way to economize on fuel.

Between such labor-saving and energy-saving qualities, how to understand what is occurring in the contemporary world of AI? This, to my mind, is an important question. It would allow us to understand which innovations are likely to be wanted and which will fall by the wayside.

If we adopt Marx’s view, it would be the labor-saving innovations that win out, whereas if we tend more to thinking of energy efficiency as the driving force, a different position might beckon.

Marx identified improvements in piston speed, economy of power (thereby reducing coal consumption), lessening of friction in the transmitting mechanism, and reduction in diameter and weight of shafting.

Cuts in the weight of power looms, simultaneous with an increase in their complexity through ‘imperceptible alterations of detail’, were also crucial, such that the speed of self-acting mules increased by one fifth over a decade.†

This mix of technical details, if we try to translate it to our present predicament, implies a complicated mix of causes and effects. Situational awareness is, without doubt, very hard to get.

The fact there are even differing views on why particular devices succeeded such as steam engines and power looms – when all available data have long-since been gathered – suggests little hope for ex-ante assessments concerning AI.

That being said, I think a discussion framed by energy-saving and labor-saving qualities is intellectually enlightening even if no clear story emerges from it.

The labor-saving qualities refer to the ability of the device to substitute for labor at a lower cost. Ultimately, this could be pursued to completely obliterate the labor theory of value.

However, in the real world, so to speak, the question of the relative costs of employing humans as opposed to the costs of a putative automated (AI) substitute has to be taken into account at each juncture as well as the specifics of political economy.

It is interesting to reflect on Babbage’s famous mechanical computer, designed in the 19thC to calculate tables, as discussed by the historian of science Simon Schaffer. Perhaps the overall point for present circumstances is that AI might be invoked in a program of restructuring work.

†Cited in Von Tunzelmann, 1978, Steam Power and British industrialization to 1860, pp. 217-218. The context of the Von Tunzelmann analysis is presented by Smith and Bruland, 2013, Assessing the role of steam power in the first industrial revolution: the early work of Nick von Tunzelmann, in: Research Policy. The paper offers a gateway to a substantial economic and historical literature that harbors different perspectives.

The problem of heat:

While it is of course possible to run AI on a CPU, the current focus is on using GPUs housed in large racks in data centers. Indeed, the chips, not the software, are what everyone seems to regard as the core device.

As an example of that focus on a particular device, policymakers are interested in regulating the movement of silicon chips across national borders but not the esoteric knowledge needed to program them (which is freely available on the internet).

Semiconductors are at the core of any digital device and the Union’s digital transition: from smartphones and cars, through critical applications and infrastructures in health, energy, communications and automation to most other industry sectors. As semiconductors are central to the digital economy, they are powerful enablers for the sustainability and green transition

Regulation (EU) 2023/1781 of the European Parliament and of the Council of 13 September 2023

Some of the main devices for data centers include GPUs, fans and other cooling systems, aluminum racks (to hold the units), and cables. I want to look at what has been written about the energy economy of this group of interrelated devices.

It is evident that the people who pay the electricity bills of data centres would be interested in reducing those bills. However, it is much less evident that the manufacturers of silicon chips share this interest – or rather, the story is more complicated.

Data centres use energy to power computations but also to disperse the waste heat generated by those calculations. About 40% of consumed energy is needed for cooling equipment – which is evidently where much of the effort to cut energy consumption has gone, via innovation in cooling systems, rack design, and so on.
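
Taking that ~40% cooling share at face value, a toy calculation shows what it implies for the commonly used Power Usage Effectiveness (PUE) ratio. The total-consumption figure and the simplification that everything other than cooling is IT load are my own assumptions for illustration.

```python
# Toy split of data-centre energy based on the ~40% cooling share cited above.
total_facility_kwh = 1_000_000                 # assumed annual consumption, illustrative
cooling_share = 0.40                           # share cited in the text
cooling_kwh = total_facility_kwh * cooling_share
it_kwh = total_facility_kwh - cooling_kwh      # simplification: everything else is IT load
pue = total_facility_kwh / it_kwh              # PUE = total facility energy / IT energy
print(f"cooling: {cooling_kwh:,.0f} kWh, implied PUE ≈ {pue:.2f}")   # ≈ 1.67
```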

At the level of individual chips, heat is produced by the movement of electrons, which generates lattice vibrations. This heat is significant – practically all supplied power ends up as heat.

In a tightly packed integrated circuit, the heat cannot disperse. The more packed the chip, the more extreme the problem. The issue of thermal stress became a notable technical issue in the last century.

Because the American weapons industry was a dominant player in these developments, keeping chips functional in extreme environments such as a speeding missile also impinged on discussion.

Overall, the issue for the chip is not reducing energy use but simply keeping the circuit functional. Ceramic semiconductors are quite heat resistant, but other components are not, raising risks of failure.

In any case, as the number of components in circuits increased, the chance of failure of individual components started to bite purely as a numbers game.

The answer lay with combinations of fans, heat-pipes and thermal interface materials (e.g., ‘thermal paste’ containing silicone, polyethylene glycol, etc.). However, even these methods had limits, because there is only so much heat that can be drawn away from the chip in a given time.

This is why materials like thermal paste are useful: they improve heat conduction, for example by reducing the amount of (insulating) trapped air. Heat is such a concern in chips that sensors are included to monitor temperature and take action if it gets too high, such as slowing down calculations (throttling) or powering off.
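
A minimal sketch of the steady-state arithmetic behind such throttling decisions, using the standard relation T_junction = T_ambient + P × R (junction-to-ambient thermal resistance); all numerical values are illustrative assumptions, not figures for any real part.

```python
# Steady-state junction temperature: T_junction = T_ambient + power * R_theta_ja.
ambient_c = 35.0                 # air temperature inside the case, assumed
power_w = 250.0                  # chip power dissipation, assumed
thermal_paths = {
    "good thermal paste": 0.20,  # K/W junction-to-ambient, assumed
    "trapped air gaps":   0.30,  # K/W with a poor thermal interface, assumed
}
throttle_threshold_c = 95.0      # assumed throttling threshold

for label, r_theta in thermal_paths.items():
    t_junction = ambient_c + power_w * r_theta
    throttle = t_junction > throttle_threshold_c
    print(f"{label}: T_junction ≈ {t_junction:.0f} °C, throttle={throttle}")
```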

The collapse of Dennard scaling at the start of the present century – whereby power use no longer reliably scaled down with size – compounded the problem of overheating.

Unfortunately, cooling technology does not scale exponentially nearly as easily. As a result, processors went from needing no heat sinks in the 1980s, to moderate-size heat sinks in the 1990s, to today’s monstrous heat sinks, often with one or more dedicated fans to increase airflow over the processor. If these trends were to continue, the next generation of microprocessors would require very exotic cooling solutions, such as dedicated water cooling, that are economically impractical in all but the most expensive systems.

Olukotun and Hammond, 2005, The future of microprocessors, in: QUEUE, p. 29

Prof. Kunle Olukotun pioneered the ‘multicore’ approach of increasing the number of cores per die; the idea was taken up by IBM as a strategy to keep heat under control while sustaining small improvements in single-core efficiency.

Obviously, this had effects on software design, which now had to distribute calculations across multiple cores (threads) rather than, as had previously been usual, executing instructions in sequence.
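
As a toy illustration of that shift, the following sketch runs the same workload sequentially and then split across four cores; it illustrates the change in programming model, not performance.

```python
from concurrent.futures import ProcessPoolExecutor

def work(chunk):
    # A stand-in computation: sum of squares over a slice of the data.
    return sum(i * i for i in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]          # interleaved split across four workers

    sequential = work(data)                          # one core, one instruction stream
    with ProcessPoolExecutor(max_workers=4) as pool:
        parallel = sum(pool.map(work, chunks))       # four cores, four streams

    assert sequential == parallel                    # same result, different execution model
```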

In the overall context of over-packed (and overheating) CPUs, GPUs (graphics cards), which are much less tightly packed than CPUs, had a particular advantage because they generate less heat.

The problem was that GPUs had a single major commercial use case – computer games. Artificial neural networks supplied them with a second use case, whereby parallel computing was actively sought, and thus revived an industry that was running out of new devices.

This was fortuitous for company executives, because if no further substantial performance gains had been possible, computers would have become like domestic appliances or office sundries; evidently, the industry would have changed significantly, as such goods are useful and continue to sell in numbers, but they do not attract massed capital.

The small-to-negligible improvements in CMOS [complementary metal–oxide–semiconductor] power efficiency still pose a challenge, which may motivate more radical innovations in computer architecture: photonics, quantum computing, analog computing, and biological computing with neurons could emerge as a cross-cutting enabler to address the continuing power challenge…Our collective work has been instrumental in making CPU-based computing, and now AI (currently GPU)-based computing, an integral part of the human experience.

Hadi Esmaeilzadeh, et al., 2023, Retrospective: dark silicon and the end of multicore scaling, in: ISCA@50 25-Year Retrospective: 1996-2020

Topics on the appraisal list:

Software innovation – parallel and sequential computing problems in context of papers such as Vaswani, et al., 2017, Attention is all you need (Google). Intellectual history of computer architecture going back to Flynn, 1966, Very high-speed computing systems, in: Proceedings of the IEEE.

Some of the ideas crucial to AI, such as back-propagation, have mostly forgotten, and quite odd, origins deep in the cybernetic world of the US military-industrial complex of the 20thC.

Werbos, for example, developed back-propagation as part of a research program to predict political mobilization, funded by DARPA.

That is not to forget the subsequent intellectual history in the years leading to the present, but to note that the underlying conceptual basis is tenuous.

[T]he creation of new methods of inference could have happened in the early 1970s: all the necessary elements of the theory and the SVM [support vector machine] algorithm were known. [Yet] it took 25 years to reach this determination.

Vapnik, 1999, Introduction: four periods in the research of the learning problem, in: The Nature of Statistical Learning Theory, 2nd edition, p. viii.

Hardware innovation – Farzad Zangeneh-Nejad, et al., 2021, Analogue computing with metamaterials, in: Nature Reviews Materials
