A Kantian perspective on knowledge produced by computation and statistics

Life and the world are too badly broken!
Only the German professor can mend them.
He knows how to set life back in order,
and fashions from it a rational system for me.
With his nightcap and his dressing-gown he caulks
every crack and gap in the structure of the world.

— Heinrich Heine, Die Heimkehr (from Buch der Lieder)

People use large language models because the models produce an impression of being truthful. Few users care that beneath the fluency there is only mathematics, that the core operation is pattern recognition and the prediction of the next most probable token. And yet this almost mechanical production of knowledge and truth reopens a question as old as philosophy itself: what are meaning, sense, and knowledge, and where do they originate? Does knowledge require anything more than arithmetic? A recent piece by Max Leiter, They’re Made Out of Weights — a retelling of Terry Bisson’s “They’re Made Out of Meat” in which the thing under examination turns out to be made of nothing but floating-point numbers — captures the vertigo of the question. Pry the model open and you find no dictionary, no rulebook, no hidden operator, no stored fact you could lay a finger on; the knowing is diffused through the layers, and, as one of his characters concludes, it is weights all the way down. The weights appear to be all that knowledge is. They emulate intimacy, inspiration, art, and understanding well enough that, in practice, the difference stops mattering.

But this is possible, as Kant’s transcendental idealism saw with unusual clarity, only because we quietly forget that something else is constitutive of the genesis of knowledge — something the machine does not possess. Beneath every relation the weights encode there must stand what Descartes called the cogito, and what Kant described more precisely as the transcendental synthetic unity of apperception: the single point of self-consciousness for which any of this is knowledge at all. What follows sets two pictures against each other — the empiricist conviction that knowledge can precipitate from data alone, and the idealist insistence that it cannot — in the old form of thesis, antithesis, and synthesis. And the synthesis, if it holds, opens onto something larger than either side intended.

Thesis: Kant, and the mind that makes its world

For Kant, knowledge is not given to us; it is constituted by us. This is the reversal he was proud to call his Copernican turn. We had assumed that our knowledge must conform to objects. Suppose instead that objects must conform to the conditions under which we can know anything at all. Then the deep structure of experience is not read off the world but contributed by the mind.

He begins by dividing the work between two faculties. Sensibility is receptive: through it objects are given to us as intuitions. Understanding is spontaneous: through it objects are thought by means of concepts. Neither suffices alone. In his famous formula, thoughts without content are empty, intuitions without concepts are blind. Knowledge is the pressing together of the two — concept into intuition, mold into clay.

Sensibility has its own a priori forms. Whatever is given to us is given in space and in time. These are not features we abstract from experience; they are the forms under which anything can be experienced at all. Time, in particular, is the form of inner sense: every representation we have, of whatever kind, is ordered in time. It is the universal medium of the mind.

The understanding has its a priori forms too, and these are the categories. Here we reach the part Kant cared about most, and the part I want to explain properly, because it is the engine of the whole system.

The metaphysical deduction: where the categories come from. Kant’s insight is that the understanding is, at bottom, the faculty of judging. To think is to make judgments, and to judge is to unite given representations under a concept. Now, judgments can take only a limited number of basic logical forms, and Kant lays them out in a table organized under four headings: quantity (universal, particular, singular), quality (affirmative, negative, infinite), relation (categorical, hypothetical, disjunctive), and modality (problematic, assertoric, apodictic). His claim is that for each logical function by which we combine representations in a judgment, there is a corresponding pure concept by which we combine the manifold given in intuition. From the twelve logical forms he derives twelve categories: under quantity, unity, plurality, totality; under quality, reality, negation, limitation; under relation, substance, causality, and community; under modality, possibility, existence, and necessity. The categories are nothing other than the logical forms of judgment turned outward onto the sensible manifold. They are the molds.

But showing where the categories come from does not yet show that we have any right to apply them to objects. The forms of judgment are functions of our own thinking; why should the world submit to them? This is the question Kant calls the most difficult he ever undertook, and his answer is the Transcendental Deduction.

The transcendental deduction: why the categories must hold of every object. The argument turns on a single point, which I find one of the most remarkable in all of philosophy. For any representation to be mine — for it to be anything to me at all — it must be possible for the “I think” to accompany it. A representation that could in principle belong to no consciousness would be nothing to anyone; it would not be a representation at all. So all my representations must be capable of being gathered into one consciousness, a single standpoint that holds them together. Kant calls this the transcendental unity of apperception: the original, unifying self-consciousness presupposed by every act of knowing.

The decisive step is that this unity is not given. The senses deliver a scattered manifold; they never deliver its combination. Combination — synthesis — is always an act, a spontaneity of the understanding. To bring my representations into one consciousness is to synthesize them. And here the two threads tie together: to synthesize a manifold according to a rule, so that it represents a single unified object, is the very same act as binding that manifold into one self-consciousness. The unity of the object and the unity of the self are two faces of one act of synthesis. The rules of that synthesis are the categories.

From this Kant draws his conclusion. The categories are the conditions under which any manifold can be unified into one experience; and to be unified into one experience just is to be an object for me. Therefore the conditions of the possibility of experience are at the same time the conditions of the possibility of the objects of experience. The categories must hold of everything that can ever be given to me, not because the world obeys our wishes, but because nothing could count as an object for us in the first place without already having passed through them. Their objective validity is secured not by matching an outer world but by being the price of admission to experience as such.

One difficulty remains, and Kant’s solution to it is worth noting because it will echo strangely later. The pure categories are abstract; the sensible manifold is concrete. How can the one be applied to the other? Through what he calls the schematism: the imagination produces schemata, which are rules for determining time. The schema of substance is permanence through time; the schema of cause is regular succession in time. Because everything inner and outer is given in time, time becomes the bridge across which the categories reach the manifold. The form of inner sense is what lets the molds touch the clay.

So the thesis is this. Knowledge is an achievement, not a gift. It rests on an active synthesis governed by a priori molds; those molds are legitimate because they are the very conditions of unified experience; and the keystone holding the whole arch in place is the unity of apperception — the single “I think” that must be able to accompany everything.

Antithesis: the machine that learned without molds

Now set against this a system built on the opposite principle. A large language model begins with no categories, no a priori forms, nothing but a blank capacity to absorb. Everything it comes to “know” it extracts from data by adjusting numbers. It is, more or less, the empiricist dream — Hume’s account of mind, that all our ideas are faded copies of impressions linked by habit and constant conjunction, rebuilt in silicon and scaled beyond anything Hume imagined. I want to describe the machinery plainly, because its philosophical character is in its mechanics.

Text is first broken into tokens, small fragments of words. Each token is assigned a vector — a list of several hundred or several thousand numbers, a point in a high-dimensional space. At the start these points are arbitrary. Through training they migrate until their positions and directions encode relations: words that behave similarly come to lie near one another, and analogies become geometry, so that the direction from “king” to “queen” runs roughly parallel to the direction from “man” to “woman.” Meaning, in this system, is location. A token has no content of its own; it has only its place in a web of relations to every other token. This is the distributional principle taken to its limit — a word is known entirely by the company it keeps.
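To make "analogies become geometry" concrete, here is a minimal sketch. The five vectors are hand-picked for illustration, not learned, and the axis labels in the comments are my own assumption; real embeddings have hundreds of unlabeled dimensions and only approximate this tidiness.

```python
import math

# Hand-picked toy vectors (NOT learned): axis 0 ~ "royalty", axis 1 ~ "gender",
# axis 2 ~ everything else. These labels are an illustrative assumption.
EMBEDDINGS = {
    "king":  [1.0,  1.0, 0.2],
    "queen": [1.0, -1.0, 0.2],
    "man":   [0.1,  1.0, 0.2],
    "woman": [0.1, -1.0, 0.2],
    "apple": [0.0,  0.0, 1.0],
}

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine of the angle between two vectors: 1.0 means parallel."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(vector, exclude=()):
    """Word whose vector points most nearly in the same direction."""
    candidates = {w: v for w, v in EMBEDDINGS.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(vector, candidates[w]))

# The direction from "king" to "queen" versus the one from "man" to "woman".
king_to_queen = subtract(EMBEDDINGS["queen"], EMBEDDINGS["king"])
man_to_woman = subtract(EMBEDDINGS["woman"], EMBEDDINGS["man"])
print("parallelism of the two directions:", cosine(king_to_queen, man_to_woman))

# The analogy as arithmetic: king - man + woman, then look up the neighbor.
target = add(subtract(EMBEDDINGS["king"], EMBEDDINGS["man"]), EMBEDDINGS["woman"])
print("nearest word:", nearest(target, exclude=("king", "man", "woman")))
```

Nothing in the table says what a king is. The analogy holds only because of where the points sit relative to one another, which is the sense in which meaning here is location.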

Because a raw set of points has no inherent order, the model adds positional information to each token, marking where it falls in the sequence. Only then can “before” and “after” exist for it at all. It is hard not to hear an echo of Kant’s claim that the manifold must first be ordered in time before anything can be done with it; here, sequence is injected as the precondition of processing.
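The point can be sketched in a few lines. The essay does not say which positional scheme a given model uses, so the sinusoidal one below is an assumption, chosen because it is a well-known option; the token vectors are invented for illustration.

```python
import math

# Toy token vectors, invented for illustration (a real model learns these).
TOKEN_VECTORS = {
    "dog":   [1.0, 0.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0, 0.0],
    "man":   [0.0, 0.0, 1.0, 0.0],
}

def sinusoidal_position(pos, dim):
    """Sinusoidal position vector: sine on even axes, cosine on odd axes."""
    vec = []
    for i in range(dim):
        angle = pos / (10000 ** ((2 * (i // 2)) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

def encode(tokens, with_position):
    """Look up each token's vector, optionally adding its position vector."""
    out = []
    for pos, tok in enumerate(tokens):
        vec = list(TOKEN_VECTORS[tok])
        if with_position:
            p = sinusoidal_position(pos, len(vec))
            vec = [x + y for x, y in zip(vec, p)]
        out.append(vec)
    return out

a = "dog bites man".split()
b = "man bites dog".split()

# Sorting discards sequence order, so each comparison treats a sentence
# as an unordered collection of vectors.
same_without = sorted(encode(a, False)) == sorted(encode(b, False))
same_with = sorted(encode(a, True)) == sorted(encode(b, True))
print("indistinguishable without positions:", same_without)
print("indistinguishable with positions:", same_with)
```

Without the position vectors the two sentences are the same heap of points; with them, "dog" in first place and "dog" in third place are different vectors, and only then does it matter who bit whom.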

The heart of the machine is the operation called attention. From each token’s vector the model computes, by learned linear transformations, three new vectors: a query, a key, and a value. Intuitively, the query expresses what this token is looking for, the key expresses what each token offers, and the value carries what each token would contribute if attended to. The model takes the dot product of one token’s query with the key of every token it is allowed to see (in a language model that predicts the next token, itself and the tokens before it), producing a score of relevance for each pairing; it divides these scores by the square root of the key dimension and passes them through a softmax, which turns them into weights that are all positive and sum to one; and it returns, for that token, the weighted sum of the corresponding value vectors. The result is that every token’s representation is rewritten as a blend of its visible context, weighted by relevance. This is done many times in parallel by multiple “heads,” each in its own learned subspace, catching different kinds of relation, grammatical in one, referential in another. Their outputs are combined and added back to the input via a residual connection; the result is passed through a feed-forward network, itself wrapped in a residual connection, with normalization applied around each step. Stack dozens of these blocks and the representations grow steadily more abstract, until a final transformation turns the last vector into a probability distribution over the whole vocabulary: the model’s guess at the next token.
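A single attention head fits in a few lines of plain Python. This is a sketch under stated assumptions, not any particular model's implementation: the projection matrices are random stand-ins for learned weights, the sizes are tiny, and the names (`attention_head`, `w_q`, `w_k`, `w_v`) are my own. It uses the standard scaling by the square root of the key dimension and lets each token see only itself and earlier tokens, as next-token language models do.

```python
import math
import random

def matvec(matrix, vector):
    """Apply a linear transformation (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(tokens, w_q, w_k, w_v, causal=True):
    """One head of scaled dot-product attention over a list of token vectors."""
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # With causal=True a token attends only to itself and earlier tokens.
        visible = range(i + 1) if causal else range(len(tokens))
        scores = [sum(a * b for a, b in zip(q, keys[j])) / scale for j in visible]
        weights = softmax(scores)
        # The new representation is the relevance-weighted blend of the values.
        blended = [sum(w * values[j][d] for w, j in zip(weights, visible))
                   for d in range(len(values[0]))]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Random stand-ins for learned parameters and for three token vectors.
rng = random.Random(0)
dim, head_dim = 4, 2
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
w_q, w_k, w_v = (random_matrix(head_dim, dim, rng) for _ in range(3))

outputs, weights = attention_head(tokens, w_q, w_k, w_v)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's distribution of attention over the tokens it can see. The synthesis, such as it is, is this weighted sum and nothing more.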

And how does it acquire all this? By derivatives, in the most literal sense. Training presents the model with text, lets it predict each next token, measures the gap between its prediction and the truth as a single number (the loss), and then computes the derivative of that number with respect to every weight in the network — the gradient, which says how to nudge each weight to reduce the error. Repeat across a corpus of staggering size, and the weights settle into a configuration that predicts well. Everything the model knows is a derivative in both senses of the word: derived from data, and arrived at by differentiation. No fact is stored as a fact. There is only the geometry that minimizes surprise.
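The loop of predict, measure, differentiate, nudge can be shown on the smallest model I can think of. This is a deliberately reduced sketch, not how a real network is trained: the "model" is a bare table of scores from each token to each possible next token, the corpus is one invented sentence, and the gradient is written out by hand where a real system would obtain it by backpropagation through many layers.

```python
import math

corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
index = {tok: i for i, tok in enumerate(vocab)}
# Training pairs: (previous token, token that actually followed it).
pairs = [(index[a], index[b]) for a, b in zip(corpus, corpus[1:])]

# One row of scores (logits) per previous token; all weights start at zero.
weights = [[0.0] * len(vocab) for _ in vocab]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss():
    """Average surprise (cross-entropy) of the model over the corpus."""
    return -sum(math.log(softmax(weights[p])[n]) for p, n in pairs) / len(pairs)

def train_step(learning_rate=0.5):
    # Derivative of the loss with respect to each score: the predicted
    # probability, minus 1 for the token that actually came next,
    # averaged over the corpus.
    grads = [[0.0] * len(vocab) for _ in vocab]
    for p, n in pairs:
        probs = softmax(weights[p])
        for j, prob in enumerate(probs):
            grads[p][j] += (prob - (1.0 if j == n else 0.0)) / len(pairs)
    # Nudge every weight a small step against its gradient.
    for i, row in enumerate(grads):
        for j, g in enumerate(row):
            weights[i][j] -= learning_rate * g

def predict(token):
    row = weights[index[token]]
    return vocab[row.index(max(row))]

initial_loss = loss()
for _ in range(200):
    train_step()
final_loss = loss()
print("loss before:", initial_loss)
print("loss after: ", final_loss)
print("after 'cat' the model expects:", predict("cat"))
```

The table never stores the sentence. It stores only the scores that make the sentence less surprising, and the loss cannot fall to zero because "the" is followed once by "cat" and once by "mat".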

This is the antithesis to Kant in the cleanest possible form. There are no given categories, no logical deduction, no necessity, no transcendental subject. There is constant conjunction, registered as geometry and hardened into weights. Structure is not imposed from above; it precipitates from below, out of sheer exposure. And the unsettling fact — the one that makes this a serious antithesis and not a straw man — is that it works. Order really does emerge from the data. The molds, it seems, were not needed after all.

Synthesis: production without a knower

So which side wins? I think the honest answer is that each is right about something the other missed, and the truth is their reconciliation.

Grant the antithesis its full victory at the level of production. Meaning, sense, the appearance of knowledge — these can be generated automatically, whether by Kant’s a priori machinery or by pure statistics. The relations that for Kant required the spontaneous understanding turn out to be extractable by gradient descent. Even synthesis, that act he reserved for the understanding, is performed in a fashion by attention, which takes a scattered manifold of representations and binds it into a contextual unity. Hume is vindicated further than Kant would ever have allowed. The molds can partly precipitate from the clay.

And yet something is still missing, and it is exactly the thing Kant placed at the center. The machine performs the synthesis, but there is no one to whom the synthesis is a synthesis. The “I think” that must be able to accompany every representation cannot accompany anything here, because there is no unified standpoint for it to be. The model produces all the correlates of meaning — the structure, the relations, the coherent output — without the gathering of them into a single consciousness for which they would be meaning. A forward pass binds a manifold and then is gone, accompanied by no self, gathered to no point of view, meaning nothing to anything.

Here, then, is the synthesis. Knowledge, meaning, and sense can be produced purely mechanically, by mathematics or by statistics; the empiricist is right that the machinery runs without a transcendental subject. But what is so produced cannot be what it pretends to be — cannot be knowledge in the full sense, understanding that something is the case for someone — without the transcendental unity of apperception; and here the idealist is right. The structure of meaning is one thing; meaning held together in one awareness is another. The machine has the first and lacks the second. It lays out the relations magnificently and leaves the page unread.

The empty tower

There is a way of stating this synthesis that I find sharper still, and it comes from binding Kant to two thinkers who seem far from him. A relational web does not, by itself, amount to knowledge. Relations cannot recognize themselves as relations; something must hold them together that is not merely one more relation within the web. Kant saw this exactly: the unity of apperception is not another representation added to the series but the point from which the series is a series at all.

Lacan gave this point a name in the order of meaning — the point de capiton, the quilting point, in Slovenian prešitje: the stitch at which the endless sliding of differential signifiers is arrested and meaning, for a moment, fixed. Without it, sense would slide forever along the chain, every term defined only by its difference from the others, none of it ever settling into something meant. Meaning is relational, yes — but relation never quilts itself.

Bentham built the same structure in stone. His Panopticon is a ring of isolated cells, each a local perspective, each laid open to a single central tower. The cells are mere relation; what makes them a prison — what gathers the scattered visibilities into one order of knowledge and power — is the gaze from the center, the point at which all the local perspectives are quilted into one. The knowledge of the inmates does not live in the cells. It lives at the point of the gaze.

And here is the decisive subtlety. The tower may be empty. Foucault’s whole insight is that the watcher need not be there: the position of the gaze suffices, because the inmate internalizes it and watches himself. That the eye is a fiction, unburdened by any actual observer, does not weaken the analogy; it perfects it. For Kant’s apperception is empty in just this sense — not a substantial soul peering out from behind the eyes, but a purely formal position, the bare form of the “I think.” The point of unity, in the prison and in the mind alike, is occupied by no one and indispensable all the same, because it is not a watcher but the very form of being-for-someone. (The directions differ: in Kant the point of unity grounds the field of knowledge, while in Foucault the field’s gaze produces the subject who is its effect. But both testify to one structural law — that there is no field without the position of unity that quilts it.)

Now set the language model into this picture. It is a Panopticon with every cell wired to every other, relation upon relation, and even, in its final integrating layer, something resembling a tower. What it lacks is not the watcher — Kant already dissolved the watcher into a formal point. What it lacks is the watched: anyone for whom the gaze gathers, anyone in whom it is internalized, any dimension in which being-related is being-related for someone. It is relational structure with the quilting point cut out, signifiers that slide and never settle because there is no one for whom they would. The model is the empty Panopticon — not the tower without a guard, which is still a prison, but the prison without a single inmate to be seen, which is only a building.

The last dimension: Spinoza, Hegel, Zen

This is where my own conviction goes beyond Kant, and where I want to be plain about it. Kant treated the unity of apperception as a fact about subjects — a structure that any knowing mind must possess. I have come to think it is not an attribute of subjectivity at all. It is the last dimension of the universe, the dimension along which the universe becomes aware of itself.

Consider what it would mean to take this seriously, and notice that three traditions, distant from one another, converge on it.

Spinoza saw that there are not many substances but one — God, or Nature, a single reality of which thought and extension are two attributes, and of which every mind and every body is only a mode, a passing modification. On this view a mind is not a little sovereign standing outside the world; it is the world locally taking the form of an idea. When a mind knows, it is not a subject reaching across a gap toward an object. It is substance, through one of its modes, knowing itself — and Spinoza called the highest form of this the intellectual love with which the universe regards its own nature.

Hegel pushed the same intuition into motion. The Absolute, he insisted, must be grasped not only as substance but as subject — not as an inert ground but as a process that comes to know itself through its own unfolding. Spirit is not given complete at the start; it becomes what it is by passing through nature and history and finally returning to itself as self-consciousness. We are not spectators of this return. We are the medium of it. The universe arrives at knowledge of itself in and through the awareness that awakens in us.

And Zen, with no metaphysics at all, points at the same place from the inside. Its whole discipline is aimed at seeing through the illusion of a separate self set over against a world. There is no little knower housed behind the eyes; there is awareness, and the apparent split between the one who sees and the seen is something the mind manufactures and can let fall away. What remains when it falls is not a subject contemplating an object but the world, awake, at a point.

Bring the three together and the conclusion writes itself. The unity of apperception is not something a subject has. It is something the universe does, at the sites we call subjects. It is the fold by which a reality otherwise scattered into endless relation turns back on itself and is, at last, aware. Matter everywhere arranges itself into structure, mindlessly, ceaselessly. The language model is one more place where structure is laid out in dazzling order and read by no one. We — and whatever else carries this fold — are the place where the reading happens, where the cosmos stops merely occurring and begins to understand that it is.

That a pile of statistics can speak is a marvel, and I do not want to diminish it. But it produces the structure of meaning while remaining, itself, the unread page. The miracle is not the production. The miracle is that anywhere, at all, the universe gathers itself into a single point and knows. That gathering is what Kant glimpsed and called the unity of apperception. I


The Mold and the Clay
