Sep 9, 2025
39 min read
History of chess computers: the Minimax
An in-depth look at the history of chess computers—with the first of the techniques developed.

Everyone knows about Deep Blue; it was the first chess computer to beat a reigning world champion—Garry Kasparov. Although it lost to Kasparov in 1996, Deep Blue came out on top in the 1997 rematch. The matches, however, are not the whole story. Creating this computer engine took decades of work from researchers and the chess world. Feng-hsiung Hsu—the man who started the project—said that the team “had spent close to 30 man-years on the project when Deep Blue won the match.”1 What’s more, some of the ideas used in Deep Blue were theorised nearly half a century earlier. We will look at those ideas and how the current computer engines came to be—the ones that now shape how chess is played.
One thing to get straight is that these chess computers are not artificial intelligence or large language models. These machines don’t think critically or intuitively; they only find the best move. Even though their creation was influenced by how humans think about chess positions, this doesn’t mean that they actually think like human chess players. While thinking about a position, a human might judge a line leading to a doubled pawn or an isolated pawn as bad; a computer scores positions in a similar way. It avoids an open king in the middle game but favours a king close to the centre in the end game; however, it doesn’t “intuit” a position. It just brute-forces moves and sees which move order comes out better, while using some optimisation techniques.
Why?
For centuries, philosophers and scientists have speculated about whether or not the human brain is, essentially, a machine. Conversely, could a machine be designed that would be capable of “thinking?” These fascinations are what started the search for chess computers… 75 years ago. After the Second World War, several large-scale electronic computing machines were constructed, which were considered to be “capable of something very close to the reasoning process.”2 Consider the Enigma machine: these computers were primarily designed to carry out purely numerical calculations and to automatically perform long sequences of operations. The design of these computers was flexible, yet they still weren’t able to work with words, propositions, or other conceptually challenging constructs. We needed to find a way to speak with these machines, to find a common language, let’s say.
Even creating a machine that would translate from one human language to another—not necessarily semantically equivalent, just a simple word-by-word translation—was researched heavily. Machines that could be useful in proving mathematical theorems were hypothesised as well. In a way, a great deal of work was beginning on bringing computers into our everyday lives.
Why chess?
Chess pits two intellects against each other in a situation so complex that neither can hope to understand it completely, yet sufficiently amenable to analysis that each can hope to out-think her opponent. The game is sufficiently deep and subtle in its implications to have supported the rise of professional players, and to have allowed a deepening analysis through 300 years of intensive study and play without becoming exhausted or barren. Such characteristics mark chess as a natural arena for attempts at mechanisation, illustrating some of the possibilities opened up by the direction the world has been heading. If one could devise a successful chess machine, one would seem to have penetrated to the core of human intellectual endeavour.
This problem, of course, has no importance in itself, but it is undertaken with a serious purpose in mind.23 The investigation of the chess-playing problem was intended to develop techniques that could be used for more practical applications.
In more formal terms, the chess machine was chosen as the ideal problem for these reasons:
- The problem is sharply defined, both in the allowed operations and in the ultimate goal.
- It is neither so simple as to be trivial nor too difficult for a satisfactory solution.
- It can easily be pitted against a human opponent, giving a clear measure of the machine’s ability in this type of reasoning.
The theory
We start by defining chess as a game. Specifically, two important names, John von Neumann and Oskar Morgenstern, classified chess as a two-player zero-sum game with perfect information:4
- Two-person: Two parties are involved, each trying to maximise her outcome.
- Zero-sum: The gain of one player is the loss of the other player.
- Game: Has simple and well-defined rules that scale to large complexity, such that the outcome can be determined in finite time.5
- Perfect information: When each player is making any decision, they are perfectly informed of all the events that have previously occurred.6 Games where some of the information is hidden from the players are examples of games with imperfect information, such as poker or bridge.
These facts imply that any given chess position must be either [1] a won position for white, [2] a drawn position, or [3] a won position for black. This result is trivial once the existence theorem for such games is considered. Of course, there is no practical method of exactly determining to which of the three cases a chess position belongs. However, we will see how one can come close to an answer.
Formalising the position
Another important name is that of Claude E. Shannon. He theorised how a programme for playing chess might be constructed in his paper “Programming a Computer for Playing Chess,”7 which has been hugely influential in the research into creating chess computers. Hence, we will also use some of the ideas presented in his paper throughout this post.
Let’s start with defining a chess position; we need to keep track of the following information:
- Positions of all of the pieces on the board,
- A boolean to track which side’s turn it is,
- A statement as to whether the kings and/or the rooks have moved (for castling rights),
- A statement of the last move (needed for en passant), and
- The number of moves made since the last pawn move or capture (for the fifty-move rule).
We will not consider the three-fold repetition rule, for simplicity. Also, if a chess computer is repeating a position, it means that it cannot find a better move that allows it to win, so this will result in an infinite loop—where each side repeats the same moves indefinitely—which is simply a draw.
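To make this concrete, here is a minimal sketch of that state in Python; the representation and field names are illustrative choices, not taken from any of the papers discussed here.

```python
from dataclasses import dataclass, field

@dataclass
class Position:
    """The information needed to define a chess position, per the list above."""
    pieces: dict[str, str] = field(default_factory=dict)  # e.g. {"e1": "K", "e8": "k"}
    white_to_move: bool = True                 # which side's turn it is
    castling_rights: str = "KQkq"              # whether the kings/rooks have moved
    last_move: tuple[str, str] | None = None   # needed for en passant
    halfmove_clock: int = 0                    # plies since last pawn move or capture
```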
General considerations
In some games it is easy to see who is winning, like in Nim. This is not so easy a task in chess. We will define an evaluation function $f(P)$ that determines to which of the three categories a position $P$ belongs. If such an evaluation function were to be found for a game, it would be easy to design a machine capable of perfect play. The machine would never lose or draw a won position and never lose a drawn position, and if the opponent ever made a mistake, the machine would capitalise on it instantly. Let’s suppose that such a function could actually be calculated.
When it’s the machine’s turn to move, it calculates $f(P)$ for the various positions obtained from the present position by each possible move that can be made. Then, it chooses the move that gives the maximum value to the evaluation function.
This solution is also possible with chess, in principle. We can technically create a machine to play a perfect game. One can consider, in a given position, all possible moves, then all moves for the opponent from those moves, and so on, to the end of the game. The end must occur, by the rules of the game.8 By working backwards, we can determine whether a position is won, lost, or drawn.
It’s easy to see that, even with all of the computational power we have today, this calculation is frankly impossible. In a typical chess position, there seems to be an average of 31 moves.9 Thus, a move for white and a move for black give about $31^2 \approx 1000$ possibilities.
From now on, we will also call a move from either side a ply, so a full move is two ply. What’s more, 8 plies give $31^8 \approx 10^{12}$ possibilities, a trillion different possibilities. Even if our current computers can process a billion ($10^9$) positions in a second, it’ll still take over 16 minutes to go four moves deep, and it still won’t be able to find a mate in 5—something chess masters can find in mere seconds. We note that the total number of variations needing to be calculated from the initial position would be of the order $10^{120}$, given as an approximation by Shannon.7
Another method would be to have a “dictionary” of all possible positions of the chess pieces. For each possible position, there would be an entry of the correct move, and the machine would just look up this dictionary and make the indicated move. There seem to be on the order of $10^{44}$ legal chess positions.10 We note that this number is many orders of magnitude smaller than the number of variations above. This is because a position might be reached from the starting state in multiple ways, which is called a transposition. We will get back to this observation later.
Okay, so it’s clear that the problem we are trying to solve, for now, is not that of designing a machine to play perfect chess (we have just seen how impractical that is) nor one which merely plays legal chess. We would be perfectly fine if it were to play a skillful game, maybe like a good chess player.
Some pseudo-evaluation
Even though we don’t have an exact evaluation function, it is still possible to perform an approximate evaluation of a position. Of course, a good chess player must be able to perform such an evaluation. What she might do is look at the general structure of the position: the number and kind of black and white pieces, pawn formation, piece mobility, etc. The stronger the player, the better her evaluations. A lot of the principles and rules of correct play are really assertions about evaluating positions; for instance:
- The relative values of queen, rook, bishop, knight, and pawn are about 9, 5, 3, 3, 1, respectively. Thus, if we total each side’s pieces weighted with these coefficients, the side with the larger total might have the better position.
- Rooks should be placed on open files. This assertion falls under a more general principle: the side with more “mobility” might have the better position. Here, mobility may mean the number of legal moves, since the more moves you can play, the more “mobile” your pieces are.
- Backward, isolated, and doubled pawns can be a weakness.
- An exposed king can be considered a weakness—until the end game, where you want the king as close to the centre as you can get it; again relating to mobility.
We note that these principles are not conclusive or final; a chess player might have more rules that they follow. We also note that these rules can be contradicted by particular counterexamples—for example, a queen sacrifice leading to a checkmate would go against our first rule. Nonetheless, we can construct a crude evaluation function from these rules:

$$f(P) = 200(K - K') + 9(Q - Q') + 5(R - R') + 3(B - B' + N - N') + (P - P') - 0.5(D - D' + S - S' + I - I') + 0.1(M - M')$$

Here, $K, Q, R, B, N, P$ are the numbers of white kings, queens, rooks, bishops, knights, and pawns; $D, S, I$ are doubled, backward, and isolated white pawns; and $M$ is white’s mobility. Primed letters are the same quantities for black, and checkmate is artificially included by giving the king the large value of 200.
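As a sketch, the formula translates directly into code. Here the piece counts and structural terms are passed in as plain numbers, since computing them from a real board is a separate (and much harder) problem; the dict layout is an illustrative assumption, not from Shannon’s paper.

```python
def shannon_eval(white: dict[str, float], black: dict[str, float]) -> float:
    """f(P) from white's point of view; positive favours white.

    Each dict holds K, Q, R, B, N, P piece counts, plus D, S, I
    (doubled, backward, isolated pawns) and M (mobility).
    """
    def side_score(c: dict[str, float]) -> float:
        return (200 * c["K"] + 9 * c["Q"] + 5 * c["R"]
                + 3 * (c["B"] + c["N"]) + c["P"]
                - 0.5 * (c["D"] + c["S"] + c["I"])
                + 0.1 * c["M"])

    return side_score(white) - side_score(black)
```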
We see that these criteria for evaluation are extremely crude. The value of the material is certainly the overruling consideration in a chess player’s decisions, except for “sacrifices.” But mobility and pawn structure, it seems, are only two of a number of general considerations which eventually result in the player’s choice of a specific move.
We could say that the value of a knight grows as the piece goes into the opponent’s position; similarly, the value of a pawn grows as it comes close to promotion. But we have overruling considerations here too. For instance, we want mobility to be high, but not for the king; the king should not be in the open, for it can be checkmated. These other considerations are hard to define even in a chess player’s language; to formulate them in machine language is considerably more difficult. You’d need to add more and more terms to the function, which would increase the time required to calculate it, but we deem this a good enough starting point.
We observe that this approximate evaluation function has more freedom than the exact evaluation function, which only has three possible values. This is because, in practical play, a position may be an easy win if a player is up a queen, or a very difficult win with only a pawn advantage. There are many ways to win in the former, while there is often only a single exact way to win in the latter, where one mistake would destroy the entire advantage. On the other hand, the unlimited intellects assumed in the theory of chess would never make a mistake, and the smallest winning advantage is as good as mate in one. A game between two such mental giants, Alice and Bob, would proceed as follows.
They sit down at a chessboard, draw for colours, and survey the pieces for a moment.
- Then, either Alice says “I resign”, or
- Bob says “I resign”, or
- Alice says “I offer a draw,” and Bob replies “I accept.”
Some strategies on evaluating positions
Continuing on, a very important point about the evaluation function given above is that it can only be applied in relatively quiescent positions. Consider, for instance, an exchange of queens: white plays QxQ, and black will reply with RxQ. It would be absurd to calculate $f(P)$ after white’s QxQ, while white is, for a moment, a queen ahead, since black will immediately recapture. More generally, it is meaningless to calculate an evaluation during the course of a series of exchanges.
Obviously, more terms could be added to the evaluation function to account for exchanges in progress, but it appears that combinations, and forced variations in general, are better accounted for by the examination of specific variations. Notice that this is exactly how good chess players calculate. A certain number of variations are investigated move by move until a more or less quiescent position is reached, or until the position turns into a trivial case, and at this point something of the nature of an evaluation is applied to the resulting position. White has the advantage when the evaluation is positive, and conversely, black has the advantage when the evaluation is negative. Therefore, white chooses the variation leading to the highest evaluation for her, when black is assumed to be playing to reduce this evaluation.
It also doesn’t really matter to have this distinction between white and black evaluations. Each player tries to maximise her own evaluation while minimising her opponent’s. In fact, while programming the chess computer, we will do this exact thing—something of the nature of reversing the sign of the evaluation according to whose move it is. Note that one player’s maximum is the other player’s minimum.4
Minimax
In fact, we can describe this exact observation mathematically: one player maximising her evaluation while the opponent is minimising that evaluation. We will call this algorithm minimax.
At first, we omit the fact that $f(P)$ should only be applied in quiescent positions. A basic strategy of play based on $f(P)$ and operating one move deep is the following. Let $M_1, M_2, \ldots, M_s$ be the moves that can be made in position $P$ and let $M_1P, M_2P, \ldots$ denote symbolically the resulting positions when those moves are applied to $P$. Then, one chooses the $M_m$ which maximises $f(M_mP)$.
A deeper strategy would consider the opponent’s replies too. If white chooses move $M_i$, then let $M_{i1}, M_{i2}, \ldots$ be the possible answers by black. Black should play to minimise $f$; remember the definition of the evaluation function: black is winning only when $f$ is negative. Furthermore, her choice occurs after white’s move; thus, if white plays $M_i$, black may be assumed to play the move $M_{ij}$ such that
$$f(M_{ij}M_iP)$$
is a minimum. So, we’re evaluating the position that comes after white plays her move $M_i$ and black replies with $M_{ij}$. Therefore, white should play her first move such that $f$ is a maximum after black chooses her best reply. Mathematically, white should play to maximise the quantity
$$\min_j f(M_{ij}M_iP).$$
The main idea is shown in the following figure. The point on the left is the current position. It is assumed that there are three possible moves for white, indicated by the three solid lines, and if any of these is made, there are three possible moves for black, indicated by the dashed lines. The possible positions after a white and a black move are then the nine points on the right, and the numbers attached to them are the evaluations of these positions. Minimising over each group of three dashed branches gives the value of the corresponding white move, assuming black replies with her best move; maximising over these three values then gives the value of the current position, together with white’s best choice.

This final quantity only “searches” two plies deep—just a single full move. However, we can increase the depth by adding more moves, so a similar two-move strategy would be
$$\max_i \min_j \max_k \min_l f(M_{ijkl}M_{ijk}M_{ij}M_iP).$$
The order of maximising and minimising in this function is important. It derives from the fact that the choices of moves occur in a definite order. A computer playing white, operating on this strategy at the two-move level, would first calculate all variations out to four plies and the resulting positions $M_{ijkl}M_{ijk}M_{ij}M_iP$. The evaluations $f$ are calculated for each of these positions. In one of these variations, we fix all but black’s last move; it is varied, and the move chosen is the one which minimises $f$. This is black’s assumed last move in that variation. We continue doing this, iterating over white’s and black’s moves, until we reach the current position and white’s best first move $M_i$. This move is then played. This idea can also be generalised to any number of moves—though it would require a lot more computational power.
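A minimal sketch of this procedure in Python follows; the game-specific helpers (`moves`, `apply_move`) and the evaluation `f` are passed in as assumed inputs, so the same recursion runs on any tree:

```python
def minimax(pos, depth, white_to_move, f, moves, apply_move):
    """Value of `pos` searched `depth` plies deep; white maximises f."""
    legal = list(moves(pos))
    if depth == 0 or not legal:
        return f(pos)  # leaf: fall back to the static evaluation
    values = (minimax(apply_move(pos, m), depth - 1, not white_to_move,
                      f, moves, apply_move)
              for m in legal)
    return max(values) if white_to_move else min(values)

# The two-ply situation of the figure, as a toy "game" whose positions
# are nested lists and whose leaf values are illustrative evaluations:
tree = [[1, -3, 0], [5, -7, 2], [-1, 4, -6]]
is_leaf = lambda p: not isinstance(p, list)
print(minimax(tree, 2, True,
              f=lambda p: p if is_leaf(p) else 0,
              moves=lambda p: [] if is_leaf(p) else range(len(p)),
              apply_move=lambda p, m: p[m]))
# -3: each branch is reduced to its minimum (-3, -7, -6), and white
# takes the maximum of those, choosing the first branch
```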
Thoughts on minimax
In his 1949 paper, Shannon called the minimax algorithm a strategy of type A and described how a computer could be constructed to implement it. It’s not hard to notice that a machine operating on a type A strategy would be painfully slow, and it wouldn’t even be a novice at the game. The machine would operate in an extremely inefficient fashion—it would compute all variations out to exactly four moves deep and then stop.
We also see that, for a given evaluation function, the greater the depth of analysis, the better the chess that would be played. Taking the limit, of course, such a process would play perfect chess by following every continuation to a terminal position. Thus, we can define a metric to compare different programmes: the single dimension of their depth of analysis.
A good human chess player examines only a few selected, interesting-looking variations and carries them out to a reasonable stopping point, until a quiet position is reached. A world champion can construct variations of 15-20 moves. In amateur play, however, variations are seldom calculated to such lengths; mostly, variations of a forcing nature are examined six or eight moves deep. More generally, when there are few threats and forceful moves, most calculations are not deeper than one or two moves, with perhaps some forcing variations explored to three, four, or five moves.
Some heuristics
Even then, deciding when to search a variation and when not to is an interesting problem. The amount of selection exercised by chess masters in examining possible variations has been studied quite deeply. Specifically, De Groot worked on this problem experimentally in his thesis, showing various positions to chess masters and asking them to decide on the best move while describing their analysis out loud.11

In this instance, the chess master examined sixteen variations, ranging in depth from one ply to nine ply. From this, we see that for a machine to play some good chess, it must:
- Examine forceful variations out as far as possible and evaluate only at reasonable positions, where some quasi-stability has been established.
- Select the variations to be explored by some process so that the machine does not waste its time in totally pointless variations.
Shannon described a machine implementing these two improvements as following a type B strategy. To incorporate the first feature, we define a function $g(P)$ of a position which determines whether approximate stability exists—no piece en prise, for example. Using this function, variations could be explored until $g(P)$ indicates stability, but always going at least three moves deep and never more than, say, 10.
The second improvement would require a function $h(P, M)$ to decide whether a move $M$ in position $P$ is worth exploring. It is vital that this initial screening does not fully eliminate moves which merely look bad at first sight; for example, a move which leaves a piece undefended is frequently quite strong when the whole of the game is considered.
We can create the $h(P, M)$ function in a heuristic way by examining how masters of the game think about the “worth” of different lines. A check is the most forceful type of move. The opponent’s replies are highly restricted—she can never answer with a counter-attack. This means that the counter moves to a check can be more readily calculated than for any other move. Similarly, captures, attacks on major pieces, mate threats, moves that greatly reduce the opponent’s movement, etc. limit the opponent’s replies and should be calculated to some depth. Hence, $h(P, M)$ should give high values to forceful moves, medium values to developing moves, and low values to the rest. This allows us to give concrete values to $h$ and use it when deciding which lines to calculate to some depth. In a sense, we would be ordering the moves of a position by how interesting each move is.
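A sketch of how these two features might bolt onto the minimax above; $g$, $h$, and the game helpers are all assumed inputs, and the depth window (at least three moves, at most ten) follows the text:

```python
def type_b_search(pos, ply, white_to_move, f, g, h, moves, apply_move,
                  min_ply=6, max_ply=20):
    """Shannon's type B idea, sketched: search forceful lines until quiet."""
    legal = list(moves(pos))
    # evaluate once past the minimum depth *and* g reports quasi-stability,
    # or once the hard cap is reached
    if not legal or ply >= max_ply or (ply >= min_ply and g(pos)):
        return f(pos)
    # h screens the moves: keep the forceful ones, but never prune them all
    worthy = [m for m in legal if h(pos, m) > 0] or legal
    values = (type_b_search(apply_move(pos, m), ply + 1, not white_to_move,
                            f, g, h, moves, apply_move, min_ply, max_ply)
              for m in worthy)
    return max(values) if white_to_move else min(values)
```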
Another idea that could be incorporated into this screening function is to consider specific aspects of the position. For example, if a rook on the back rank is tied down defending against checkmate, it cannot also defend the pawn in front of it. Similarly, if a pawn or another piece is protecting a square that could be used for a knight fork against higher-value pieces, that aspect of the position is important to respect. Chess players often think this way, identifying and remembering these aspects of the position: if my rook on the back rank is still unable to move, I can disregard its moves as viable and avoid searching those variations; if my opponent cannot move a pinned piece, I should put pressure on that piece. Interestingly, even chess masters can sometimes overlook these positional details, especially as they change throughout the game. A move that was previously unviable may become possible a few moves later, and it’s quite easy to miss these changes and remain blind to that part of the position.
Turing, the one and only
Implementing similar ideas, Alan Turing gave two important definitions in his paper “Digital Computers Applied to Games.”12
- “Considerable” moves: Every possibility for white’s next move and black’s reply is considerable. If a capture is considerable, then any recapture is considerable. The capture of an undefended piece, or of a major piece by a minor piece, is considerable. A move giving check is considerable. This is similar to the $h(P, M)$ function given by Shannon, but instead of a range of values, a set of—recursively defined—moves is considered.
- Dead position: A position where there are no considerable moves. In other terms, a position where no gain can be made by captures, so the only advantage would come from positional play. This is similar in nature to the quasi-stability and quiescence defined by Shannon, though considerably more strict.
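Written out, the recursion might look like this sketch; the chess predicates bundled in `rules` are hypothetical helpers, not code from Turing’s paper:

```python
def considerable(pos, move, ply, rules):
    """Approximately Turing's recursive rule: is this move worth a look?"""
    if ply <= 2:
        return True                            # every first move and reply
    if rules.gives_check(pos, move):
        return True                            # checks are always considerable
    if rules.is_capture(pos, move):
        return (rules.is_recapture(pos, move)            # recaptures
                or rules.captures_undefended(pos, move)  # free material
                or rules.minor_takes_major(pos, move))   # e.g. knight takes queen
    return False

def is_dead(pos, ply, rules):
    """A dead position has no considerable moves: evaluate it statically."""
    return not any(considerable(pos, m, ply, rules)
                   for m in rules.legal_moves(pos))
```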
The first chess “computer”
Before, the term computer meant “a person who does calculations, computation.” This is quite untrue today, but the sentiment back then was that if we could write down an algorithm, a human could carry out the calculations, until some special machine could be created. This is what Turing went on to accomplish in the paper we mentioned.12 He devised a very simplified algorithm, one that can be executed by a human. In the end, the “computer” lost in 29 moves against a weak player who didn’t know how the system worked.
I believe that this is the first computer to have played chess, even if it was a human following instructions. Newell et al., though, mention the errors in Turing’s calculations and in the play of his programme, since he, the human simulator, was unwilling to consider all of the alternatives.3 Even so, the main objection to hand simulation is the amount of effort required to do it. A machine is really the enabling condition for exploring the intricacies of a complex programme.
Overall, the system made considerable blunders and had critical weaknesses. Turing described it as “a caricature of his own play”; his algorithm was a reflection of his thought process. This general view, that one cannot programme a machine to play a better game than one plays oneself, persisted into the 80s. As Turing predicted, the statement was falsified quite convincingly, as we will see.
It might also be possible to programme the machine to search for new types of combination in chess. If this product produced results which were quite new, and also interesting to the programmer, who should have the credit? — Alan Turing12
Variation in play
All in all, this type B strategy uses heuristics to calculate not all but some variations, to some depth, perhaps close to what a good chess player can calculate. It doesn’t brute-force each and every position, unlike the type A strategy. Notice that machines implementing these strategies would be deterministic—they would make the same move for the same position, so if the opponent made some moves and won the game, she could make the same moves again and win continuously. Thus, we need to introduce some variation into the play of the machine.
This variation can come from a multitude of things:
- Whenever there is more than one move that ranks similarly according to the machine’s calculations, the machine could choose one of them at random.
- Storing multiple openings to be used randomly at the start of the game—similar to how a human would memorise some opening ideas, of course the machine would have much greater memorisation abilities.
- Changing the evaluation function of the machine, either from the start or when the game reaches certain break points.
The third point is particularly interesting because it allows us, to some extent, to change the “style” of the machine’s play, as sketched below. For example, the machine could become a positional player if we assign higher values to positional weaknesses; or, by prioritising the search of forcing variations, it would adopt a more aggressive playing style.
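For the first and third points, a minimal sketch might look like the following; `scored_moves` and the weight dictionaries are illustrative stand-ins, not from any particular engine:

```python
import random

def pick_move(scored_moves, tolerance=0.1):
    """Choose randomly among moves whose evaluations are nearly equal.

    `scored_moves` is assumed to be a list of (move, value) pairs,
    best first, as produced by some minimax search.
    """
    best_value = scored_moves[0][1]
    near_best = [m for m, v in scored_moves if abs(v - best_value) <= tolerance]
    return random.choice(near_best)

# "Style" as data: reweighting the evaluation terms changes how the
# machine plays (the numbers here are purely illustrative).
POSITIONAL = {"material": 1.0, "mobility": 0.1, "pawn_weakness": 0.9}
AGGRESSIVE = {"material": 1.0, "mobility": 0.4, "pawn_weakness": 0.2}
```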
It’s also interesting to consider these machines as having a “style” of play, which is not how we typically think of them today. In the past, the goal was to create a machine that would be comparable to human play, not one that would consistently outperform humans by a wide margin. Now, however, Stockfish simply finds the best possible move, period. While it does provide additional lines that are comparable to the best move, Stockfish itself always selects the optimal moves—without any real variation.
The first chess computer, actually this time
Finally, we can now start looking at the actual computers and machines that have been designed over the years.
With the theoretical baseline given by Shannon and Turing, a group at Los Alamos Scientific Laboratory programmed MANIAC I to play chess in 1956.13 This programme is an almost perfect example of the type A strategy. Specifically, the programme explores all alternatives, with all continuations explored to a depth of two moves; the static evaluation function consists of a sum of material and mobility measures, integrated into the minimax process.
For the programme to be viable, a major concession was required. Instead of the normal board of eight squares by eight squares, the researchers used a reduced board of six squares by six squares, eliminating the bishops and all of the special chess moves (castling, the pawn’s initial two-square move, and hence en passant). This was done to reduce the number of positions that the machine needed to search. On a $6\times6$ board, there seem to be about 20 possible moves per position, meaning that the machine needed to search through roughly $20^4 \approx 160{,}000$ positions instead of roughly $31^4 \approx 920{,}000$ to go four plies deep.
The computer played three games: one against itself, one against Martin D. Kruskal, and one against a weak player who had just learned the game. With queen odds, the machine played really well against Kruskal, who made no gain in 15 moves and “even started calling his opponent he instead of it.” As the game went on, it appeared that Kruskal might even lose, although a draw seemed the most probable result.
However, at move 19, the machine chose a weak continuation which enabled Kruskal to lay a three-move mating trap. The machine’s only way to delay checkmate was to sacrifice its queen, which it did. After the forced exchange of its queen for a pawn, an endgame resulted in which the machine had no chance; the finish came at move 38 with Kruskal’s mate. Funnily enough, after Kruskal’s checkmate, the machine responded with an illegal move. “We were dumbfounded for a while, until we traced the trouble and realized that the program had never been taught to resign.”14 Facing no legal moves, the machine got stuck in a loop, and the loop changed the program.
The third and final game matched the machine against a member of the laboratory staff who had learned the game a week before; she was coached during the week and had played several practice games. The result? Again the programme is a weak player, but now one that is capable of beating a weak human player. Some of the authors of the paper felt that “the play of the machine with the above rules for the generation of moves could be compared to that of one who has average aptitude for the game and experience amounting to 20 or so full games played.”13
Even though the machine had won that game, the researchers believed that a machine would not be made to beat a strong player in the near future. And even though it would take some 30 years for a machine to do that, I believe this first win against someone with some chess knowledge shows that it is possible—that we had at least entered the arena of human play; we could beat a beginner.
Also, an interesting remark made in the paper is that playing chess on a machine serves to illuminate the mechanism by which the human brain operates: gathering a great amount of information at a glance, and then evaluating it by criteria that are as yet incomprehensible. It is clear, for instance, that the human brain operates for varied purposes in many parallel channels, in contrast to the machine, which, although very fast, does one simple thing at a time before going on to the next. A chess player’s feeling for a position, and proper evaluation of it, is a prime example of this remarkable faculty, and perhaps new insights will be gained into the organisation of thought. A fascinating read on what a chess master “sees,” and on the organisation of the mind, is Chase and Simon.15
Bernstein’s programme
- Under IBM
- Alex Bernstein, a chess player and programmer
- For the IBM 704 computer
- In 1958
- Type B strategy
This programme, too, is in the Shannon tradition, but it takes an extremely important step in the direction of greater sophistication: only a fraction of the legal alternatives and continuations are considered, making the programme one of the first to implement Shannon’s type B strategy. The programme uses plausible move generators to propose the moves to be considered—up to seven moves, in a priority order, with the most important first. After the static evaluation, the programme minimaxes, going four ply deep.
Evaluation
The static evaluation function is generated from four criteria:
- Mobility: number of available moves for each side
- Area control: number of squares completely controlled by each side
- King defense: number of controlled squares around each king
- Material: material count of each side
Interestingly, these criteria are parametrised for easy variation, so that different values may be given to, say, the centre squares. The first three criteria are added together, while the fourth is multiplied by a large factor. This prevents the machine from sacrificing material, and encourages it to exchange when it is ahead, as it considers the ratio of its own and its opponent’s scores.
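A sketch of how these four criteria might combine, following the description above; the particular weight is an assumption, not a figure from Bernstein’s programme:

```python
MATERIAL_WEIGHT = 100  # the "large factor": material dominates the rest

def side_score(mobility, area_control, king_defense, material):
    """One side's score: the first three criteria summed, material scaled up."""
    return mobility + area_control + king_defense + MATERIAL_WEIGHT * material

def bernstein_style_eval(own_score, opponent_score):
    # taking the ratio, rather than the difference, rewards trading when
    # ahead: 2 units of material against 1 scores better than 8 against 7
    return own_score / opponent_score
```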
Decision routines
These functions serve to generate plausible moves which can then be further analysed. The idea behind this design is to select a small number of moves which are potentially good, rather than to eliminate, from all possible moves, the much larger number which can be determined to be bad.
The generators determine whether certain states exist; if the answer is yes, certain moves are generated. The questions asked are:
- Is the king in check?
- A) Can material be gained? B) Can material be lost? C) Can material be exchanged?
- Is castling possible?
- Can minor pieces be developed?
- Can key squares be occupied?
- Can open files be occupied?
- Can any pawns be moved?
- Can any piece be moved?
The order of these routines is important:
- At the beginning of the game, questions 1, 2, and 3 are not applicable, and questions 4 and 7 are the only ones that produce moves.
- In the middle game, questions 2, 5, and 6 are utilised.
- In the end game, questions 5, 6, 7, and 8 are most employed.
Once there are 7 moves stored for analysis, the generators are stopped, and the programme continues on to the tree, where it plays out four ply and minimaxes as usual, as sketched below. With a plausible move set of seven and a depth of four ply, this examination as a whole takes an average of 8 minutes.
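In outline, the generator pipeline might look like this sketch, where each decision routine is an assumed callable answering one of the questions above with candidate moves:

```python
def plausible_moves(pos, generators, limit=7):
    """Run the decision routines in priority order; stop at `limit` moves."""
    proposed = []
    for generate in generators:          # most important question first
        for move in generate(pos):
            if move not in proposed:
                proposed.append(move)
            if len(proposed) == limit:
                return proposed          # the generators are stopped early
    return proposed                      # fewer than `limit` found is fine
```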
Some thoughts about the programme
Overall, the programme’s play is uneven, with striking blind spots all over. The programme never beat a human player, but a really interesting game was played against a good chess player.1617
Fiddling while Rome burns…

Radical selectivity
Bernstein’s programme gives us our first information about selectivity. At 7 moves per position and four ply, it examines only $7^4 = 2{,}401$ final positions, out of roughly $30^4 \approx 810{,}000$ legal continuations. That it still plays tolerably with a reduction in search by a factor of about 350 implies that the selection mechanism is fairly effective. Of course, the selection follows the common and tested lore of the chess world.
On the other hand, such radical selection should result in the programme overlooking certain moves and consequences, as the machine has none of the checks and balances of human selectivity. Anyone good enough to construct a three-move trap can beat it. Knowing the decision routines, you can often think of moves that the machine won’t. The machine will unknowingly accept a sacrifice, without considering the consequences. Though, any particular difficulty is removable: by adding a new move generator responsible for another feature of the board. Somewhat fascinatingly, Newell et al. mention that “this kind of error correction is precisely how the body of practical knowledge about chess programmes and chess play will accumulate, gradually teaching us the right kinds of selectivity.”3 We will see how that turns out in our series.
Every increase in sophistication of performance is paid for by an increase in the complexity of the programme
The move generators and the components of the static evaluation require varied and diverse information about each position, demanding more computing time per position than in the Los Alamos programme. In fact, let us now compare the two:
- Both programmes take about the same time to produce a move—8 minutes for Bernstein’s and 12 minutes for Los Alamos.
- The increase in problem size of the $8\times8$ board over the $6\times6$ board is cancelled by the increase in speed of the IBM 704 over the MANIAC I.
- We can say that they would both produce a move in the same game in the same time.
Hence, the increase in the amount of processing per position in Bernstein’s programme approximately cancels the gain of 350 to 1 in selectivity that this more complex processing achieves. It is not wise to compare two programmes’ performance except in a very crude way; for our purpose, let’s just assume that, considering the games played by both programmes, they are roughly comparable in performance.
With a primitive approximation, then, we have two programmes that achieve the same quality of performance with the same total effort by two different routes: the Los Alamos programme by using no selectivity and being very fast, and the Bernstein programme by using a large amount of selectivity and taking much more effort per position examined in order to make that selection—two programmes following the different strategies of Shannon: type A and type B, respectively.
The point we wish to make is that this equality is an accident: selection is a very powerful device, and speed a very weak one, for improving the performance of chess programmes.
Let’s say that these programmes were to explore to a depth of six ply instead of four. Then, the Los Alamos programme would take about $30^2 = 900$ times as long as now to make a move, whereas Bernstein’s programme would take only about $7^2 = 49$ times as long—gaining an edge with a factor of about 20 in the total computing effort required per move.
In fact, this demonstrates a significant feature of chess: the exponential growth of the number of positions to be considered with the depth of analysis. As analysis deepens, greater computing effort per position soon pays for itself, since it slows the growth of the number of positions to be considered. And as mentioned previously, the deeper the analysis, the better the programme plays.
One more calculation might be useful in emphasising the value of heuristics in eliminating branches to be explored. Suppose we have a branching tree in which our programme is exploring $m$ moves deep, and let this tree have four branches at each node. If we could double the speed of the programme—that is, consider twice as many positions for the same total effort—then this improvement would let us look only half a move deeper, since $4^{m+1/2} = 2 \cdot 4^m$. If, on the other hand, we could double the selectivity—that is, only consider two of the four branches at each node—then, for the same effort, we could look twice as deep, since $2^{2m} = 4^m$. It is clear that we could afford to pay an apparently high computing cost per position to achieve this selectivity.
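The arithmetic is quick to verify; a small sketch with the four-branch tree assumed above:

```python
import math

branches, depth = 4, 3                 # four branches per node, three moves deep
effort = branches ** depth             # 64 positions examined

# doubling speed: twice the positions for the same effort buys only
# half a move of extra depth, since 4**(depth + 0.5) == 2 * 4**depth
print(math.log(2 * effort, branches))  # 3.5

# doubling selectivity: two branches per node reach twice the depth
# for the same effort, since 2**(2 * depth) == 4**depth
print(2 ** (2 * depth) == effort)      # True
```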
To summarise, Bernstein’s programme introduces both sophistication and complication to the chess computer, although in some aspects it’s still lacking. Yet, with all of the machine’s downfalls, the IBM 704 plays a respectable game, one where an unknowing bystander might not be able to distinguish the machine’s play from a human’s.
Conclusion?
In this post, we’ve looked at the basic theory behind the first chess computers, the minimax algorithm, and at how some machines worked under that algorithm with some heuristics. In our next post, we will look at another algorithm, the alpha-beta algorithm, first found by Newell and Simon,3 along with some more optimisations. After that, we will culminate with Levy’s bet!
Footnotes
1. The Week in Chess — Feng-hsiung Hsu’s letter on Garry’s accusations.
2. The paper “A Chess-Playing Machine” by Claude E. Shannon.
3. “Chess-Playing Programs and the Problem of Complexity” by Newell, Shaw, and Simon. Article
4. These definitions were made in von Neumann and Morgenstern’s book “Theory of Games and Economic Behaviour,” where they give general and formal definitions of a “game.”
5. On computing zero-sum games, from Humboldt-Universität zu Berlin.
6. Infinite Games, University of Sofia — if you want to read more about the formalisation of infinite games with perfect information, which was first introduced with chess.
7. In his paper “Programming a Computer for Playing Chess,” Shannon gives how a programme can be constructed to find the best move in a chess position.
8. A video on the longest possible chess game, which was found to be 5,898 moves.
9. Chess StackExchange — the user “Kentdjb” analysed millions of games and found 31.1 moves to be the average.
10. Chess Position Ranking — this work found, among other things, the number of legal chess positions to be of the order $10^{44}$.
11. Adriaan de Groot, 1946, “Het denken van den schaker.”
12. “Digital Computers Applied to Games” by Alan Turing, 1953. Turing Archive
15. “Perception in Chess” by William G. Chase and Herbert A. Simon. Paper
16. The chess game played by Bernstein’s IBM 704 computer against a good player: Chess.com
17. “Computer v. Chess-Player” by A. Bernstein and M. de V. Roberts.