Conquering the Top of the Swedish Rating: The Mecca of Chess Programs
by Claudio Bollini
For a long time the Swedish SSDF rating has been regarded as a reliable way to
estimate the Elo of a chess program. We accepted its results uncritically and
naïvely as we watched the programs reach FIDE Master level first, then IM level,
and finally enter the Grandmasters' league. Numerical extrapolations were
carried out and ascending curves were traced, trying to predict when those
silicon players would break the barrier of 2700-2800 Elo points.
But for some years now, a series of dark circumstances has cast shadows over
the real soundness of the Swedish rating. First came the so-called matter of
the "hidden" or "cooked" books: it was proven that some programmers included
carefully prepared opening lines in their creations to exploit gaps in the
book of a given rival. Thus, again and again they crushed programs that were
not prepared to react, and climbed the list artificially. Why "artificially"?
Because the gain was meaningless as a trustworthy reference for chess players.
Suppose that program "A" is stronger against humans than program "B" (in other
words, it would obtain a better Elo in over-the-board tournaments), but the
latter has some specially prepared hidden lines against program "A". Note that
these are not necessarily better variations that would improve the quality of
the opening book, but merely refutations of very particular weaknesses in A's
book. In this way, "B" can improve its Elo at A's expense. This is not the
usual practice of professional chess, i.e. the advance preparation of opening
lines against a flesh-and-blood competitor, because a Master would never
repeatedly commit the same mistake in a variation in which he had failed badly
before.
A second incident was set off by an accusation against the popular Fritz
program. The ChessBase people had allegedly provided their own interface for
Fritz 5's automated games, while the rest of the commercial programs remained
connected through the standard Auto232 auto-player. If the party responsible
for a program is the same one who supplies its interface, some mistrust about
secret devices is unavoidable, which recalls the hidden-books issue. The whole
affair was about to blow up, but the Swedish association committed itself to
standardizing all the interfaces and to accepting programs only from the
distribution companies.
Surprisingly, the current crisis has nothing to do with disloyal behaviour,
but with a decision that is hard to come to terms with. According to Ed's
announcement on this web page, several programmers are undertaking the task of
stripping substantial amounts of code from the evaluation function, to gain
speed and conquer their computer competitors by the brute force of tactical
calculation.
Mark Uniacke and Ed Schröder himself have included, respectively, an
"anti-human" option (Hiarcs 7) and an "anti-GM" option (Rebel 10). They advise
disabling these algorithms when facing other programs, turning them into a
faster mode at the expense of strategic knowledge. Unlike Mark, Ed has set his
"against humans" algorithm as the default (with better sense, we believe). We
find the reason obvious: the ultimate addressee of a chess program is the user
himself!
The seriousness of the situation is evident, and it again calls into question
the credibility of the SSDF rating. Reaching pole position has become a goal
in itself, and everything must be sacrificed to this supreme objective, even
the quality of the program. The motives are suspected of being commercial:
topping the Swedish list is still a good way to noticeably increase sales.
There is nothing necessarily wrong with marketing strategies. But what kind of
product would they be offering to end users? What concrete benefits would a
program specially optimized to compete against other programs bring to a
serious player? We agree that, tuned to defeat its computerized colleagues, it
would probably be an almost invincible rival at blitz chess. But if this
tendency continues, privileging speed over knowledge, its inability to compose
medium- and long-range plans will become apparent, no matter which time
control is chosen.
In both the RGCC and CCC discussion groups, numerous examples have been shown
that illustrate the stubborn inability of the best chess programs to handle
certain positions. Not even a deep and speedy search can nowadays find a
solution to a variety of cases whose answer is obvious to a beginner. A sample
taken from among thousands:
White: King d4, Bishop c4, Bishop e4; Black: King d6 (both white bishops on
light squares). This is an extraordinarily unusual, though not impossible,
position. At a glance it is an evident draw, yet no program is able to grasp
this plain truth unless the programmer has inserted some very specific lines
of code to handle this particular case. When I say that no program can "grasp"
this, I mean the capacity to induce, from general principles, a sound
judgement about a singular position like this one. Incidentally, a search some
100 plies deep would be necessary for sheer brute force to realize that the
50-move rule is unavoidable and to concede a zero evaluation score.
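To make the point concrete, here is a minimal sketch (our illustration, not
code from any of the programs discussed) of the kind of hand-written special
case a programmer would have to insert: recognizing "king plus bishops all on
one color versus a bare king" as a dead draw, since such bishops can never
drive the defending king off the opposite-colored squares. The function names
and the piece-dictionary format are our own assumptions for the example.

```python
def is_light_square(square):
    """Square in algebraic notation, e.g. 'c4'. Light if file+rank parity is odd
    (a1 is dark)."""
    file = ord(square[0]) - ord('a')   # 0..7
    rank = int(square[1]) - 1          # 0..7
    return (file + rank) % 2 == 1

def same_colored_bishops_draw(strong_side, weak_side):
    """strong_side / weak_side: dicts mapping piece letters to square lists,
    e.g. {'K': ['d4'], 'B': ['c4', 'e4']}. Returns True when the stronger side
    has only its king plus bishops confined to one color and the defender has
    a bare king: no mate is possible, so the position is a draw."""
    if set(weak_side) != {'K'} or len(weak_side['K']) != 1:
        return False                    # defender must have a bare king
    if set(strong_side) - {'K', 'B'}:
        return False                    # any other piece could still win
    bishops = strong_side.get('B', [])
    if not bishops:
        return False
    colors = {is_light_square(sq) for sq in bishops}
    return len(colors) == 1             # all bishops on one color: dead draw

# The article's example: Kd4, Bc4, Be4 vs Kd6, both bishops on light squares.
print(same_colored_bishops_draw({'K': ['d4'], 'B': ['c4', 'e4']},
                                {'K': ['d6']}))  # → True
```

The rule is trivial to state and to code, which is precisely the author's
complaint: without such an explicit pattern, no amount of search speed lets
the program infer it from general principles.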
Programs with an ever-shrinking baggage of chess knowledge would inexorably be
defeated by a Grandmaster at tournament time controls after a few exploratory
games, even while reigning unbeaten at the top of the SSDF.
If this alarming policy continues, what would our diagnosis be for the future
value of computer-computer matches? As we said, we doubt that the SSDF can
survive as a trustworthy reference, in human terms, for measuring the further
progress of programs. In our opinion, the only way (simple, direct,
irrefutable) of obtaining a true picture of a program's real strength is to
enter it as a contestant in human tournaments or matches. To do this, the
companies would have to risk a little of their immediate gains, betting on a
change of mentality in the computer chess world. We believe this change
consists in reconsidering what is the essential credential to be demanded of a
program's pedigree: a brilliant record of defeating its commercial
competitors, or a good campaign, perhaps earning GM norms, in tournaments of
international category? In other words, which is more valuable: an impressive
SSDF Elo or a respectable real Elo?
We hope both programmers and chess players will gradually adopt this
perspective. Otherwise, how much more knowledge must be sacrificed on the
altar of marketing, retracing the paths opened by almost half a century of
laborious research? What will be left of that formidable challenge taken up by
entire generations of chess programmers, who worked hard to translate the
highly complex principles of chess into computer language? Have they now given
up (with some honorable exceptions), renouncing the fascinating and formidable
task of simulating, ever more accurately, the play of a Grandmaster? In our
Rebel 9 review we used the analogy of the Turing Test: we would wish for the
moment when an elite chess player, facing a computer under tournament
conditions, would be unable to tell whether he is playing against another
Master or not. Even more, we hope that in the near future FIDE will allow such
programs to compete under fair conditions, to obtain an official Elo, and (why
not?) to aspire to the World Chess Champion title.