Evidence of Non-Random Clusters in the Career Performance of George Brett and Tony Gwynn
by Bill James, Guest Contributor
How “hot” is a hitter? How do you measure how “hot” a hitter is, at the moment?
While examining that question, I may have found some fragmentary evidence that hitters -- well, two hitters -- actually do have non-random clusters in performance. Let me explain the process by which I arrived at that belief.
When a hitter is on a hot streak, how hot is he? Call it the Johnny Carson question: How hot was he? I’m not debating, for the moment, the question of whether hot streaks are real or illusory. I am looking for a way to measure the effect, regardless of whether it is a real effect or a transitory perception.
How hot a hitter is -- or how cold -- can be measured in this way. Suppose that
you start every hitter out every season with 40 at bats and 10 hits, a .250
average, on the theory that, at .250, no hitter may be said to be either hot or
cold.
After each game, the player’s “momentary at bats” and “momentary hits” are
re-calculated as:
The previous day’s figure
Times .900
Plus the data from the most recent game.
To illustrate the system, let us suppose that the hitter starts out the season “cold,” in a slump. On opening day he goes 0-for-4. His “batting temperature” drops to .225:
10 hits
Times .900
Equals 9.00
Plus zero
Equals 9.00 hits.
40 at bats
Times .900
Equals 36.00
Plus 4
Equals 40.00 at bats.
Nine divided by 40 equals .225, so his batting temperature is .225. By this process, 10% of his temperature is based on what he did in the last game, 52% is based on what he has done in the last seven games, and 90% is based on what he has done in the last 22 games. When a player gets cold, his batting temperature drops rapidly; when he gets hot, it shoots up rapidly.
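The 10%, 52% and 90% figures follow from the .900 multiplier: a game played n games ago carries a weight of .900 to the nth power, so the last n games together account for 1 minus .900 to the nth power of the current figure. Here is a minimal Python sketch of that arithmetic. The function name `recent_share` is made up for illustration, and the sketch assumes the hitter gets about the same number of at bats every game.

```python
def recent_share(n_games, decay=0.9):
    """Share of the current temperature owed to the last n_games games.

    Assumes equal at bats per game and a log long enough that the
    opening 40-at-bat seed has already decayed away.
    """
    return 1.0 - decay ** n_games


for n in (1, 7, 22):
    # Prints the share as a whole-number percentage: 10, 52 and 90.
    print(n, round(recent_share(n) * 100))
```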
Let us suppose that this player, after going 0-for-4 on opening day, then goes
1-for-5 the second day. His batting temperature would drop to .222 (9.1 divided
by 41.0). Let us suppose that he then goes 0-for-3, 1-for-3,
0-for-5, 0-for-4, and 0-for-3 -- a realistic type of slump. His batting
temperature, after those seven games, would drop to .156.
Suppose, however, that he then gets hot, going 2-for-4, 1-for-3, 3-for-4,
2-for-4, 2-for-2, and 3-for-4 over the next six games. His batting temperature
would then be .370 (13.7 divided by 37.0). It is a deliberately unstable
measure of hitting ability, designed to swing relatively freely based on the
player’s most recent performance.
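The worked example above can be replayed in a few lines of Python. This is only a sketch of the update rule as described, not the code used for the study, and the function and variable names are made up for illustration.

```python
def batting_temperature(games, start_hits=10.0, start_ab=40.0, decay=0.9):
    """Return the batting temperature after each game.

    `games` is a sequence of (hits, at_bats) pairs in playing order.
    Each day the running totals are multiplied by `decay` and the
    new game's hits and at bats are added.
    """
    hits, at_bats = start_hits, start_ab
    temps = []
    for h, ab in games:
        hits = hits * decay + h
        at_bats = at_bats * decay + ab
        temps.append(hits / at_bats)
    return temps


# The seven-game slump and six-game hot streak from the text.
slump = [(0, 4), (1, 5), (0, 3), (1, 3), (0, 5), (0, 4), (0, 3)]
hot = [(2, 4), (1, 3), (3, 4), (2, 4), (2, 2), (3, 4)]
temps = batting_temperature(slump + hot)

# After game 1, game 2, the end of the slump, and the end of the streak.
print([f"{t:.3f}" for t in (temps[0], temps[1], temps[6], temps[12])])
# -> ['0.225', '0.222', '0.156', '0.370']
```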
I was looking for some way to measure how “hot” a hitter was, because... well, this is what I do for a living. To tinker with the system, I was using the career day-by-day performance log of George Brett, which I keep handy to console myself whenever the Royals fall into a deep slump [Brett's Hall of Fame webpage is available here, his career stats here, and a retrospective by the Kansas City Star, here]. The hottest George Brett ever was in his entire career, by the way, was on August 18, 1980. At that moment Brett:
a) had hit in 30 consecutive games,
b) had averaged almost two hits a game over that entire 30-game stretch (57 hits),
c) had had 9 multiple-hit games in 13 days, and
d) had finished that off by going 3-for-4, 4-for-4 (with 5 RBI) and 3-for-5, a .769 average over the last three days.
His batting temperature at that moment was .525.
The coldest George Brett ever was came on September 8, 1982. After carrying
a .306 batting temperature on August 22 of that season, Brett fell into a
horrific two-and-a-half week slump during which he went 5-for-61, ending with
four consecutive hitless games. His batting temperature fell to .125.
I figured George Brett’s batting temperature, and also his slugging temperature,
figured in the same way except with total bases replacing hits, after every game
of Brett’s career. (We start the player out each season with 14 total bases, a
baseline slugging percentage of .350.) It then occurred to me to pose a simple
question: Does Brett in fact hit better when he starts the game “hot” than when
he begins the game “cold”?
I had expected, based on previous studies, that Brett would hit no better when
he was “hot” than when he was “cold.” In fact, though, he did. The chart below
shows, on the top (red) line, what Brett hit in his career in the 100 games when
he entered the game most “hot,” and the bottom (blue) line reflects what he hit
when he entered the game most “cold.”
| G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | GDP | Avg | SPct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 385 | 70 | 143 | 28 | 4 | 18 | 81 | 58 | 20 | 11 | 6 | 6 | .371 | .605 |
| 100 | 382 | 59 | 115 | 29 | 1 | 9 | 47 | 28 | 42 | 9 | 4 | 11 | .301 | .453 |
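The Avg and SPct columns can be checked from the counting stats in the two lines above. The short Python sketch below uses the standard definitions (total bases = hits + doubles + 2 x triples + 3 x home runs); the function name is made up for illustration.

```python
def avg_and_slg(ab, h, doubles, triples, hr):
    """Batting average and slugging percentage from counting stats."""
    total_bases = h + doubles + 2 * triples + 3 * hr
    return h / ab, total_bases / ab


hot = avg_and_slg(ab=385, h=143, doubles=28, triples=4, hr=18)
cold = avg_and_slg(ab=382, h=115, doubles=29, triples=1, hr=9)
print(f"hot  {hot[0]:.3f} {hot[1]:.3f}")   # hot  0.371 0.605
print(f"cold {cold[0]:.3f} {cold[1]:.3f}")  # cold 0.301 0.453
```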
Let’s fill in a little bit more of the data... Group “2” here [below] represents the second-highest group, the games when Brett was still very hot, but just not quite as hot as group “1.” Groups 26 and 27 represent the 200 games in which Brett entered the game with his lowest “batting temperature.”
| Group | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | GDP | Avg | SPct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 385 | 70 | 143 | 28 | 4 | 18 | 81 | 58 | 20 | 11 | 6 | 6 | .371 | .605 |
| 2 | 100 | 384 | 71 | 124 | 25 | 7 | 20 | 79 | 53 | 29 | 10 | 6 | 13 | .323 | .581 |
| 26 | 100 | 370 | 56 | 113 | 21 | 5 | 10 | 58 | 41 | 39 | 7 | 6 | 12 | .305 | .470 |
| 27 | 100 | 382 | 59 | 115 | 29 | 1 | 9 | 47 | 28 | 42 | 9 | 4 | 11 | .301 | .453 |
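For readers who want to try this kind of grouping on a game log of their own, here is a rough Python sketch. It is not the code used for the study: the game log is a made-up .300 hitter, the function names are hypothetical, and for simplicity the whole log is treated as a single season (the real study restarts each season at 10 hits in 40 at bats).

```python
import random


def entering_temperatures(games, decay=0.9):
    """Temperature the hitter carried INTO each game (before it was played)."""
    hits, at_bats = 10.0, 40.0
    entering = []
    for h, ab in games:
        entering.append(hits / at_bats)
        hits, at_bats = hits * decay + h, at_bats * decay + ab
    return entering


def group_by_heat(games, group_size):
    """Rank games from hottest entry to coldest and cut into equal groups."""
    order = sorted(zip(entering_temperatures(games), games), reverse=True)
    ranked = [game for _, game in order]
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


# Hypothetical log: a .300 hitter with 4 at bats in each of 400 games.
rng = random.Random(0)
log = [(sum(rng.random() < 0.300 for _ in range(4)), 4) for _ in range(400)]

for number, group in enumerate(group_by_heat(log, 100), start=1):
    hits = sum(h for h, _ in group)
    at_bats = sum(ab for _, ab in group)
    print(number, f"{hits / at_bats:.3f}")
```

Because the made-up hitter has no real streaks built in, any gap between his hot and cold groups is chance, which makes the sketch a handy baseline for comparison.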
My first thought, when that data popped out, was, “Wow. I may have discovered some actual evidence of a hot streak there.” Brett very clearly did hit better when he entered the game hot than when he entered the game cold, but let’s not get ahead of ourselves. What other explanations for this data can we suggest?
George Brett, of course, was not exactly the same hitter throughout his
career. He hit .390 in 1980; he hit .255 in 1991. Perhaps the data could be
explained by the “good groups” having a disproportionate share of games from his
good seasons?
Well, yes and no. I tested that theory using a 162-game group of the games in
which Brett entered the game with his highest batting temperature. This is
Brett’s batting performance during those games:
| G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | GDP | Avg | SPct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 162 | 624 | 118 | 231 | 44 | 9 | 29 | 131 | 87 | 34 | 20 | 11 | 15 | .370 | .609 |
(For the sake of clarity, I made a very small change in the batting temperature formula after compiling this group of 162 games, before re-compiling the data listed before. The top 162 games in this listing and the top 162 in the previous listing would be extremely similar, but not identical.)
Anyway, Brett had 231 hits in these 162 games. This 162-game group contains 59
games from 1980, when Brett hit .390 and stayed hot much of the summer, and no
games at all from 1991, when Brett never got hot.
I figured Brett’s “expected” hits for each game based on his batting average for
that season and his at bats in the game, thus removing from the mix the
season-to-season fluctuation in batting skill. Conclusion? Brett could have
been expected, in these games, to get 216.6 hits, hitting .347.
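The “expected hits” figure is just at bats in each game multiplied by that season's batting average, summed over the games in the group. A minimal Python sketch follows. The three-game list is made up for illustration; only the .390 and .255 season averages come from the text.

```python
def expected_hits(games, season_avg):
    """Sum of (at bats in the game) x (batting average for that season)."""
    return sum(ab * season_avg[season] for season, ab in games)


# Season averages quoted in the text; the games themselves are hypothetical.
season_avg = {1980: 0.390, 1991: 0.255}
games = [(1980, 4), (1980, 5), (1991, 3)]  # (season, at bats)

total = expected_hits(games, season_avg)
print(f"{total:.3f} expected hits in {sum(ab for _, ab in games)} at bats")
```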
He still exceeded that by a pretty fair margin. The random chance that a .347
hitter would hit .370, given 624 at bats, is 12%. It could be a random
grouping, although it doesn’t really look like it. [If students or anyone else would like to see this calculation for themselves, you can go to the binomial calculator website and enter the numbers n = 624, k = 231, and p = .347.]
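The same calculation can be done without a website. The Python sketch below sums the binomial probabilities of 231 or more hits in 624 at bats for a .347 hitter. It assumes every at bat is an independent trial with the same chance of a hit, which is the assumption behind the 12% figure; the function name is made up for illustration.

```python
from math import exp, lgamma, log


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    def log_pmf(i):
        log_comb = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_comb + i * log(p) + (n - i) * log(1 - p)

    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


chance = binomial_tail(n=624, k=231, p=0.347)
print(f"{chance:.3f}")
```

The result should land close to the 12% quoted above.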
There could be other non-random variables in his performance which explain some
of the difference -- for example, hitting better in hot weather than in cold.
(Brett was, throughout his career, a hot weather hitter.) But on the other
hand, it could also be that, by removing these biases, we are factoring out
something that we should not be removing; if temporary variations in performance
skill do exist, then we could be improperly removing them by making these
adjustments. Brett hit .390 in 1980, whereas he never hit higher than .335 in
any other season, a huge 55-point gap between his best and second-best seasons.
It could be that Brett hit .390 in 1980 not because he was actually so
much better as a hitter that year, but simply because, in that season, he got
hot and he stayed hot. If you start seriously trying to explain how a guy
could hit that much better in one season than in any other in terms of hand/eye
co-ordination, eyesight and swing mechanics, you may realize that you’re better
off trying to explain it in terms of a hot streak.
One could argue that Brett’s performance expectations varied over his career
because of injuries, but that doesn’t fit the data. If Brett had clusters of
games in which he performed poorly because of injuries, that would cause his
performance to be below expectation in his worst games, but not
significantly above expectation in games when he was most hot. In other words,
this would cause the graph to drop off sharply on the left end, where Brett
entered the game cold, whereas it would be flat on the right end, where he
entered the game hot. The actual data is exactly the opposite: it is nearly
flat on the “cold” end, but accelerates sharply on the hot end -- as if Brett
had a limited set of games in which he was better than expected, but no
set of games in which he was significantly worse than expected. If you’ll
refer to the chart above, you will note that Brett’s performance in groups 26
and 27, when he entered the game most cold, is not really far below his career
performance norms of a .305 batting average, .487 slugging. The deviation is
on the other end.
There is another little data-sorting problem here which is a sort of variation
of the fact that you always find something in the last place you look for it.
Suppose that you focused on long hot streaks, and you asked, “What does the
player hit in the game in which he enters with his longest hitting streak of the
season?” The answer, of course, is .000, since, if the player got a hit in the
game, his hitting streak would have become one game longer. There is a similar
data-sorting problem here, which you can figure out as well as I can.
I am willing to factor out Brett’s season-to-season variation in performance
before the analysis is done, but how can we determine objectively whether
Brett’s tendency to get hot or cold exceeds random expectation?
Suppose that we figure Brett’s “batting temperature” and “slugging temperature”
after every game of his career, as we did before. For each season, we ask
four questions:
1) What is the highest batting temperature Brett reached during the season?
2) What is the highest slugging temperature Brett reached during the season?
3) What is the lowest batting temperature Brett reached during the season?
4) What is the lowest slugging temperature Brett reached during the season?
We then randomly re-arrange the games of each season -- within the season, to avoid the problem of changes in batting ability over the course of his career -- and we ask the same four questions.
1) What is the highest batting temperature Brett reached during the season, with his games randomly re-arranged?
2) What is the highest slugging temperature Brett reached during the season, with his games randomly re-arranged?
Etc. Incidentally, the highest “slugging temperature” ever achieved by the real George Brett was .928, on August 26, 1985. Brett homered in four straight games and five out of six, batting 18 times in those six games, going 9-for-18 with two doubles and five homers.
If Brett has actual hot and cold streaks which are not just random
clusters, then his peak batting and slugging temperatures should be higher (and
lower) in real life than in the randomly sorted data. If, on the other hand,
his streaks are just random clusters, then he should get just as hot (and cold)
in the random sorts as he did in real life.
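One comparison of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the study: the function names and the 40-game season are made up, temperatures are compared at full precision rather than rounded to three decimals, and a tie is scored against the real season.

```python
import random


def extremes(games, seed_num, seed_ab=40.0, decay=0.9):
    """Highest and lowest temperature reached over one season.

    `games` holds (numerator, at_bats) pairs: hits for batting
    temperature (seed 10), total bases for slugging temperature (seed 14).
    """
    num, at_bats = seed_num, seed_ab
    temps = []
    for n, ab in games:
        num, at_bats = num * decay + n, at_bats * decay + ab
        temps.append(num / at_bats)
    return max(temps), min(temps)


def compare_to_random_sort(games, seed_num, rng):
    """Return (hotter, colder): did the real season beat one random re-sort?"""
    real_high, real_low = extremes(games, seed_num)
    shuffled = list(games)
    rng.shuffle(shuffled)  # re-arrange the games within the season only
    rand_high, rand_low = extremes(shuffled, seed_num)
    # Strict comparisons, so a tie counts against the real season.
    return real_high > rand_high, real_low < rand_low


# Hypothetical season: 20 hot games (2-for-4), then 20 cold ones (0-for-4).
season = [(2, 4)] * 20 + [(0, 4)] * 20
result = compare_to_random_sort(season, seed_num=10.0, rng=random.Random(1))
print(result)
```

Running the batting and slugging versions for every season gives the four points of comparison per season; repeating with fresh shuffles gives additional Random Sorts.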
Conclusion in re Brett
The variations in Brett’s batting temperature are as extreme in the random sorts as they were in real life. However, Brett’s slugging temperature -- his power -- did vary more in real life than in the random sort. Overall, Brett’s batting performance seems more likely than not to embody some sort of hot and cold variable.
For obvious reasons, I excluded Brett’s 1973 season, when he played only 13
games, and based my study on his twenty remaining seasons, 1974 through 1993, in
all of which he played more than 100 games except 1981, when a strike limited
him to 89. About each of those 20 seasons, we had four questions, which I
outlined before: Was his highest batting temperature higher in the real season,
or in the randomly sorted data? Was his lowest slugging temperature lower in
the real season, or in the randomly sorted data? With 20 seasons and four
points of comparison for each, we thus have 80 points of comparison between
Brett’s real performance log and his randomly sorted performance.
On the first test, which I will refer to as Random Sort 1, Brett’s real-life
temperatures were more extreme than the random sort on 47 of the 80 points.
On the second test, Random Sort 2, Brett’s real-life temperatures were more
extreme than the random sort on 41 of 80 points.
On the third test, Random Sort 3, Brett’s real-life temperatures were more
extreme than the random sort on 43 of 80 points.
I repeated the test ten times. The real-life Brett was more “streaky” than the randomly sorted Brett in all ten tests. In sum, we had 800 points of comparison between Brett’s actual data and the randomly sorted data. Brett’s data was more extreme than the randomly sorted data at 453 of the 800 points.
I reported the results to only three decimals, and thus wound up with a few “ties”
in the data. I counted all of the ties against the affirmative conclusion, and
thus against the “Brett” group, in favor of the “random” group. The Brett
group still “won,” 453-347. It is clear that the clusters of good games in
the real-life Brett data are stronger than in the randomly sorted data.
The difference, however, is not in the batting temperature, but in the slugging
temperature. Of the 800 points of comparison between the real-life Brett data
and the randomly sorted data, 400 have to do with batting temperature, 400 with
slugging temperature. The 400 having to do with batting temperature split
almost evenly, 197-203 (actually, 197-199 if we throw out the four ties). But
the 400 having to do with slugging temperature split 256-144 in favor of the
“Brett” group. Brett’s “slugging temperature” clearly rises and falls more, in
real life, than would be predicted by random clusters of events.
Tony Gwynn
Having come this far, I decided to repeat the experiment with the career log of Tony Gwynn [career stats here].
The career log of Gwynn, again, suggests some evidence of non-random performance
clusters. Gwynn played 54 games in 1982, 86 games in 1983, 36 games in 2000, 71
in 2001. I eliminated these four seasons from his data, and used the sixteen
seasons in between, in which Gwynn played more than 100 games each season.
Sixteen seasons provide 64 points of comparison between Gwynn’s real-life
performance and the Random Sort. Having figured out how to automate the
process a little, I did 15 Random Sorts, creating 960 points of comparison
between Gwynn’s “real” and randomly sorted performance, with no ties.
The conclusion in Gwynn’s case is not quite as clear as in the case of Brett.
Whereas Brett’s data was more prone to streaks than ANY of the ten Random Sorts,
Gwynn’s data was less streaky than the Random Sort in almost half of the tests.
However, taking all of the data, Gwynn was still more streaky than the
randomized data on 501 of the 960 points of comparison. Gwynn, unlike Brett,
was more streaky in his batting temperature than in his slugging temperature.
Gwynn’s batting temperature was more streaky than the randomized data by a count
of 255 to 225, whereas his slugging temperature was more streaky by a count of
246 to 234.
Comment
This certainly is not overwhelming evidence of a hot hand. There are, it seems to me, at least five possible explanations for the data:
1) That these two players did have non-random performance clusters,
2) That these two players had normal variations in performance, which could be explained by some other factor,
3) That there is a flaw in the design of my study, of which I have not yet been made aware,
4) That these two players are “randomly atypical,” and
5) That the data of these two superstars is slightly non-random, but that the variations are not significant, but merely have been made to appear significant by the manner in which I studied them.
The difference between Brett’s and Gwynn’s data and the randomly sorted data is not readily apparent. The random data looks the same as the real-life data. Certainly there is no large or highly significant variation in the players' day-to-day performance level.
My study was designed to be very, very sensitive, so as to pick up extremely
small differences between random clusters and real-life data clusters. It may
well be that, in the process of making the system hyper-sensitive, I have
exaggerated the differences so much that an insignificant difference appears
significant. More study would be needed to resolve this and other issues.
About Bill James: According to Alan Schwarz, author of The Numbers Game: Baseball's Lifelong Fascination with Statistics, James is "the most influential baseball writer of the twentieth century" (p. 128). A bibliography of James's works is available here. I am honored to feature the above analysis on the pages of the hot hand website. Commentaries on the analysis are welcome. -- Alan Reifman