Evidence of Non-Random Clusters in the
Career Performance of George Brett and Tony Gwynn

 

by Bill James, Guest Contributor

 

How “hot” is a hitter?  How do you measure how “hot” a hitter is, at the moment?
 

While examining that question, I may have found some fragmentary evidence that hitters -- well, two hitters -- actually do have non-random clusters in performance.  Let me explain the process by which I arrived at that belief.

 

When a hitter is on a hot streak, how hot is he?  Call it the Johnny Carson question:  How hot was he?  I’m not debating, for the moment, the question of whether hot streaks are real or illusory.  I am looking for a way to measure the effect, regardless of whether it is a real effect or a transitory perception.


How hot a hitter is -- or how cold -- can be measured in this way.  Suppose that you start every hitter out every season with 40 at bats and 10 hits, a .250 average, on the theory that, at .250, no hitter may be said to be either hot or cold.   


After each game, the player’s “momentary at bats” and “momentary hits” are re-calculated as:

 

The previous day’s figure

Times .900

Plus the data from the most recent game. 

 

To illustrate the system, let us suppose that the hitter starts out the season “cold,” in a slump.  On opening day he goes 0-for-4.   His “batting temperature” drops to .225:

 

10 hits

Times .900

Equals 9.00

Plus zero

Equals 9.00 hits.

 

40 at bats

Times .900

Equals 36.00

Plus 4

Equals 40.00 at bats.

 

Nine divided by 40 equals .225, so his batting temperature is .225.  By this process, 10% of his temperature is based on what he did in the last game, 52% is based on what he has done in the last seven games, and 90% is based on what he has done in the last 22 games.  When a player gets cold, his batting temperature drops rapidly; when he gets hot, it shoots up rapidly.  
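The update rule and the opening-day arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the "share" calculation assumes the hitter gets about the same number of at bats in every game.

```python
DECAY = 0.9  # weight kept from the previous day's figures, per the text


def update(momentary_hits, momentary_ab, hits, at_bats, decay=DECAY):
    """Apply one game's batting line to the running 'momentary' totals."""
    return momentary_hits * decay + hits, momentary_ab * decay + at_bats


# Season baseline from the text: 10 hits in 40 at bats (.250).
hits, ab = update(10.0, 40.0, hits=0, at_bats=4)  # opening day, 0-for-4
print(round(hits / ab, 3))  # 0.225


def share_from_last_games(n, decay=DECAY):
    """Approximate share of the temperature that comes from the last n games,
    assuming a similar number of at bats in each game (an assumption)."""
    return 1 - decay ** n


for n in (1, 7, 22):
    print(n, round(share_from_last_games(n), 2))
```

Under that equal-at-bats assumption the shares come out at roughly 10%, 52% and 90% for the last one, seven and 22 games.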


Let us suppose that this player, after going 0-for-4 on opening day, then goes 1-for-5 the second day.  His batting temperature would drop to .222 (9.1 divided by 41.0).  Let us suppose that he then goes 0-for-3, 1-for-3, 0-for-5, 0-for-4, and 0-for-3 -- a realistic type of slump.  His batting temperature, after those seven games, would drop to .156. 


Suppose, however, that he then gets hot, going 2-for-4, 1-for-3, 3-for-4, 2-for-4, 2-for-2, and 3-for-4 over the next six games.  His batting temperature would then be .370 (13.7 divided by 37.0).  It is a deliberately unstable measure of hitting ability, designed to swing relatively freely based on the player’s most recent performance. 
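Anyone who wants to check the slump-and-streak example can run the thirteen games through the same rule. The sketch below uses a hypothetical helper, `batting_temperature`, and the game lines quoted in the text, written as (hits, at bats).

```python
def batting_temperature(game_lines, start_hits=10.0, start_ab=40.0, decay=0.9):
    """Return the batting temperature after each game.

    game_lines is a list of (hits, at_bats) tuples in playing order.
    """
    hits, ab = start_hits, start_ab
    temps = []
    for h, n in game_lines:
        hits = hits * decay + h
        ab = ab * decay + n
        temps.append(hits / ab)
    return temps


# The seven-game slump and six-game hot streak described in the text.
slump = [(0, 4), (1, 5), (0, 3), (1, 3), (0, 5), (0, 4), (0, 3)]
streak = [(2, 4), (1, 3), (3, 4), (2, 4), (2, 2), (3, 4)]

temps = batting_temperature(slump + streak)
print(f"{temps[0]:.3f} {temps[1]:.3f} {temps[6]:.3f} {temps[-1]:.3f}")
```

The four printed values are the temperatures after game one, game two, the end of the slump, and the end of the hot streak, which the text gives as .225, .222, .156 and .370.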

 

I was looking for some way to measure how “hot” a hitter was, because... well, this is what I do for a living.  To tinker with the system, I was using the career day-by-day performance log of George Brett, which I keep handy to console myself whenever the Royals fall into a deep slump [Brett's Hall of Fame webpage is available here, his career stats here, and a retrospective by the Kansas City Star, here].   The hottest George Brett ever was in his entire career, by the way, was on August 18, 1980.   At that moment Brett:

 

a)  had hit in 30 consecutive games,

b)  had averaged almost two hits a game over that entire 30-game stretch (57 hits),

c)  had had 9 multiple-hit games in 13 days, and

d)  had finished that off by going 3-for-4, 4-for-4 (with 5 RBI) and 3-for-5, a .769 average over the last three days.

 

His batting temperature at that moment was .525.   


George Brett was at his coldest on September 8, 1982.  After carrying a .306 batting temperature on August 22 of that season, Brett fell into a horrific two-and-a-half week slump during which he went 5-for-61, ending with four consecutive hitless games.  His batting temperature fell to .125.  


I figured George Brett’s batting temperature, and also his slugging temperature, figured in the same way except with total bases replacing hits, after every game of Brett’s career.   (We start the player out each season with 14 total bases, a baseline slugging percentage of .350.)  It then occurred to me to pose a simple question:  Does Brett in fact hit better when he starts the game “hot” than when he begins the game “cold”?
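To ask how a hitter did when he entered a game hot or cold, one needs the temperature as it stood before each game, not after it. Here is a minimal sketch of that bookkeeping for both batting and slugging temperature. The `GameLine` record and the three-game sample log are invented for illustration; only the baselines (40 at bats, 10 hits, 14 total bases) and the .900 multiplier come from the text.

```python
from typing import NamedTuple


class GameLine(NamedTuple):
    at_bats: int
    hits: int
    total_bases: int


def entering_temperatures(season, decay=0.9):
    """For each game, return (batting_temp, slugging_temp) as the hitter
    stood BEFORE that game was played."""
    ab, hits, tb = 40.0, 10.0, 14.0  # season baselines from the text
    out = []
    for g in season:
        out.append((hits / ab, tb / ab))  # record first, then update
        ab = ab * decay + g.at_bats
        hits = hits * decay + g.hits
        tb = tb * decay + g.total_bases
    return out


# Made-up three-game log, for illustration only.
sample = [GameLine(4, 2, 3), GameLine(3, 0, 0), GameLine(5, 1, 4)]
temps = entering_temperatures(sample)
for bat, slg in temps:
    print(f"{bat:.3f} {slg:.3f}")
```

The first line printed is always the baseline, .250 and .350, because nothing has happened yet when the season's first game begins.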


I had expected, based on previous studies, that Brett would hit no better when he was “hot” than when he was “cold.”  In fact, though, he did.  The chart below shows, on the top (red) line, what Brett hit in his career in the 100 games when he entered the game most “hot,” and the bottom (blue) line reflects what he hit when he entered the game most “cold.” 

 

  G   AB   R    H  2B  3B  HR  RBI  BB  SO  SB  CS  GDP   Avg  SPct
100  385  70  143  28   4  18   81  58  20  11   6    6  .371  .605
100  382  59  115  29   1   9   47  28  42   9   4   11  .301  .453
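The Avg and SPct columns follow directly from the counting columns, so they can be checked by hand or with a short script. The helper name below is an assumption made for the example; the inputs are the hot and cold lines from the chart.

```python
def avg_and_slg(ab, h, doubles, triples, hr):
    """Batting average and slugging percentage from counting stats.

    Total bases = hits + doubles + 2 * triples + 3 * home runs, which is
    the same as counting 1, 2, 3 or 4 bases per hit by type.
    """
    total_bases = h + doubles + 2 * triples + 3 * hr
    return h / ab, total_bases / ab


hot = avg_and_slg(ab=385, h=143, doubles=28, triples=4, hr=18)
cold = avg_and_slg(ab=382, h=115, doubles=29, triples=1, hr=9)
print(f"hot  {hot[0]:.3f} {hot[1]:.3f}")
print(f"cold {cold[0]:.3f} {cold[1]:.3f}")
```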

 

Let’s fill in a little bit more of the data...  Group “2” here [below] represents the second-highest group, the games when Brett was still very hot, but just not quite as hot as group “1.” Groups 26 and 27 represent the 200 games in which Brett entered the game with his lowest “batting temperature.” 

 

Group    G   AB   R    H  2B  3B  HR  RBI  BB  SO  SB  CS  GDP   Avg  SPct
    1  100  385  70  143  28   4  18   81  58  20  11   6    6  .371  .605
    2  100  384  71  124  25   7  20   79  53  29  10   6   13  .323  .581
   26  100  370  56  113  21   5  10   58  41  39   7   6   12  .305  .470
   27  100  382  59  115  29   1   9   47  28  42   9   4   11  .301  .453
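The grouping itself is simple to sketch: rank every game by the temperature the hitter carried into it, cut the ranking into blocks of 100, and total each block. The code below is an illustration under stated assumptions, not the procedure actually used for the chart. The tuple layout is invented, and letting the final block run short when the games do not divide evenly is a guess.

```python
def group_by_entering_temperature(games, group_size=100):
    """games: list of (entering_temp, at_bats, hits, total_bases).

    Returns one summary dict per group, hottest group first. The last
    group may hold fewer than group_size games (an assumption).
    """
    ranked = sorted(games, key=lambda g: g[0], reverse=True)
    groups = []
    for start in range(0, len(ranked), group_size):
        chunk = ranked[start:start + group_size]
        ab = sum(g[1] for g in chunk)
        h = sum(g[2] for g in chunk)
        tb = sum(g[3] for g in chunk)
        groups.append({
            "games": len(chunk),
            "ab": ab,
            "h": h,
            "avg": h / ab if ab else 0.0,
            "slg": tb / ab if ab else 0.0,
        })
    return groups


# Tiny made-up log, grouped two games at a time for illustration.
toy = [(0.300, 4, 2, 3), (0.200, 4, 1, 1), (0.350, 3, 2, 5), (0.150, 5, 0, 0)]
toy_groups = group_by_entering_temperature(toy, group_size=2)
for number, g in enumerate(toy_groups, start=1):
    print(number, g["games"], g["ab"], g["h"], f"{g['avg']:.3f}", f"{g['slg']:.3f}")
```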

 

My first thought, when that data popped out, was, “Wow.  I may have discovered some actual evidence of a hot streak there.”  Brett very clearly did hit better when he entered the game hot than when he entered the game cold, but let’s not get ahead of ourselves.  What other explanations for this data can we suggest?


George Brett, of course, was not exactly the same hitter throughout his career.   He hit .390 in 1980; he hit .255 in 1991.  Perhaps the data could be explained by the “good groups” having a disproportionate share of games from his good seasons?


Well, yes and no.  I tested that theory using a 162-game group of the games in which Brett entered the game with his highest batting temperature.  This is Brett’s batting performance during those games:

 

  G   AB    R    H  2B  3B  HR  RBI  BB  SO  SB  CS  GDP   Avg  SPct
162  624  118  231  44   9  29  131  87  34  20  11   15  .370  .609

 

(For the sake of clarity, I made a very small change in the batting temperature formula after compiling this group of 162 games, before re-compiling the data listed before.   The top 162 games in this listing and the top 162 in the previous listing would be extremely similar, but not identical.) 


Anyway, Brett had 231 hits in these 162 games.  This 162-game group contains 59 games from 1980, when Brett hit .390 and stayed hot much of the summer, and no games at all from 1991, when Brett never got hot.  


I figured Brett’s “expected” hits for each game based on his batting average for that season and his at bats in the game, thus removing from the mix the season-to-season fluctuation in batting skill.  Conclusion?   Brett could have been expected, in these games, to get 216.6 hits, hitting .347.  
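The "expected hits" adjustment amounts to multiplying each game's at bats by that season's batting average and adding up the results. A minimal sketch follows; the function name and the three sample games are invented for the example, while the .390 and .255 averages are the 1980 and 1991 figures quoted above.

```python
def expected_hits(games, season_avg):
    """games: iterable of (season, at_bats) pairs.
    season_avg: dict mapping season to that season's batting average."""
    return sum(season_avg[season] * at_bats for season, at_bats in games)


season_avg = {1980: 0.390, 1991: 0.255}       # averages quoted in the text
sample_games = [(1980, 4), (1980, 5), (1991, 3)]  # made-up at-bat counts
print(round(expected_hits(sample_games, season_avg), 3))  # 4.275

# The figure reported for the 162-game group, as an expected average:
print(round(216.6 / 624, 3))  # 0.347
```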


He still exceeded that by a pretty fair margin.  The chance that a true .347 hitter would hit .370 or better by luck alone, given 624 at bats, is about 12%.  It could be a random grouping, although it doesn’t really look like it.  [If students or anyone else would like to see this calculation for themselves, you can go to the binomial calculator website and enter the numbers n = 624, k = 231, and p = .347.]  
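For readers without a binomial calculator at hand, the same upper-tail probability can be computed with the Python standard library. The sketch treats every at bat as an independent trial with a fixed .347 chance of a hit, which is the usual binomial simplification and not something the data guarantees. The function name is illustrative.

```python
from math import exp, lgamma, log


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so the
    large binomial coefficients do not overflow."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


tail = binomial_upper_tail(n=624, k=231, p=0.347)
print(f"{tail:.3f}")
```

The result should land close to the 12% quoted above.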


There could be other non-random variables in his performance which explain some of the difference -- for example, hitting better in hot weather than in cold.  (Brett was, throughout his career, a hot weather hitter.)   But on the other hand, it could also be that, by removing these biases, we are factoring out something that we should not be removing; if temporary variations in performance skill do exist, then we could be improperly removing them by making these adjustments.  Brett hit .390 in 1980, whereas he never hit higher than .335 in any other season, a huge 55-point gap between his best and second-best seasons.  It could be that Brett hit .390 in 1980 not because he was actually so much better as a hitter that year, but simply because, in that season, he got hot and he stayed hot.   If you start seriously trying to explain how a guy could hit that much better in one season than in any other in terms of hand/eye co-ordination, eyesight and swing mechanics, you may realize that you’re better off trying to explain it in terms of a hot streak.  


One could argue that Brett’s performance expectations varied over his career because of injuries, but that doesn’t fit the data.   If Brett had clusters of games in which he performed poorly because of injuries, that would cause his performance to be below expectation in his worst games, but not significantly above expectation in games when he was most hot.   In other words, this would cause the graph to drop off sharply on the left end, where Brett entered the game cold, whereas it would be flat on the right end, where he entered the game hot.  The actual data is exactly the opposite:  it is nearly flat on the “cold” end, but accelerates sharply on the hot end -- as if Brett had a limited set of games in which he was better than expected, but no set of games in which he was significantly worse than expected.   If you’ll refer to the chart above, you will note that Brett’s performance in groups 26 and 27, when he entered the game most cold, is not really far below his career performance norms of a .305 batting average, .487 slugging.   The deviation is on the other end. 


There is another little data-sorting problem here which is a sort of variation of the fact that you always find something in the last place you look for it.   Suppose that you focused on long hot streaks, and you asked, “What does the player hit in the game in which he enters with his longest hitting streak of the season?”  The answer, of course, is .000, since, if the player got a hit in the game, his hitting streak would have become one game longer.   There is a similar data-sorting problem here, which you can figure out as well as I can.


I am willing to factor out Brett’s season-to-season variation in performance before the analysis is done, but how can we determine objectively whether Brett’s tendency to get hot or cold exceeds random expectation?


Suppose that we figure Brett’s “batting temperature” and “slugging temperature” after every game of his career, as we did before.  For each season, we ask four questions:

 

1)  What is the highest batting temperature Brett reached during the season?

2)  What is the highest slugging temperature Brett reached during the season?

3)  What is the lowest batting temperature Brett reached during the season?

4)  What is the lowest slugging temperature Brett reached during the season?

 

We then randomly re-arrange the games of each season -- within the season, to avoid the problem of changes in batting ability over the course of his career -- and we ask the same four questions. 


1)  What is the highest batting temperature Brett reached during the season, with his games randomly re-arranged?

2)  What is the highest slugging temperature Brett reached during the season, with his games randomly re-arranged?

           

Etc.  Incidentally, the highest “slugging temperature” ever achieved by the real George Brett was .928, on August 26, 1985.  Brett homered in four straight games and five out of six, batting 18 times in those six games, going 9-for-18 with two doubles and five homers.  


If Brett has actual hot and cold streaks which are not just random clusters, then his peak batting and slugging temperatures should be higher (and lower) in real life than in the randomly sorted data.  If, on the other hand, his streaks are just random clusters, then he should get just as hot (and cold) in the random sorts as he did in real life.  
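The random-sort test can be sketched as follows: compute the four season extremes for the games in their real order, shuffle the same games, and compute the four extremes again. This is an illustration under assumptions rather than the code behind the study. In particular, taking the extremes over the post-game temperatures, the tuple layout, and the made-up ten-game "season" are all choices made for the example.

```python
import random


def season_extremes(games, decay=0.9):
    """games: non-empty list of (at_bats, hits, total_bases) in playing order.

    Returns (max_bat, max_slg, min_bat, min_slg), taken over the
    temperatures after each game (an assumption about the bookkeeping).
    """
    ab, hits, tb = 40.0, 10.0, 14.0  # season baselines from the text
    bat, slg = [], []
    for g_ab, g_h, g_tb in games:
        ab = ab * decay + g_ab
        hits = hits * decay + g_h
        tb = tb * decay + g_tb
        bat.append(hits / ab)
        slg.append(tb / ab)
    return max(bat), max(slg), min(bat), min(slg)


def random_sort_extremes(games, rng):
    """Same four extremes after shuffling the games within the season."""
    shuffled = list(games)
    rng.shuffle(shuffled)
    return season_extremes(shuffled)


# Made-up, deliberately streaky season: five big games, then five hitless.
toy_season = [(4, 3, 5)] * 5 + [(4, 0, 0)] * 5
real = season_extremes(toy_season)
rand = random_sort_extremes(toy_season, random.Random(0))
print([round(x, 3) for x in real])
print([round(x, 3) for x in rand])
```

Because the toy season bunches all of its good games together, no shuffle of it can reach a higher peak than the real order does, which is the pattern the test looks for.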

 

Conclusion in re Brett

 

The variations in Brett’s batting temperature are as extreme in the random sorts as they were in real life.   However, Brett’s slugging temperature -- his power -- did vary more in real life than in the random sort.   Overall, Brett’s batting performance seems more likely than not to embody some sort of hot and cold variable. 


For obvious reasons, I excluded Brett’s 1973 season, when he played only 13 games, and based my study on his twenty remaining seasons, 1974 through 1993, in all of which he played more than 100 games except 1981, when a strike limited him to 89.  About each of those 20 seasons, we had four questions, which I outlined before:  Was his highest batting temperature higher in the real season, or in the randomly sorted data?   Was his lowest slugging temperature lower in the real season, or in the randomly sorted data?  With 20 seasons and four points of comparison for each, we thus have 80 points of comparison between Brett’s real performance log and his randomly sorted performance.  


On the first test, which I will refer to as Random Sort 1, Brett’s real-life temperatures were more extreme than the random sort on 47 of the 80 points.  


On the second test, Random Sort 2, Brett’s real-life temperatures were more extreme than the random sort on 41 of 80 points.


On the third test, Random Sort 3, Brett’s real-life temperatures were more extreme than the random sort on 43 of 80 points.

 

I repeated the test ten times.   The real-life Brett was more “streaky” than the randomly sorted Brett in all ten tests.   In sum, we had 800 points of comparison between Brett’s actual data and the randomly sorted data.   Brett’s data was more extreme than the randomly sorted data at 453 of the 800 points.  


I reported the results to only three decimal places, and thus wound up with a few “ties” in the data.   I counted all of the ties against the affirmative conclusion, that is, against the “Brett” group and in favor of the “random” group.   The Brett group still “won,” 453-347.  It is clear that the clusters of good games in the real-life Brett data are stronger than in the randomly sorted data.  
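The scoring of one season against one random sort, including the tie rule, might look like this in code. The function name, the tuple order and the sample figures are assumptions for illustration. The rounding to three decimals and the rule that a tie counts for the random side come from the text.

```python
def count_more_extreme(real, shuffled, decimals=3):
    """real, shuffled: (max_bat, max_slg, min_bat, min_slg) for one season.

    Returns how many of the four points the real season wins. Values are
    rounded first, and a tie counts for the random side, as in the text.
    """
    r = [round(x, decimals) for x in real]
    s = [round(x, decimals) for x in shuffled]
    highs = sum(r[i] > s[i] for i in (0, 1))  # real peak must be higher
    lows = sum(r[i] < s[i] for i in (2, 3))   # real trough must be lower
    return highs + lows


# Made-up extremes for one season and one random sort.
real_season = (0.4521, 0.700, 0.180, 0.300)
random_sort = (0.4519, 0.650, 0.170, 0.300)
print(count_more_extreme(real_season, random_sort))  # 1
```

Summing this count over 20 seasons gives the 80 points of comparison for one random sort, and over ten sorts the 800 points tallied above.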


The difference, however, is not in the batting temperature, but in the slugging temperature.   Of the 800 points of comparison between the real-life Brett data and the randomly sorted data, 400 have to do with batting temperature, 400 with slugging temperature.   The 400 having to do with batting temperature split almost evenly, 197-203 (actually, 197-199 if we throw out the four ties).  But the 400 having to do with slugging temperature split 256-144 in favor of the “Brett” group.  Brett’s “slugging temperature” clearly rises and falls more, in real life, than would be predicted by random clusters of events.
 

Tony Gwynn

 

Having come this far, I decided to repeat the experiment with the career log of Tony Gwynn [career stats here].


The career log of Gwynn, again, suggests some evidence of non-random performance clusters.  Gwynn played 54 games in 1982, 86 games in 1983, 36 games in 2000, 71 in 2001.   I eliminated these four seasons from his data, and used the sixteen seasons in between, in which Gwynn played more than 100 games each season. 


Sixteen seasons provide 64 points of comparison between Gwynn’s real-life performance and the Random Sort.   Having figured out how to automate the process a little, I did 15 Random Sorts, creating 960 points of comparison between Gwynn’s “real” and randomly sorted performance, with no ties.


The conclusion in Gwynn’s case is not quite as clear as in the case of Brett.  Whereas Brett’s data was more prone to streaks than ANY of the ten Random Sorts, Gwynn’s data was less streaky than the Random Sort in almost half of the tests. 


However, taking all of the data, Gwynn was still more streaky than the randomized data on 501 of the 960 points of comparison.  Gwynn, unlike Brett, was more streaky in his batting temperature than in his slugging temperature.   Gwynn’s batting temperature was more streaky than the randomized data by a count of 255 to 225, whereas his slugging temperature was more streaky by a count of 246 to 234. 

           

Comment

 

This certainly is not overwhelming evidence of a hot hand.  There are, it seems to me, at least five possible explanations for the data:

 

1)  That these two players did have non-random performance clusters,

2)  That these two players had normal variations in performance, which could be explained by some other factor,

3)  That there is a flaw in the design of my study, of which I have not yet been made aware,

4)  That these two players are “randomly atypical,” and

5)  That the data of these two superstars is slightly non-random, but that the variations are not significant and have merely been made to appear significant by the manner in which I studied them. 

 

The difference between Brett and Gwynn’s data and the randomly sorted data is not readily apparent.  The random data looks the same as the real-life data.  Certainly there is no large or highly significant variation in the players' day-to-day performance level. 


My study was designed to be very, very sensitive, so as to pick up extremely small differences between random clusters and real-life data clusters.  It may well be that, in the process of making the system hyper-sensitive, I have exaggerated the differences so much that an insignificant difference appears significant.  More study would be needed to resolve this and other issues.

 


About Bill James:  According to Alan Schwarz, author of The Numbers Game: Baseball's Lifelong Fascination with Statistics, James is "the most influential baseball writer of the twentieth century" (p. 128).  A bibliography of James's works is available here.  I am honored to feature the above analysis on the pages of the hot hand website.  Commentaries on the analysis are welcome.  -- Alan Reifman


