New Evidence on Mutual Fund Performance: A Comparison of Alternative Bootstrap Methods

David P. Blake

City University London – Cass Business School – The Pensions Institute

Tristan Caulfield

University College London

Christos Ioannidis

University of Bath-Department of Economics

Ian Tonks

University of Bath School of Management

October 1, 2015


We compare two bootstrap methods for assessing mutual fund performance. The bootstrap of Kosowski, Timmermann, Wermers and White (2006) produces narrow confidence intervals due to pooling over time, while that of Fama and French (2010) produces wider confidence intervals because it preserves the cross-correlation of fund returns. We then show that the average UK equity mutual fund manager is unable to deliver outperformance net of fees under either bootstrap. Gross of fees, 95% of fund managers on the basis of the first bootstrap and all fund managers on the basis of the second bootstrap fail to outperform the luck distribution of gross returns.

New Evidence On Mutual Fund Performance: A Comparison Of Alternative Bootstrap Methods – Introduction

Evidence collected over an extended period on the performance of (open-ended) mutual funds in the US (Jensen (1968), Malkiel (1995), Wermers, Barras and Scaillet (2010)) and unit trusts and open-ended investment companies (OEICs) in the UK (Blake and Timmermann (1998), Lunde, Timmermann and Blake (1999)) has found that, on average, a fund manager cannot outperform the market benchmark and that any outperformance is more likely to be due to luck than to skill.

More recently, Kosowski, Timmermann, Wermers and White (2006, hereafter KTWW) reported that the time series returns of individual mutual funds typically exhibit non-normal distributions. They argued that this finding has important implications for the luck versus skill debate and that there was a need to re-examine the statistical significance of mutual fund manager performance using bootstrap techniques. They applied a bootstrap methodology (Efron and Tibshirani (1993), Politis and Romano (1994)) that creates a sample of monthly pseudo excess returns by randomly re-sampling residuals from a factor benchmark model and imposing a null of zero abnormal performance. Following the bootstrap exercise, KTWW determine how many funds from a large group one would expect to observe having large alphas by luck and how many are actually observed. Using data on 1,788 US mutual funds over the period January 1975–December 2002, they show that, by luck alone, 9 funds would be expected to achieve an annual alpha of 10% over a five-year period. In fact, 29 funds achieve this hurdle: “this is sufficient, statistically, to provide overwhelming evidence that some fund managers have superior talent in picking stocks. Overall, our results provide compelling evidence that, net of all expenses and costs (except load charges and taxes), the superior alphas of star mutual fund managers survive and are not an artifact of luck” (p. 2553).
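The residual-resampling step described above can be sketched in Python for a single fund. This is a minimal illustration, not KTWW's own code. It assumes a generic linear factor model estimated by OLS and residuals drawn i.i.d. with replacement; the function names (`ols`, `residual_bootstrap_alphas`) and the parameters `n_sims` and `seed` are hypothetical.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients and residuals for y = X @ beta + e."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def residual_bootstrap_alphas(excess_ret, factors, n_sims=1000, seed=0):
    """Simulate alphas for one fund under a null of zero abnormal performance.

    excess_ret: array of T monthly excess returns (illustrative input).
    factors:    array of shape (T,) or (T, K) with benchmark factor returns.
    Returns the estimated alpha and an array of n_sims simulated alphas.
    """
    rng = np.random.default_rng(seed)
    T = len(excess_ret)
    X = np.column_stack([np.ones(T), factors])  # intercept plus factors
    beta, resid = ols(excess_ret, X)

    sim_alphas = np.empty(n_sims)
    for b in range(n_sims):
        # Resample residuals only; the factor returns keep their original order.
        idx = rng.integers(0, T, size=T)
        # Pseudo returns: fitted factor exposure plus resampled residuals,
        # with the intercept (alpha) set to zero by construction.
        pseudo = X[:, 1:] @ beta[1:] + resid[idx]
        sim_alphas[b] = ols(pseudo, X)[0][0]
    return beta[0], sim_alphas
```

Comparing the estimated alpha with the spread of `sim_alphas` gives a rough sense of how large an alpha could arise by luck alone for that fund under these assumptions.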

Applying the same bootstrap method to 935 UK equity unit trusts and OEICs between April 1975 and December 2002, Cuthbertson, Nitzsche and O’Sullivan (2008) find similar evidence of significant stock picking ability amongst a small number of top-performing fund managers. Blake, Rossi, Timmermann, Tonks and Wermers (2013) show that fund manager performance improves if the degree of decentralization – in the form of increasing specialization – is increased.

However, these results have been challenged by Fama and French (2010, hereafter FF) who suggest an alternative bootstrap method which preserves any contemporaneously correlated movements in the volatilities of the explanatory factors in the benchmark model and the residuals. They calculate the Jensen alpha for each fund, and then compute pseudo returns by deducting the Jensen alpha from the actual returns to obtain benchmark-adjusted (zero-alpha) returns, thereby maintaining the cross-sectional relationship between the factor and residual volatilities (i.e., between the explained and unexplained components of returns). Their sample consists of 5,238 US mutual funds over the period January 1984–September 2006, and following their bootstrap calculations, they conclude that there is little evidence of mutual fund manager skills.

There are three differences between the KTWW and FF studies. First, while both studies use data for US domestic equity mutual funds, KTWW use data from 1975–2002, whereas the dataset in FF covers the more recent 1984–2006 period. Second, the studies use different fund inclusion criteria: KTWW restrict their sample to funds that have a minimum of 60 monthly observations, whereas FF restrict theirs to funds that have a minimum of 8 monthly observations. Third, and most important, the bootstrap methods differ: for each bootstrap simulation, the former simulate fund returns and factor returns independently of each other, whereas the latter simulate these returns jointly.
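The third difference can be illustrated with a short Python sketch of joint resampling in the spirit of FF. It is a simplified illustration, not the authors' procedure in full. It assumes a balanced panel (every fund observed in every month) and OLS alphas; the names `fit_alphas` and `joint_bootstrap_alphas` and the parameters `n_sims` and `seed` are hypothetical.

```python
import numpy as np


def fit_alphas(returns, factors):
    """OLS intercepts (alphas) for each fund; returns has shape (T, N)."""
    T = returns.shape[0]
    X = np.column_stack([np.ones(T), factors])  # intercept plus factors
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return coef[0]  # first row holds one intercept per fund


def joint_bootstrap_alphas(returns, factors, n_sims=500, seed=0):
    """Simulate zero-alpha worlds by resampling months jointly.

    Assumes a balanced panel: returns is (T, N), factors is (T,) or (T, K).
    Returns the estimated alphas (N,) and simulated alphas (n_sims, N).
    """
    rng = np.random.default_rng(seed)
    T, N = returns.shape
    alphas = fit_alphas(returns, factors)
    zero_alpha = returns - alphas  # deduct each fund's alpha from its returns

    sims = np.empty((n_sims, N))
    for b in range(n_sims):
        # One draw of months is applied to every fund AND to the factors,
        # so cross-fund correlation within a month is retained.
        idx = rng.integers(0, T, size=T)
        sims[b] = fit_alphas(zero_alpha[idx], factors[idx])
    return alphas, sims
```

Because every fund shares the same resampled months, a month with a large common shock moves all simulated alphas together. Under these assumptions, that is one way to see why the cross-sectional distribution of simulated alphas can be wider than when each fund is resampled on its own.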
