Application of bootstrap in estimation of some measures of accuracy and bandwidth selection of a product kernel density estimator

Santanu Dutta

Abstract

The product kernel density estimator is an important nonparametric data-analytic tool. It is useful for detecting features such as peaks and valleys of a multivariate density (see, for instance, Chaudhuri and Marron (2000); Abraham, Biau and Cadre (2003)). In general, theoretical research on multivariate kernel density estimators has been far less extensive than work on univariate density estimators (see Sain, Baggerly and Scott (1994)), and the problem of estimating the bias, mean squared error and mean integrated squared error of a product kernel density estimator appears to have been somewhat neglected so far. We propose smooth bootstrap estimators of the bias, variance, mean squared error and mean integrated squared error (MISE) of a multivariate product kernel density estimator $K_n(\cdot)$ based on $n$ i.i.d. $\mathbb{R}^d$-valued random variables $X_1, \dots, X_n$. Writing $A_n$ for any one of these measures of accuracy of $K_n(\cdot)$ and $A_n^*$ for its proposed estimator, we obtain the $L_1$ and $L_2$ rates at which $A_n^*/A_n$ converges to one. The performance of $K_n(\cdot)$ depends crucially on $d$ bandwidth parameters $h_i$, $i = 1, \dots, d$, which represent the amount of smoothing along the $d$ coordinate directions. A simple option is to use $h_1 = h_2 = \cdots = h_d = h$ (Cacoullos (1966)). We address the problem of a data-based choice of $h$. We provide insight into how well the proposed bandwidth selection rule succeeds in minimizing the MISE and, using simulations and real data, compare its performance with that of the well-known cross-validation rule.
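To illustrate the smooth bootstrap idea, the following Python sketch estimates the pointwise bias, variance and mean squared error of $K_n(x)$ by Monte Carlo resampling. It assumes a Gaussian product kernel with a common bandwidth $h$ and a pilot bandwidth $g$ that defines the smoothed resampling distribution $f_g$; the function names, the kernel, $g$ and the number of resamples are illustrative assumptions and not necessarily the choices made in the paper.

```python
import numpy as np


def product_gaussian_kde(x, data, h):
    """Product Gaussian kernel estimate K_n(x) with a common bandwidth h.

    x: (m, d) evaluation points; data: (n, d) sample. Returns an array of shape (m,).
    """
    d = data.shape[1]
    # Scaled differences between every evaluation point and every observation.
    u = (x[:, None, :] - data[None, :, :]) / h
    # A product of d standard normal kernels equals exp(-|u|^2 / 2) / (2 pi)^(d/2).
    kern = np.exp(-0.5 * np.sum(u ** 2, axis=2)) / (2 * np.pi) ** (d / 2)
    return kern.mean(axis=1) / h ** d


def smooth_bootstrap_accuracy(x, data, h, g, n_boot=500, seed=0):
    """Monte Carlo smooth bootstrap estimates of bias, variance and MSE of K_n(x).

    Each resample is drawn from the pilot estimate f_g (pick rows with
    replacement, then add N(0, g^2 I) noise); f_g(x) plays the true density.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    target = product_gaussian_kde(x, data, g)  # bootstrap stand-in for f(x)
    boot_est = np.empty((n_boot, x.shape[0]))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)
        resample = data[rows] + g * rng.standard_normal((n, d))
        boot_est[b] = product_gaussian_kde(x, resample, h)
    bias = boot_est.mean(axis=0) - target
    var = boot_est.var(axis=0)
    return bias, var, bias ** 2 + var


rng = np.random.default_rng(1)
sample = rng.standard_normal((200, 2))
points = np.array([[0.0, 0.0], [1.0, 1.0]])
bias, var, mse = smooth_bootstrap_accuracy(points, sample, h=0.4, g=0.5)
print("bias:", bias, "variance:", var, "MSE:", mse)
```

Because $f_g$ serves as the bootstrap stand-in for the unknown density, these estimates depend on the pilot bandwidth $g$, which this sketch simply takes as given.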
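For a Gaussian product kernel with $h_1 = \cdots = h_d = h$, the smooth bootstrap MISE can be computed without resampling, because convolutions of Gaussian densities are again Gaussian, so every integral reduces to a double sum over the data. The Python sketch below minimizes this criterion over a grid of values of $h$ and, for comparison, also minimizes the least-squares cross-validation criterion. The Gaussian kernel, the grid, and the Scott-type pilot bandwidth $g = n^{-1/(d+4)}$ times the average coordinate standard deviation are assumptions made for illustration; the paper's own selection rule may differ in these details.

```python
import numpy as np


def _pair_sq_dists(data):
    """Squared Euclidean distances |X_i - X_j|^2 for all pairs (i, j)."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sum(diff ** 2, axis=2)


def _gauss_pair_sum(sq_dists, s, d):
    """Sum over all pairs (i, j) of the N(0, s^2 I_d) density at X_i - X_j."""
    return np.sum(np.exp(-0.5 * sq_dists / s ** 2)) / (2 * np.pi * s ** 2) ** (d / 2)


def bootstrap_mise(h, data, g):
    """Closed-form smooth bootstrap MISE of a product Gaussian KDE with bandwidth h.

    Bootstrap samples are drawn from the pilot estimate f_g, which acts as the
    true density; every integral reduces to a double sum over the data.
    """
    n, d = data.shape
    sq = _pair_sq_dists(data)
    # int (K_h * f_g)^2: the bootstrap mean of the estimator, squared and integrated.
    mean_sq = _gauss_pair_sum(sq, np.sqrt(2 * (h ** 2 + g ** 2)), d) / n ** 2
    # Integrated variance: (1/n) [ int K_h^2 - int (K_h * f_g)^2 ].
    int_var = ((4 * np.pi * h ** 2) ** (-d / 2) - mean_sq) / n
    # Integrated squared bias: int (K_h * f_g - f_g)^2, expanded into three terms.
    cross = _gauss_pair_sum(sq, np.sqrt(h ** 2 + 2 * g ** 2), d) / n ** 2
    pilot_sq = _gauss_pair_sum(sq, np.sqrt(2) * g, d) / n ** 2
    return int_var + mean_sq - 2 * cross + pilot_sq


def lscv(h, data):
    """Least-squares cross-validation criterion for the same estimator."""
    n, d = data.shape
    sq = _pair_sq_dists(data)
    int_f_sq = _gauss_pair_sum(sq, np.sqrt(2) * h, d) / n ** 2
    # Leave-one-out term: remove the n diagonal (i == j) contributions.
    off_diag = _gauss_pair_sum(sq, h, d) - n * (2 * np.pi * h ** 2) ** (-d / 2)
    return int_f_sq - 2 * off_diag / (n * (n - 1))


def select_bandwidths(data, grid):
    """Return (bootstrap-MISE choice, LSCV choice) of the common bandwidth h."""
    n, d = data.shape
    # Assumed Scott-type pilot bandwidth for the smoothed bootstrap.
    g = data.std(axis=0, ddof=1).mean() * n ** (-1.0 / (d + 4))
    mise = [bootstrap_mise(h, data, g) for h in grid]
    cv = [lscv(h, data) for h in grid]
    return grid[int(np.argmin(mise))], grid[int(np.argmin(cv))]


rng = np.random.default_rng(0)
sample = rng.standard_normal((150, 2))
h_grid = np.linspace(0.1, 1.5, 57)
h_boot, h_cv = select_bandwidths(sample, h_grid)
print(f"bootstrap-MISE h = {h_boot:.3f}, LSCV h = {h_cv:.3f}")
```

A grid search is used only for transparency; any one-dimensional minimizer over $h$ would serve equally well in this sketch.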