Kernel density estimators are a popular family of nonparametric estimators with applications to exploratory statistics and data mining. Because kernel estimators must be constructed from the data, when the data are sensitive only indirect representations of the estimate, such as graphs or tabulations, can be stored or transmitted. However, even such representations might contain enough information to allow the data to be reconstructed, which gives rise to an inference problem for kernel estimates. This inference problem can be described by a system of nonlinear equations that arises naturally from the kernel estimate of a multivariate dataset. The solution of the system is the dataset from which the kernel estimate was computed; in practice, a good approximation to this solution is not available. Publicly available solvers for nonlinear systems pose a serious threat to data privacy. This paper investigates the numerical solution of the nonlinear systems arising from the kernel estimate of a multivariate dataset and shows that this task is challenging. In fact, the Jacobian matrix of the system is numerically singular, and many solvers for nonlinear equations fail because they must solve linear systems whose coefficient matrix is the Jacobian. Moreover, up-to-date optimization solvers that do not suffer from this drawback may still fail to solve the nonlinear system. To show this, we tested a subspace trust-region method, a BFGS method, and a gradient projection method on both a synthetic and a real dataset. These methods can find a solution of the optimization problem even when started far from it. However, the experimental results on both datasets show that, if the initial guess is not very close to the solution, all three methods fail to converge to a solution of the system of equations. Hence, unless a very good approximation of the solution is known, the dataset cannot be reconstructed using publicly available solvers.
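To make the shape of the inference problem concrete, the sketch below sets up a small system of this kind. It is an illustration only, not the formulation used in the paper: it assumes one-dimensional data, a Gaussian kernel, a known bandwidth `h`, and a published tabulation with exactly as many grid points as unknown data values. All names (`kde`, `residuals`, `jacobian`, `grid`, `published`) are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(t, data, h):
    """Kernel density estimate at point t for 1-D data with bandwidth h."""
    return sum(gaussian_kernel((t - x) / h) for x in data) / (len(data) * h)

def residuals(guess, grid, published, h):
    """F(guess): estimate rebuilt from the guess minus the published values."""
    return [kde(t, guess, h) - y for t, y in zip(grid, published)]

def jacobian(guess, grid, h):
    """Analytic Jacobian J[j][i] = dF_j/dx_i for the Gaussian kernel."""
    n = len(guess)
    return [
        [((t - x) / h) * gaussian_kernel((t - x) / h) / (n * h * h) for x in guess]
        for t in grid
    ]

# Hypothetical sensitive data and the tabulation that would be published.
h = 0.5
data = [0.2, 1.5, 3.1]
grid = [0.0, 1.5, 3.0]
published = [kde(t, data, h) for t in grid]

# The true data solve F(x) = 0; a guess far from the data does not.
guess = [0.0, 1.0, 2.0]
F_true = residuals(data, grid, published, h)
F_guess = residuals(guess, grid, published, h)
J = jacobian(guess, grid, h)

print("residual at true data:", F_true)
print("residual at guess:    ", F_guess)
print("Jacobian at guess:")
for row in J:
    print("  ", row)
```

A Newton-type solver would repeatedly solve linear systems with `J` as the coefficient matrix, whereas an optimization-based approach would instead minimize a merit function such as the sum of squared entries of `residuals(...)`.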

Inferences on Kernel density estimates by solving nonlinear systems / S. Bellavia; S. Lodi; B. Morini. - PRINT. - (2006), pp. 389-397. (Paper presented at the 18th International Conference on Scientific and Statistical Database Management, 2006) [10.1109/SSDBM.2006.30].

Inferences on Kernel density estimates by solving nonlinear systems

BELLAVIA, STEFANIA; MORINI, BENEDETTA
2006

Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
18th International Conference on Scientific and Statistical Database Management, 2006
S. Bellavia; S. Lodi; B. Morini

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/315093