
Knockoffs for partially exchangeable categorical covariates / Emanuela Dreassi; Luca Pratelli; Pietro Rigo. - In: STATISTICAL METHODS & APPLICATIONS. - ISSN 1618-2510. - Print. - --:(2025), pp. 1-31. [10.1007/s10260-025-00827-8]

Knockoffs for partially exchangeable categorical covariates

Emanuela Dreassi et al. (2025)

Abstract

Let $X=(X_1,\ldots,X_p)$ be the vector of covariates in a regression problem and let $\widetilde{X}$ be a knockoff copy of $X$ (in the sense of Candès, Fan, Janson and Lv, 2018). In a number of applications, mainly in genetics, there is a finite set $F$ such that $X_i\in F$ for each $i=1,\ldots,p$. Despite this, to perform variable selection with the knockoff procedure, $X$ is usually modeled as an absolutely continuous random vector. While understandable from a practical standpoint, this approximation is not theoretically justified, since $X$ is supported by the finite set $F^p$. In this paper, explicit formulae for the joint distribution of $(X,\widetilde{X})$ are provided when $P(X\in F^p)=1$ and $X$ is partially exchangeable. Indeed, when $X_i\in F$ for all $i$, assuming that $X$ is partially exchangeable is often a good strategy. In a few situations, albeit extreme ones, it may also be reasonable to assume that $X$ is exchangeable; hence, some attention is paid to the exchangeable special case. The robustness of $\widetilde{X}$ with respect to the de Finetti measure $\pi$ of $X$ is investigated as well. Let $\mathcal{L}_\pi(\widetilde{X}\mid X=x)$ denote the conditional distribution of $\widetilde{X}$ given $X=x$, when $X$ is exchangeable with de Finetti measure $\pi$. It is shown that $\lVert\mathcal{L}_{\pi_1}(\widetilde{X}\mid X=x)-\mathcal{L}_{\pi_2}(\widetilde{X}\mid X=x)\rVert\le c(x)\,\lVert\pi_1-\pi_2\rVert$, where $\lVert\cdot\rVert$ is the total variation distance and $c(x)$ is a suitable constant. Finally, a numerical experiment is performed. Overall, the knockoffs of this paper outperform the alternatives (i.e., knockoffs obtained by giving $X$ an absolutely continuous distribution) in terms of false discovery rate, but are slightly less powerful.
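A knockoff copy must satisfy the swap condition: exchanging any subset of the pairs $(X_j,\widetilde{X}_j)$ leaves the joint law of $(X,\widetilde{X})$ unchanged. For exchangeable categorical $X$, one simple construction that satisfies this is to draw $\widetilde{X}$ from the posterior predictive distribution. Given $\theta$, the $2p$ coordinates of $(X,\widetilde{X})$ are i.i.d., so the pair is exchangeable and therefore swap-invariant. This is offered only as an illustration and is not claimed to reproduce the paper's explicit formulae. The Python sketch below assumes a de Finetti measure $\pi$ with finitely many atoms, so every probability can be computed exactly with `Fraction`. The names `mixture_pmf`, `knockoff_conditional`, `is_swap_invariant` and `tv`, and the example measures `pi1` and `pi2`, are hypothetical. The sketch also computes the total variation distance between the conditional laws of $\widetilde{X}$ given $X=x$ under two measures, which is the quantity bounded in the robustness result. It uses the convention $\sup_A |P(A)-Q(A)|$, which is assumed here.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Illustrative assumption: a discrete de Finetti measure given as (theta, weight)
# atoms, where theta is a probability vector over F = {0, ..., k-1}.
Prior = list[tuple[tuple[Fraction, ...], Fraction]]


def mixture_pmf(z: tuple[int, ...], prior: Prior) -> Fraction:
    """P(Z = z) when Z is i.i.d. given theta and theta ~ prior (exchangeable)."""
    return sum((w * prod(theta[v] for v in z) for theta, w in prior), Fraction(0))


def knockoff_conditional(x: tuple[int, ...], prior: Prior, k: int) -> dict:
    """Law of X~ given X = x: p fresh draws from the posterior predictive.

    Given theta, (X, X~) is i.i.d. of length 2p, so the pair is exchangeable
    and hence invariant under every swap of X_j with X~_j.
    """
    px = mixture_pmf(x, prior)
    return {xt: mixture_pmf(x + xt, prior) / px
            for xt in product(range(k), repeat=len(x))}


def swap(x, xt, S):
    """Exchange coordinates j in S between x and xt."""
    x2, xt2 = list(x), list(xt)
    for j in S:
        x2[j], xt2[j] = xt[j], x[j]
    return tuple(x2), tuple(xt2)


def is_swap_invariant(joint, k: int, p: int) -> bool:
    """Brute-force check of the knockoff swap condition on F^p x F^p."""
    for x in product(range(k), repeat=p):
        for xt in product(range(k), repeat=p):
            for r in range(1, p + 1):
                for S in combinations(range(p), r):
                    if joint(*swap(x, xt, S)) != joint(x, xt):
                        return False
    return True


def tv(P: dict, Q: dict) -> Fraction:
    """Total variation distance sup_A |P(A) - Q(A)|, i.e. half the L1 distance."""
    keys = set(P) | set(Q)
    return sum((abs(P.get(a, 0) - Q.get(a, 0)) for a in keys), Fraction(0)) / 2


# Usage: two hypothetical de Finetti measures on F = {0, 1} sharing the same atoms.
A = (Fraction(1, 4), Fraction(3, 4))
B = (Fraction(3, 4), Fraction(1, 4))
pi1 = [(A, Fraction(1, 2)), (B, Fraction(1, 2))]
pi2 = [(A, Fraction(1, 3)), (B, Fraction(2, 3))]


def joint1(x, xt):
    """Joint pmf of (X, X~) under pi1: P(X = x) * P(X~ = xt | X = x)."""
    return mixture_pmf(x, pi1) * knockoff_conditional(x, pi1, 2)[xt]


print(is_swap_invariant(joint1, k=2, p=2))  # True
x = (0, 0)
print(tv(dict(pi1), dict(pi2)))  # 1/6
print(tv(knockoff_conditional(x, pi1, 2), knockoff_conditional(x, pi2, 2)))  # 9/380
```

The enumeration over $F^{2p}$ grows like $|F|^{2p}$, so this only serves as an exact correctness check for very small $p$. The ratio of the two printed distances is a single numerical instance and should not be read as an estimate of the constant $c(x)$.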
Year: 2025
Volume: --
Pages: 1-31
Authors: Emanuela Dreassi; Luca Pratelli; Pietro Rigo

Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2158/1442254