It turns out that, for the purposes of constructing lower bound benchmarks for functional estimation, it often suffices to use one-dimensional parametric submodels. A common choice of submodel for nonparametric $\mathcal{P}$ is, for some mean-zero function $h: \mathcal{Z} \rightarrow \mathbb{R}$,

$$p_{\epsilon}(z)=d \mathbb{P}(z)\{1+\epsilon h(z)\},$$
where $\|h\|_{\infty} \leq M<\infty$ and $\epsilon<1 / M$ so that $p_{\epsilon}(z) \geq 0$. Note for this submodel the score function is

$$\left.\frac{\partial}{\partial \epsilon} \log p_{\epsilon}(z)\right|_{\epsilon=0}=\left.\frac{\partial}{\partial \epsilon} \log \{1+\epsilon h(z)\}\right|_{\epsilon=0}=h(z).$$

Therefore the Cramér-Rao lower bound for some $P_{\epsilon}$ in the example one-dimensional submodel $\mathcal{P}_{\epsilon}$ above is given by

$$\frac{\left\{\left.\frac{\partial}{\partial \epsilon} \psi\left(P_{\epsilon}\right)\right|_{\epsilon=0}\right\}^{2}}{\mathbb{E}\left\{h(Z)^{2}\right\}}.$$
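The score calculation can be verified numerically. The following is a minimal sketch (the discrete space, base pmf, and direction $h$ are my own assumed example, not from the text): build $p_{\epsilon}(z)=p(z)\{1+\epsilon h(z)\}$ and check by finite differences that $\frac{\partial}{\partial \epsilon} \log p_{\epsilon}(z)$ at $\epsilon=0$ is exactly $h(z)$.

```python
import numpy as np

# Assumed example: discrete space Z = {0, 1, 2} with base pmf p,
# a bounded direction h centered so that E_P[h(Z)] = 0, and the
# submodel p_eps(z) = p(z) * (1 + eps * h(z)).
p = np.array([0.2, 0.5, 0.3])          # base distribution P
h = np.array([1.0, -1.0, 1.0])
h = h - np.sum(p * h)                  # center: E_P[h(Z)] = 0

def p_eps(eps):
    return p * (1.0 + eps * h)

# Central finite difference of log p_eps at eps = 0: the score.
eps = 1e-6
score = (np.log(p_eps(eps)) - np.log(p_eps(-eps))) / (2 * eps)
print(np.max(np.abs(score - h)))       # ~0: the score equals h(z)
```

Since $\log p_{\epsilon}(z)=\log p(z)+\log \{1+\epsilon h(z)\}$, the $\log p(z)$ term drops out of the derivative and only $h(z)$ remains.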
Comment: Why one-dimensional submodels? For a detailed explanation, see Michael Kosorok, "Introduction to Empirical Processes and Semiparametric Inference", Chap. 18.
One more point worth explaining is why we chose $p_{\epsilon}(z)=d \mathbb{P}(z)\{1+\epsilon h(z)\}$ as the submodel (the following is adapted from Mark van der Laan's STAT C245B Survival Analysis and Causality course materials).
We want to define a type of differentiability of $\psi: \mathcal{P} \rightarrow \mathbb{R}^{q}$, where $\psi$ is the target parameter.
We could use the definition of a directional derivative in direction $h$:

$$\lim _{\epsilon \rightarrow 0} \frac{\psi(\mathbb{P}+\epsilon h)-\psi(\mathbb{P})}{\epsilon}.$$
However, $\mathbb{P}+\epsilon h$ might not be a path through $\mathcal{P}$, in which case this derivative is ill-defined. We need to define a derivative along paths that are submodels of $\mathcal{P}$.
Let $\mathcal{P}$ be nonparametric. We define a class of paths such that:

$$p_{\epsilon}(z)=d \mathbb{P}(z)\{1+\epsilon h(z)\}.$$
Two key assumptions necessary for it to be a proper submodel are as follows:

$$\|h\|_{\infty}<\infty \quad \text { and } \quad \mathbb{E}_{\mathbb{P}} h(Z)=0.$$
For $\epsilon \in(-\delta, \delta)$ with $\delta=\frac{1}{\|h\|_{\infty}}$, this is a submodel.
To see why, first note that for the paths to be a proper density, we need:

$$d \mathbb{P}(z)\{1+\epsilon h(z)\} \geq 0 \quad \text { and } \quad \int\{1+\epsilon h(z)\} d \mathbb{P}(z)=1.$$
Sketch proof:
Let $h(z)$ be uniformly bounded, so that $|h(z)| \leq\|h\|_{\infty}$. If $|\epsilon| \leq \delta=1 /\|h\|_{\infty}$, then $|\epsilon h(z)| \leq 1$ and hence $\{1+\epsilon h(z)\} \geq 0$. Therefore, for $\epsilon$ sufficiently small and $h$ uniformly bounded, $d \mathbb{P}(z)\{1+\epsilon h(z)\} \geq 0$.
Sketch proof:
Note that $\int\{1+\epsilon h(z)\} d \mathbb{P}(z)=\int d \mathbb{P}(z)+\epsilon \int h(z) d \mathbb{P}(z)=1$, since $p$ is a proper density and $\int h(z) d \mathbb{P}(z)=\mathbb{E}_{\mathbb{P}} h(Z)=0$ by assumption.
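Both conditions can be checked numerically. The following is a small sketch (the pmf and direction $h$ are hypothetical, chosen only for illustration): for any $|\epsilon| \leq \delta=1 /\|h\|_{\infty}$, the perturbed pmf $p_{\epsilon}(z)=p(z)\{1+\epsilon h(z)\}$ stays nonnegative and sums to one.

```python
import numpy as np

# Hypothetical base pmf and bounded direction on a 4-point space.
p = np.array([0.1, 0.4, 0.3, 0.2])     # base pmf
h = np.array([2.0, -1.0, 0.5, -1.5])
h = h - np.sum(p * h)                  # enforce E_P[h(Z)] = 0
delta = 1.0 / np.max(np.abs(h))        # delta = 1 / ||h||_inf

for eps in np.linspace(-delta, delta, 9):
    p_eps = p * (1.0 + eps * h)
    assert np.all(p_eps >= -1e-12)          # condition 1: nonnegative
    assert abs(p_eps.sum() - 1.0) < 1e-12   # condition 2: sums to one
print("p_eps is a valid pmf for all |eps| <= delta")
```

The sum condition holds for every $\epsilon$ (it only needs $\mathbb{E}_{\mathbb{P}} h=0$); it is the nonnegativity condition that forces the restriction $|\epsilon| \leq 1 /\|h\|_{\infty}$.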
Now consider the score of this submodel:

$$\left.\frac{\partial}{\partial \epsilon} \log p_{\epsilon}(z)\right|_{\epsilon=0}=h(z).$$

Thus every bounded mean-zero $h$ arises as the score of such a path, which is what makes this choice of submodel so convenient.
"Since any lower bound for the submodel $\mathcal{P}_{\epsilon}$ is also a lower bound for $\mathcal{P}$, the best and most informative is the greatest such lower bound. Can we say anything about the best such lower bound for generic functionals and/or submodels?"
Recall the Cramér-Rao bound

$$\frac{\left\{\left.\frac{\partial}{\partial \epsilon} \psi\left(P_{\epsilon}\right)\right|_{\epsilon=0}\right\}^{2}}{\mathbb{E}\left\{h(Z)^{2}\right\}}$$

for the submodel $\mathcal{P}_{\epsilon}$ described in the previous subsection. To find the best such lower bound, we would like to optimize the above over all $P_{\epsilon}$ in some submodels. It is not a priori clear how generally this can be accomplished, since different functionals $\psi$ could yield very different numerators. Therefore let us first consider what we can say about the derivative in the numerator, for a large class of pathwise differentiable functionals.
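The optimization over submodels can be made concrete for a simple functional. The sketch below uses an assumed example (not from the text): take $\psi(P)=\mathbb{E}_{P}[Z]$ on a discrete space. Along the path $p_{\epsilon}=p(1+\epsilon h)$ we get $\psi\left(P_{\epsilon}\right)=\mathbb{E}[Z]+\epsilon \mathbb{E}[Z h(Z)]$, so the Cramér-Rao bound in direction $h$ is $\mathbb{E}[Z h]^{2} / \mathbb{E}\left[h^{2}\right]$. By Cauchy-Schwarz this never exceeds $\operatorname{Var}(Z)$, and the "hardest" direction $h(z)=z-\mathbb{E}[Z]$ attains it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: psi(P) = E_P[Z] on the support {0,...,4}.
z = np.arange(5.0)
p = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
mu = np.sum(p * z)
var = np.sum(p * (z - mu) ** 2)

def cr_bound(h):
    """Cramer-Rao bound E[Zh]^2 / E[h^2] along direction h."""
    h = h - np.sum(p * h)              # center: E_P[h(Z)] = 0
    return np.sum(p * z * h) ** 2 / np.sum(p * h ** 2)

# Search over many random directions: no bound exceeds Var(Z) ...
bounds = [cr_bound(rng.normal(size=z.size)) for _ in range(1000)]
print(max(bounds) <= var + 1e-12)          # True
# ... and the least favorable direction h(z) = z - mu attains it.
print(np.isclose(cr_bound(z - mu), var))   # True
```

This is exactly the "infimum over submodels" logic discussed below: the supremum of the submodel bounds, $\operatorname{Var}(Z)$, is the nonparametric efficiency bound for the mean.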
Namely, suppose the functional $\psi: \mathcal{P} \mapsto \mathbb{R}$ is smooth, as a map from distributions to the reals, in the sense that it admits a kind of distributional Taylor expansion

$$\psi(\bar{P})-\psi(P)=\int \varphi(z ; \bar{P}) d(\bar{P}-P)(z)+R_{2}(\bar{P}, P)$$
for distributions $\bar{P}$ and $P$, often called a von Mises expansion, where $\varphi(z ; P)$ is a mean-zero, finite-variance function satisfying $\int \varphi(z ; P) d P(z)=0$ and $\int \varphi(z ; P)^{2} d P(z)<\infty$, and $R_{2}(\bar{P}, P)$ is a second-order remainder term (which means it only depends on products or squares of differences between $\bar{P}$ and $P$).
Intuitively, the von Mises expansion above is just an infinite-dimensional or distributional analog of a Taylor expansion, with $\varphi(z ; \bar{P})$ acting as a usual derivative term; it describes how the functional $\psi$ changes locally when the distribution changes from $P$ to $\bar{P}$. For example, when $Z \in\{1, \ldots, k\}$ is discrete and so $\bar{P}$ and $P$ have $k$ countable components, the von Mises expansion reduces to a standard multivariate Taylor expansion, with the integral term becoming the finite sum $\sum_{z=1}^{k} \varphi(z ; \bar{P})\left(\bar{p}_{z}-p_{z}\right)$.
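A discrete example makes the expansion tangible. The sketch below uses an assumed functional (my own choice, not from the text): $\psi(P)=\sum_{z} p_{z}^{2}$, the collision probability. Its derivative term is $\varphi(z ; P)=2\left\{p_{z}-\psi(P)\right\}$, which is mean-zero with finite variance, and a short calculation gives the remainder $R_{2}(\bar{P}, P)=-\sum_{z}\left(\bar{p}_{z}-p_{z}\right)^{2}$, which is genuinely second order.

```python
import numpy as np

# Assumed example: psi(P) = sum_z p_z^2 on a 4-point space.
p    = np.array([0.1, 0.2, 0.3, 0.4])      # P
pbar = np.array([0.15, 0.25, 0.25, 0.35])  # Pbar

psi = lambda q: np.sum(q ** 2)
phi = lambda q: 2 * (q - psi(q))           # phi(z; Q), mean-zero under Q

# First-order term: the sum over z of phi(z; Pbar) (pbar_z - p_z).
first_order = np.sum(phi(pbar) * (pbar - p))
remainder = psi(pbar) - psi(p) - first_order

# The remainder matches -sum_z (pbar_z - p_z)^2: second order, as claimed.
print(np.isclose(remainder, -np.sum((pbar - p) ** 2)))  # True
```

Note that the centering constant $-2 \psi(\bar{P})$ in $\varphi$ does not affect the first-order term, since $\bar{P}-P$ assigns total mass zero.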
I do not plan to go further into directional and pathwise derivatives of functionals or the rest of semiparametric theory; there is simply too much of it, enough to fill a book. This article is meant as a practical guide, and I would rather not stray too far from that goal...

The theory behind semiparametrics does get somewhat involved and touches on functional analysis, so I will not write it too theoretically and will focus more on intuition.

Directional and pathwise derivatives of functionals will be covered in a future update.
"Can the above Cramér-Rao bounds (which hold for smooth parametric models) be exploited to construct lower bound benchmarks for larger semi- or nonparametric models as well?"
"The standard way to connect classic Cramér-Rao bounds for parametric models to larger, more complicated nonparametric models is through a technical device called the parametric submodel."
Definition 1. A parametric submodel is a smooth parametric model $\mathcal{P}_{\epsilon}=\left\{P_{\epsilon}: \epsilon \in \mathbb{R}\right\}$ that satisfies (i) $\mathcal{P}_{\epsilon} \subseteq \mathcal{P}$, and (ii) $P_{\epsilon=0}=\mathbb{P}$.
Thus, in words, a parametric submodel is a parametric model that (i) is contained in the larger model $\mathcal{P}$ of interest, and (ii) equals the true distribution at $\epsilon=0$, i.e., contains the truth $\mathbb{P}$.
The high-level idea behind using submodels is that it is never harder to estimate a parameter over a smaller model, relative to a larger one in which the smaller model is contained. So any lower bound for a submodel will also be a valid lower bound for the larger model $\mathcal{P}$.
Comment: How should we understand the claim that "any lower bound for a submodel will also be a valid lower bound for the larger model $\mathcal{P}$"? Here is the explanation from van der Vaart, "Asymptotic Statistics", Chap. 25, Semiparametric Models.
"To estimate the parameter $\psi(P)$ given the model $\mathcal{P}$ is certainly harder than to estimate this parameter given that $P$ belongs to a submodel $\mathcal{P}_{0} \subset \mathcal{P}$. For every smooth parametric submodel $\mathcal{P}_{0}=\left\{P_{\theta}: \theta \in \Theta\right\} \subset \mathcal{P}$, we can calculate the Fisher information for estimating $\psi\left(P_{\theta}\right)$. Then the information for estimating $\psi(P)$ in the whole model is certainly not bigger than the infimum of the informations over all submodels (recall that the Cramér-Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of $\theta$). We shall simply define the information for the whole model as this infimum. A submodel for which the infimum is taken (if there is one) is called least favorable or a "hardest" submodel."
Comment: (The following is adapted from Mark van der Laan's STAT C245B Survival Analysis and Causality course materials.) The benchmark/lower bound, in a minimax sense, for the target parameter $\psi$ is tightly connected to its derivative (the functional derivative). We are interested in the behavior of $\psi$ under local perturbations around $\mathbb{P}$. In particular, the derivative of $\psi$ and its steepness define the difficulty of the estimation problem. Hence we need a theory of functional derivatives.