%% First we have a character check
%
% ! exclamation mark " double quote
% # hash ` opening quote (grave)
% & ampersand ' closing quote (acute)
% $ dollar % percent
% ( open parenthesis ) close paren.
% - hyphen = equals sign
% | vertical bar ~ tilde
% @ at sign _ underscore
% { open curly brace } close curly
% [ open square ] close square bracket
% + plus sign ; semi-colon
% * asterisk : colon
% < open angle bracket > close angle
% , comma . full stop
% ? question mark / forward slash
% \ backslash ^ circumflex
%
% ABCDEFGHIJKLMNOPQRSTUVWXYZ
% abcdefghijklmnopqrstuvwxyz
% 1234567890
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%\documentclass[aps,prl,reprint,twocolumn,groupedaddress]{revtex4}
\documentclass[aps,prl,superscriptaddress,amsmath,amssymb,floatfix]{revtex4}
%\documentclass[aps,prl,groupedaddress,twocolumn,showpacs,amsmath,amssymb,floatfix]{revtex4}
%\documentclass[aps,prl,preprint,groupedaddress,showpacs]{revtex4}
%\documentclass[10pt,reprint]{iopart}
\usepackage{color}
\usepackage{graphicx}
\usepackage{bm}
% You should use BibTex and apsrev.bst for references
% Choosing a journal automatically selects the correct APS
% BibTex style file (bstfile), so only uncomment the line
% below if necessary.
%\bibliographystyle{apsre}
%\pdfpagewidth=8.5in
%\pdfpageheight=11in
%\usepackage{color}
\usepackage{amsbsy}
\usepackage{amssymb}
%\usepackage{iopams}
\def\a{\alpha}
\def\b{\beta}
\def\g{\gamma}
\def\d{\delta}
\def\dd{\mbox{d}}
\def\ve{\varepsilon}
\def\e{\epsilon}
\def\n{{\bf n}}
\def\r{\rho}
\def\t{\theta}
\def\q{{\bf q}}
\def\k{{\bf k}}
\def\x{{\bf x}}
\def\y{{\bf y}}
\def\l{\ell}
\def\o{\omega}
\def\p{{\bf p}}
\def\s{\sigma}
\def\vp{\varphi}
\def\D{\Delta}
\def\E{{\bf E}}
\def\K{{\bf K}}
\def\L{{\bf L}}
\def\Pt{\tilde{P}}
\def\R{{\bf R}}
\def\S{{\bf S}}
\def\U{{\bf U}}
\def\W{{\bf W}}
%\pdfpagewidth=8.5in
%\pdfpageheight=11in
\begin{document}
\title{Convergence of Power Methods}
\author{Qi Lei}
%\affiliation{Institute of Computational Engineering and Sciences, University of Texas at Austin, TX, 78713}
\date{\today}
\begin{abstract}
\noindent
We study the convergence of two versions of the power method: the classical iteration and a variant in which noise is added at each step.
\end{abstract}
\pacs{02.50.Ey, 05.10.Gg, 82.30.Nr, 82.60.Nh}
\maketitle
\vspace{5mm}
%\section{Introduction}
Algorithm 1:\\
\textbf{Input:} Symmetric matrix $A\in \mathbb{R}^{n\times n}$, number of iterations $L$.\\
1. Choose $x_0\in \mathbb{R}^n$.\\
2. For $l=1$ to $L$:\\
\indent(a) $~y_l\longleftarrow Ax_{l-1}$\\
\indent(b) $x_l=y_l/\| y_l\|$\\
\textbf{Output:} vector $x_L$\\
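For concreteness, here is a minimal NumPy sketch of Algorithm 1 (the function name \verb|power_method| and the random initialization are our own illustrative choices, not prescribed above):
\begin{verbatim}
# Minimal NumPy sketch of Algorithm 1 (classical power method).
import numpy as np

def power_method(A, L, x0=None):
    n = A.shape[0]
    x = np.random.randn(n) if x0 is None else x0.astype(float)
    x /= np.linalg.norm(x)          # normalized starting vector x_0
    for _ in range(L):
        y = A @ x                   # step (a): y_l <- A x_{l-1}
        x = y / np.linalg.norm(y)   # step (b): x_l = y_l / ||y_l||
    return x
\end{verbatim}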
\textbf{Lemma 1:}
Let $\sigma_1\leq\sigma_2\leq\cdots\leq\sigma_{n-1}< \sigma_n$
be the singular values of the symmetric matrix $A$, and let $z_1,z_2,\cdots,z_n$
be the corresponding right eigenvectors. Denote $\tan\theta_l=\tan\theta(z_n,x_l)$. Then we have
$\tan\theta_{l+1}\leq \tan\theta_l \times \sigma_{n-1}/\sigma_n$.\\
\textit{Proof.}
Suppose $x_l=\cos\theta_l\,z_n+\sin\theta_l\,u_l$ with $u_l\in z_n^{\bot}$.\\
Then
\begin{eqnarray*}
Ax_l &=& \cos\theta_l\,Az_n+\sin\theta_l\,Au_l\\
& = & \cos\theta_l\,\sigma_nz_n+\sin\theta_l\,\|Au_l\|\frac{Au_l}{\|Au_l\|}.
\end{eqnarray*}
Suppose $u_l=\sum_{p=1}^{n-1}\alpha_pz_p$; then $Au_l=\sum_{p=1}^{n-1}\sigma_p\alpha_pz_p\in
z_n^{\bot}$, so
\begin{eqnarray*}
\tan\theta_{l+1}=\frac{\sin\theta_l\,\|Au_l\|}{\cos\theta_l\,\sigma_n}.
\end{eqnarray*}
Now $\|Au_l\|^2=\sum_{p=1}^{n-1}\sigma_p^2\alpha_p^2\leq
\max_{1\leq p\leq n-1}\{\sigma_p^2\}\sum_{p=1}^{n-1}\alpha_p^2\leq \sigma_{n-1}^2$,
since $\sum_{p=1}^{n-1}\alpha_p^2=\|u_l\|^2=1$.
So $\tan\theta_{l+1}\leq \tan\theta_l \frac{\sigma_{n-1}}{\sigma_n}$.
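Applying the lemma repeatedly yields geometric convergence of the angle:
\[\tan\theta_L\leq\left(\frac{\sigma_{n-1}}{\sigma_n}\right)^L\tan\theta_0.\]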
\vspace{5mm}
Algorithm 2:\\
\textbf{Input:} Symmetric matrix $A\in \mathbb{R}^{n\times n}$, noise $g_l$ added at each step, number of iterations $L$.\\
1. Choose $x_0\in \mathbb{R}^n$.\\
2. For $l=1$ to $L$:\\
\indent(a) $~y_l\longleftarrow Ax_{l-1}+g_l$\\
\indent(b) $x_l=y_l/\| y_l\|$\\
\textbf{Output:} vector $x_L$\\
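A corresponding NumPy sketch of Algorithm 2 follows; the Gaussian choice of the noise $g_l$ is purely illustrative, since the lemma below places no distributional assumption on $g_l$:
\begin{verbatim}
# Minimal NumPy sketch of Algorithm 2 (power method with additive noise).
import numpy as np

def noisy_power_method(A, L, noise_scale=1e-3, x0=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    x = rng.standard_normal(n) if x0 is None else x0.astype(float)
    x /= np.linalg.norm(x)
    for _ in range(L):
        g = noise_scale * rng.standard_normal(n)  # noise g_l (Gaussian only for illustration)
        y = A @ x + g                             # step (a): y_l <- A x_{l-1} + g_l
        x = y / np.linalg.norm(y)                 # step (b): normalize
    return x
\end{verbatim}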
\textbf{Lemma 2:}
Let $\sigma_1\leq\sigma_2\leq\cdots\leq\sigma_{n-1}< \sigma_n$
be the singular values of the symmetric matrix $A$, let $z_1,z_2,\cdots,z_n$
be the corresponding right eigenvectors, and let $g_l$ be the noise added at each
iteration step. Denote $\tan\theta_l=\tan\theta(z_n,x_l)$. Then
$\tan\theta_{l+1}\leq \max\{\tan\theta_l \times\sigma_{n-1}/\sigma_n,\;\tan\langle z_n,g_l\rangle\}$.\\
\textit{Proof.}
Suppose $x_l=\cos\theta_l\,z_n+\sin\theta_l\,u_l$ with $u_l\in z_n^{\bot}$.\\
Then
\begin{eqnarray*}
y_{l+1}=Ax_l+g_l &=& \cos\theta_l\,Az_n+\sin\theta_l\,Au_l+g_l\\
& = & \cos\theta_l\,\sigma_nz_n+\sin\theta_l\,Au_l+g_l.
\end{eqnarray*}
Now suppose $x_{l+1}=\cos\theta_{l+1}z_n+\sin\theta_{l+1}u_{l+1}$, for some $u_{l+1}\in z_n^{\bot}$.
Then
\begin{eqnarray*}
\cos\theta_{l+1} &=& z_n^T x_{l+1}=(\cos\theta_l\,\sigma_n+z_n^Tg_l)/\| y_{l+1}\|,\\
\sin\theta_{l+1} &=& u_{l+1}^T x_{l+1}=(\sin\theta_l\,u_{l+1}^TAu_l+u_{l+1}^Tg_l)/\|y_{l+1}\|,\\
\tan\theta_{l+1} &=& \frac{\sin\theta_{l+1}}{\cos\theta_{l+1}}\\
&=&\frac{\sin\theta_l\,u_{l+1}^TAu_l+u_{l+1}^Tg_l}{\cos\theta_l\,\sigma_n+z_n^Tg_l}\\
&\leq& \frac{\sin\theta_l\,u_{l+1}^TAu_l+\|g_l\|\sin\langle
z_n,g_l\rangle}{\cos\theta_l\,\sigma_n+\|g_l\|\cos\langle z_n,g_l\rangle}\\
&\leq& \frac{\sin\theta_l\,u_{l+1}^TAu_l+\|g_l\|\sin\langle
z_n,g_l\rangle}{\cos\theta_l\,\sigma_n-\|g_l\||\cos\langle z_n,g_l\rangle|}.
\end{eqnarray*}
The derivation above is what appears in the paper and on the accompanying webpage. To proceed from there, one has to bound the terms $\|g_l\|\sin\langle g_l, z_n\rangle$ and $\|g_l\|\cos\langle g_l, z_n\rangle$ from above, which amounts to bounding $\|g_l\|$. Since such a bound is not available in our case, we modify the lemma slightly, as in the derivation below.\\
\begin{eqnarray*}
\tan\theta_{l+1} &\leq& \frac{\sin\theta_l\,u_{l+1}^TAu_l+\|g_l\|\sin\langle
z_n,g_l\rangle}{\cos\theta_l\,\sigma_n+\|g_l\|\cos\langle z_n,g_l\rangle}
~~~\mbox{(suppose $\sin\langle z_n,g_l\rangle$ and $\cos\langle z_n,g_l\rangle$ are positive)}\\
&\leq& \max\left\{\frac{\sin\theta_l\,\sigma_{n-1}}{\cos\theta_l\,\sigma_n},\frac{\sin\langle z_n,g_l\rangle}{\cos\langle z_n,g_l\rangle}\right\}\\
&=& \max\left\{\tan\theta_l\frac{\sigma_{n-1}}{\sigma_n},\tan\langle z_n,g_l\rangle\right\},
\end{eqnarray*}
where the second inequality uses $u_{l+1}^TAu_l\leq\sigma_{n-1}$ and the mediant inequality
$\frac{a+b}{c+d}\leq\max\{\frac{a}{c},\frac{b}{d}\}$ for positive $a,b,c,d$.
\vspace{5mm}
Algorithm 2+:\\
\textbf{Input:} Symmetric matrix $A\in \mathbb{R}^{n\times n}$, number of selected rows $r$, number of iterations $L$.\\
1. Choose $x_0\in \mathbb{R}^n$, $y_0=x_0$.\\
2. For $l=1$ to $L$:\\
\indent(a) Let $\mathcal{K}_l$ be a random subset of $\{1,2,\cdots, n\}$ with $|\mathcal{K}_l|=r$; set $y_l\longleftarrow y_{l-1}$,
$y_{l,\mathcal{K}_l}\longleftarrow A_{\mathcal{K}_l}x_{l-1}$\\
\indent(b) $x_l=y_l/\| y_l\|$\\
\textbf{Output:} vector $x_L$\\
Remark: For a matrix or vector $X$ and a set $\mathcal{K}=\{k_1,k_2,\cdots,k_r\}\subset \{1,2,\cdots,n\}$,
\[X_{\mathcal{K}}=X_{k_1,k_2,\cdots,k_r}=
\begin{bmatrix}
x_{k_1}\\
x_{k_2}\\
\vdots\\
x_{k_r}
\end{bmatrix}\sim
\begin{bmatrix}
0\\
\vdots\\
0\\
x_{k_1}\\
0\\
\vdots\\
0\\
x_{k_2}\\
\vdots\\
x_{k_r}\\
0\\
\vdots\\
0
\end{bmatrix},
\]
that is, $X_{\mathcal{K}}$ keeps the rows indexed by $\mathcal{K}$ and may be identified with the vector obtained by zeroing out all other rows.
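A minimal NumPy sketch of Algorithm 2+ in this notation follows; the uniform choice of $\mathcal{K}_l$ and the function name are our own (weighted sampling is discussed further below):
\begin{verbatim}
# Minimal NumPy sketch of Algorithm 2+ (coordinate-sampled power method).
import numpy as np

def coordinate_power_method(A, r, L, x0=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    x = rng.standard_normal(n) if x0 is None else x0.astype(float)
    x /= np.linalg.norm(x)
    y = x.copy()                                  # y_0 = x_0
    for _ in range(L):
        K = rng.choice(n, size=r, replace=False)  # random subset K_l with |K_l| = r
        y[K] = A[K, :] @ x                        # refresh only the selected coordinates of y
        x = y / np.linalg.norm(y)                 # x_l = y_l / ||y_l||
    return x
\end{verbatim}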
Some analysis:
As in Algorithm 2, the difference between $y_{l+1}$ and $Ax_l$ can be treated as noise.
The noise $g_l$ produced by Algorithm 2+ can be written as
\begin{eqnarray*}
g_l&=&y_{l+1}-Ax_l\\
&=& y_l-y_{l,\mathcal{K}_l}+A_{\mathcal{K}_l}x_l-Ax_l\\
&=& (I-I_{\mathcal{K}_l})y_l+(A_{\mathcal{K}_l}-A)x_l\\
&=& (\|y_l\|I-A)_{\mathcal{K}_l^c}x_l,
\end{eqnarray*}
using $y_l=\|y_l\|x_l$, where $\mathcal{K}_l^c=\{1,2,\cdots,n\}\setminus\mathcal{K}_l$.
So
\begin{eqnarray*}
\tan\langle
g_l,z_n\rangle&=&\frac{\|V^T(\|y_l\|I-A)_{\mathcal{K}_l^c}x_l\|}{z_n^T(\|y_l\|I-A)_{\mathcal{K}_l^c}x_l}\\
&=& \frac{\|V_{\mathcal{K}_l^c}^T(\|y_l\|I-A)x_l\|}{z_{n,\mathcal{K}_l^c}^T(\|y_l\|I-A)x_l}
~~~\mbox{(here $V=[z_1|z_2|\cdots |z_{n-1}]$)}.
\end{eqnarray*}
Some observations comparing the modified sampling schemes with the original power method:\\
1. Uniformly sampled rows\\
\indent Eventually the iteration converges; intuitively, the expected behavior of
each iteration matches that of the power method in the long run.\\
\indent However, it may take somewhat more time. \\
2. Weighted sampled rows\\
\indent The larger $n$ is, and the smaller $\lambda_1/\lambda_2$ is, the better
weighted sampling performs.\\
\indent Weighting by the dominant eigenvector works better than weighting by the norm of
$A$. A sketch of an experiment for reproducing these comparisons follows.\\
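The sketch below is one way to reproduce the comparisons empirically; the sampling weights (the current iterate $|x_l|$ as a proxy for the dominant eigenvector, and the row norms of $A$) are our own illustrative stand-ins for the two weighting schemes mentioned above, and no particular outcome is asserted here:
\begin{verbatim}
# Sketch of an experiment comparing row-sampling schemes in Algorithm 2+.
# The sampling weights are illustrative stand-ins: |x| is a running proxy for
# the dominant eigenvector z_n, and the row norms of A are the alternative.
import numpy as np

def sampled_power_angles(A, r, L, weight=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y = x.copy()
    w, V = np.linalg.eigh(A)
    z = V[:, np.argmax(np.abs(w))]        # eigenvector for sigma_n, used only to monitor the angle
    angles = []
    for _ in range(L):
        if weight == "iterate":           # weight by the current iterate
            p = np.abs(x) + 1e-12
        elif weight == "row_norm":        # weight by the row norms of A
            p = np.linalg.norm(A, axis=1) + 1e-12
        else:                             # uniform sampling
            p = np.ones(n)
        K = rng.choice(n, size=r, replace=False, p=p / p.sum())
        y[K] = A[K, :] @ x
        x = y / np.linalg.norm(y)
        angles.append(np.arccos(min(1.0, abs(z @ x))))   # theta(z_n, x_l)
    return angles
\end{verbatim}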
\section{Matrix Completion Intuition}
\begin{eqnarray*}
f(x,y)&=&\|A-\vec{x}\vec{y}^T\|_F^2\\
&=&\sum_i\sum_j(a_{ij}-x_iy_j)^2\\
&=&\sum_i\|\vec{a}_i-x_i\vec{y}\|_2^2,
\end{eqnarray*}
where $\vec{a}_i$ denotes the $i$-th row of $A$.
For individual $i$,
\begin{eqnarray*}
&& \|\vec{a}_i-x_i\vec{y}\|_2^2\\
&=& \|x_i\vec{y}\|^2_2-2x_i\vec{a}_i^T\vec{y}+\|\vec{a}_i\|_2^2\\
&=&\|\vec{y}\|_2^2(x_i-\frac{a_i^Ty}{\|y\|_2^2})^2+\|a_i\|_2^2-\frac{(a_i^Ty)^2}{\|y\|_2^2}
\end{eqnarray*}
Take $x_i=\frac{a_i^Ty}{\|y\|_2^2}$; then $f(x,y)$ attains its minimum over the individual coordinate $x_i$, $i=1,2,\cdots,n$, and the $i$-th term becomes $\|a_i\|_2^2-\frac{(a_i^Ty)^2}{\|y\|_2^2}$. Correspondingly, $f(x,y)$ decreases by $\|\vec{y}\|_2^2(x_i-\frac{a_i^Ty}{\|y\|_2^2})^2$, written as $\Delta f_{x_i}$.\\
Likewise, for an individual coordinate $y_j,~j=1,2,\cdots, n$, $f(x,y)$ attains its minimum when we take the new $y_j\doteq \frac{a_j^Tx}{\|x\|_2^2}$ (here $a_j$ denotes the $j$-th column of $A$), and $f(x,y)$ correspondingly decreases by $\|\vec{x}\|_2^2(y_j-\frac{a_j^Tx}{\|x\|_2^2})^2$, written as $\Delta f_{y_j}$.
\begin{itemize}
\item Greedy Coordinate Descent:\\
By comparing the potential decreases of $f(x,y)$, we can apply greedy coordinate descent to this problem.
At each step $t$, we update $k$ entries of $x^{(t)}$ or $y^{(t)}$. Take $x^{(t)}$ as an example: $x_{\Omega}^{(t+1)}\leftarrow A_{\Omega}y^{(t)}/\|y^{(t)}\|_2^2$ for the selected index set $\Omega$. Then $\Delta f^{(t+1)}_{x_{\Omega}}$ drops to $0$, and $\Delta f_{y_j}^{(t+1)}=\|x^{(t+1)}\|_2^2(y^{(t)}_j-\frac{a_j^Tx^{(t+1)}}{\|x^{(t+1)}\|_2^2})^2$. The whole process takes up to $4k+kn+2n$ flops. A sketch of this update appears after the list.
\end{itemize}
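Here is a minimal NumPy sketch of the greedy coordinate descent update above (the function name and the stopping rule, a fixed number of steps, are ours); for clarity it recomputes all potential decreases $\Delta f$ at every step rather than attaining the $4k+kn+2n$ flop count:
\begin{verbatim}
# Minimal NumPy sketch of greedy coordinate descent for min_{x,y} ||A - x y^T||_F^2.
import numpy as np

def greedy_rank1(A, k, steps, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = rng.standard_normal(m)
    y = rng.standard_normal(n)
    for _ in range(steps):
        # coordinate-wise minimizers and the potential decreases Delta f
        x_opt = A @ y / (y @ y)            # x_i <- a_i^T y / ||y||^2 (a_i = i-th row)
        y_opt = A.T @ x / (x @ x)          # y_j <- a_j^T x / ||x||^2 (a_j = j-th column)
        dfx = (y @ y) * (x - x_opt) ** 2   # Delta f_{x_i}
        dfy = (x @ x) * (y - y_opt) ** 2   # Delta f_{y_j}
        if dfx.max() >= dfy.max():         # greedily update the k most promising entries
            idx = np.argsort(dfx)[-k:]
            x[idx] = x_opt[idx]
        else:
            idx = np.argsort(dfy)[-k:]
            y[idx] = y_opt[idx]
    return x, y
\end{verbatim}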
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%