-
Figure 1.
Huber loss function (a) and Berhu penalty function (b); the 2D contours of the Huber loss function (c) and the Berhu penalty function (d).
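The caption does not restate the two functions. For orientation, here is a common parameterization. It is an assumption: the paper's own defining equations, with its own threshold $M$, are not reproduced in this section.

```latex
H_M(z) =
\begin{cases}
  z^2, & |z| \le M,\\
  2M|z| - M^2, & |z| > M,
\end{cases}
\qquad
B_M(z) =
\begin{cases}
  |z|, & |z| \le M,\\
  \dfrac{z^2 + M^2}{2M}, & |z| > M.
\end{cases}
```

Under this form the two pieces of each function meet with matching value and slope at $|z| = M$. The Huber loss is quadratic for small residuals and linear in the tails, which limits the pull of outlying observations. The Berhu penalty reverses the pattern: it acts like the LASSO $\ell_1$ penalty on small coefficients, so it can set them exactly to zero, and like a ridge penalty on large ones.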
-
Figure 2.
Estimation picture for the Huber-Berhu regression (a), with least absolute shrinkage and selection operator (LASSO) (b) and ridge (c) regression shown for comparison.
-
Figure 3.
Comparison of running time for Algorithm 1 and CVX.
$p$ is the number of independent variables in the TF matrix ($X$).
-
Figure 4.
The implementation of Huber-Berhu-Partial Least Squares (HB-PLS) to identify candidate regulatory genes controlling the lignin biosynthesis pathway. (a) HB-PLS; (b) SPLS. Green nodes (inside the circles) represent lignin biosynthesis genes. Coral nodes represent positive lignin pathway regulators supported by existing literature, and light purple nodes represent other predicted transcription factors that are not supported by currently available literature. (c) The lignin biosynthesis pathway.
-
Figure 5.
The implementation of Huber-Berhu-Partial Least Squares (HB-PLS) to identify candidate regulatory genes (purple and coral nodes) controlling photosynthesis and related pathway genes. HB-PLS (a) was compared with the sparse partial least squares (SPLS) method (b) in identifying regulators that affect maize photosynthesis light reaction and Calvin cycle pathway genes. The green and yellow nodes within the circles represent photosynthesis light reaction pathway genes and Calvin cycle pathway genes, respectively. Coral nodes in the circles represent positive predicted biological process or pathway regulators that are supported by existing literature, and light purple nodes represent other predicted TFs that currently lack experimentally validated supporting evidence.
-
Figure 6.
The receiver operating characteristic (ROC) curves of Huber-Berhu-partial least squares (HB-PLS) and sparse partial least squares (SPLS) methods for identifying pathway regulators in Arabidopsis thaliana. (a) Lignin biosynthesis pathway; (b) a merged pathway of light reaction pathway and Calvin cycle pathway.
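The caption does not spell out how the ROC curves are built. The following is a hedged sketch under two assumptions. First, each candidate TF receives a score from a method; the magnitude of its estimated coefficient is one hypothetical choice. Second, literature-supported regulators are labelled positive. Sweeping a threshold down the ranked scores traces the curve. The area under it equals the probability that a randomly chosen positive outranks a randomly chosen negative. The function names are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from sweeping a threshold down the ranked scores.
    Assumes distinct scores; tied scores would need to be grouped into one step."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1  # a literature-supported regulator is recovered
        else:
            fp += 1  # an unsupported candidate is admitted
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]  # toy scores, not data from the paper
    labels = [1, 0, 1, 0]
    print(roc_points(scores, labels))
    print(roc_auc(scores, labels))
```

The pairwise AUC is quadratic in the number of candidates. That is acceptable for a few hundred TFs, and it makes the probabilistic meaning of the area explicit.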
-
Figure 7.
An integrative framework for identifying biological process and pathway regulators from high-throughput gene expression data by integration of statistics, machine learning and convex optimization. PLS: Partial least squares.
-
Algorithm 1: Accelerated proximal gradient descent method to minimize $f(\boldsymbol{\beta})$ in Equation (7) with respect to $\boldsymbol{\beta}$ and $\beta_0$
Input: predictor matrix ($X$), dependent vector ($y$), and penalty constant ($\lambda$)
Output: regression coefficient ($\boldsymbol{\beta}$)
1 Initialize $\boldsymbol{\beta} = \mathbf{0}$, $\boldsymbol{\beta}_{\text{prev}} = \mathbf{0}$, $t = 1$
2 for $k$ in 1, …, MAX_ITER
3 $\boldsymbol{v} = \boldsymbol{\beta} + \frac{k}{k+3}\left(\boldsymbol{\beta} - \boldsymbol{\beta}_{\text{prev}}\right)$
4 compute the gradient of the Huber loss at $\boldsymbol{v}$ using (5), denoted $\boldsymbol{G}_{\boldsymbol{v}}$
5 while TRUE
6 compute $\boldsymbol{p}_1 = \mathrm{Prox}_{t,\lambda|\cdot|}(\boldsymbol{v})$ using (10)
7 compute $\boldsymbol{p}_2 = \mathrm{Prox}_{t,\lambda u}(\boldsymbol{p}_1)$ using (9)
8 if $\sum_{i=1}^{n} H_M\left(y_i - \beta_0 - \boldsymbol{x}_i^T\boldsymbol{p}_2\right) \le \sum_{i=1}^{n} H_M\left(y_i - \beta_0 - \boldsymbol{x}_i^T\boldsymbol{v}\right) + \boldsymbol{G}_{\boldsymbol{v}}'\left(\boldsymbol{p}_2 - \boldsymbol{v}\right) + \frac{1}{2t}\left\|\boldsymbol{p}_2 - \boldsymbol{v}\right\|_2^2$
9 break
10 else $t = 0.5\,t$
11 $\boldsymbol{\beta}_{\text{prev}} = \boldsymbol{\beta}$, $\boldsymbol{\beta} = \boldsymbol{p}_2$
12 if converged
13 break
-
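A minimal pure-Python sketch of Algorithm 1 follows. It rests on several assumptions that this section does not confirm:

- The Huber loss is taken as $H_M(r)=r^2$ for $|r|\le M$ and $2M|r|-M^2$ otherwise.
- The Berhu penalty is taken as $B_M(z)=|z|$ for $|z|\le M$ and $(z^2+M^2)/(2M)$ otherwise.
- Steps 6 and 7 are fused into one closed-form Berhu proximal operator derived for that parameterization.
- The proximal operator is applied to the gradient step $\boldsymbol{v}-t\boldsymbol{G}_{\boldsymbol{v}}$, the usual proximal-gradient form. Whether Eq. (10) already folds this step in is not visible here.
- The intercept $\beta_0$ is held fixed.

All function and parameter names (`apg_huber_berhu`, `m_loss`, `m_pen`, and so on) are hypothetical.

```python
import math

# Sketch of Algorithm 1 (accelerated proximal gradient, Huber loss + Berhu penalty).
# Assumptions, not taken from the paper: the H_M and B_M parameterizations below,
# one fused closed-form Berhu prox for steps 6-7, and a fixed intercept beta0.
# All names are hypothetical.


def huber(r, m):
    """Huber loss H_M(r): r^2 for |r| <= M, 2M|r| - M^2 beyond."""
    return r * r if abs(r) <= m else 2.0 * m * abs(r) - m * m


def prox_berhu(z, t, lam, m):
    """Closed-form prox of t*lam*B_M at scalar z, with B_M(z) = |z| or (z^2 + M^2)/(2M)."""
    a, s = abs(z), t * lam
    if a <= s:
        return 0.0                      # small inputs are set exactly to zero (LASSO-like)
    if a <= m + s:
        return math.copysign(a - s, z)  # soft-thresholding; result stays within [-M, M]
    return z / (1.0 + s / m)            # ridge-like shrinkage; result lies outside [-M, M]


def residuals(X, y, beta, beta0):
    return [yi - beta0 - sum(a * b for a, b in zip(xi, beta)) for xi, yi in zip(X, y)]


def huber_loss(X, y, beta, beta0, m):
    return sum(huber(r, m) for r in residuals(X, y, beta, beta0))


def huber_grad(X, y, beta, beta0, m):
    """Gradient of sum_i H_M(y_i - beta0 - x_i' beta) with respect to beta."""
    g = [0.0] * len(beta)
    for xi, r in zip(X, residuals(X, y, beta, beta0)):
        dr = 2.0 * max(-m, min(m, r))   # H_M'(r) = 2 * clip(r, -M, M)
        for j, xij in enumerate(xi):
            g[j] -= xij * dr
    return g


def apg_huber_berhu(X, y, lam, m_loss=10.0, m_pen=1.0, beta0=0.0, max_iter=1000, tol=1e-10):
    p = len(X[0])
    beta, beta_prev, t = [0.0] * p, [0.0] * p, 1.0                  # step 1
    for k in range(1, max_iter + 1):                                 # step 2
        w = k / (k + 3.0)
        v = [b + w * (b - bp) for b, bp in zip(beta, beta_prev)]    # step 3
        g = huber_grad(X, y, v, beta0, m_loss)                       # step 4
        f_v = huber_loss(X, y, v, beta0, m_loss)
        while True:                                                  # step 5
            # steps 6-7 fused: prox applied to the gradient step v - t*G_v (assumed form)
            p2 = [prox_berhu(vj - t * gj, t, lam, m_pen) for vj, gj in zip(v, g)]
            d = [a - b for a, b in zip(p2, v)]
            bound = f_v + sum(gj * dj for gj, dj in zip(g, d)) + sum(dj * dj for dj in d) / (2.0 * t)
            # step 8, with a tiny slack so round-off cannot stall the line search
            if huber_loss(X, y, p2, beta0, m_loss) <= bound + 1e-12 * (1.0 + abs(bound)):
                break                                                # step 9
            t *= 0.5                                                 # step 10
        beta_prev, beta = beta, p2                                   # step 11
        if max(abs(a - b) for a, b in zip(beta, beta_prev)) < tol:  # steps 12-13
            break
    return beta


if __name__ == "__main__":
    X, y = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
    print(apg_huber_berhu(X, y, lam=0.0))    # close to [2.0]
    print(apg_huber_berhu(X, y, lam=100.0))  # [0.0]
```

On the toy data $y = 2x$, `lam=0.0` drives the estimate toward the unpenalized fit of 2. With `lam=100.0`, the first proximal step already lands at zero, and the estimate stays there. This illustrates the LASSO-like behaviour of the Berhu penalty near zero.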
Algorithm 2: Finding the solution of the Huber-Berhu PLS regression
Input: TF matrix ($X$), pathway matrix ($Y$), penalty constant ($\lambda$), and number of components ($K$)
Output: regression coefficient matrix ($A$)
1 $\boldsymbol{X}_0 = \boldsymbol{X}$, $\boldsymbol{Y}_0 = \boldsymbol{Y}$, $\boldsymbol{cF} = \boldsymbol{I}$, $\boldsymbol{A} = \mathbf{0}$
2 for $k$ in 1, …, $K$
3 set $\boldsymbol{M}_{k-1} = \boldsymbol{X}_{k-1}'\boldsymbol{Y}_{k-1}$
4 initialize $\boldsymbol{u}$ to be the first left singular vector of $\boldsymbol{M}_{k-1}$, and initialize $\boldsymbol{v}$ to be the product of the first right singular vector and the first singular value
5 until convergence of $\boldsymbol{u}$ and $\boldsymbol{v}$
6 update $\boldsymbol{u}$ using (16)
7 update $\boldsymbol{v}$ using (17)
8 extract component $\boldsymbol{\xi} = \boldsymbol{X}\boldsymbol{u}$
9 compute regression coefficients in (8): $\boldsymbol{c} = \boldsymbol{X}'\boldsymbol{\xi}/(\boldsymbol{\xi}'\boldsymbol{\xi})$, $\boldsymbol{d} = \boldsymbol{Y}'\boldsymbol{\xi}/(\boldsymbol{\xi}'\boldsymbol{\xi})$
10 update $\boldsymbol{A} = \boldsymbol{A} + \boldsymbol{cF}\cdot\boldsymbol{u}\cdot\boldsymbol{d}'$
11 update $\boldsymbol{cF} = \boldsymbol{cF}\cdot(\boldsymbol{I} - \boldsymbol{u}\cdot\boldsymbol{c}')$
12 compute residuals for $X$ and $Y$: $\boldsymbol{X} = \boldsymbol{X} - \boldsymbol{\xi}\boldsymbol{c}'$, $\boldsymbol{Y} = \boldsymbol{Y} - \boldsymbol{\xi}\boldsymbol{d}'$
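The bookkeeping in Algorithm 2 can be checked with a small pure-Python sketch. It covers the deflation in step 12 and the matrix $\boldsymbol{cF}$ that, in steps 10 and 11, maps each weight vector back to the original predictors. Updates (16) and (17) are not reproduced in this section, so the sketch swaps in their penalty-free counterparts, $\boldsymbol{u} \propto \boldsymbol{M}\boldsymbol{v}$ and $\boldsymbol{v} = \boldsymbol{M}'\boldsymbol{u}$ (plain power iteration). It therefore computes ordinary PLS, not HB-PLS. It also starts the power iteration from a vector of ones rather than from an SVD. Function names are hypothetical.

```python
import math

# Sketch of the outer loop of Algorithm 2 (steps 1-12). Assumption: the penalized
# updates (16)-(17) are replaced by unpenalized power iteration, so this computes
# ordinary PLS rather than HB-PLS. All names are hypothetical.


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def first_singular_pair(M, n_iter=1000, tol=1e-13):
    """Power iteration u = Mv/||Mv||, v = M'u. At convergence u is the first left singular
    vector and v is the first right singular vector times the first singular value."""
    Mt = transpose(M)
    v = [1.0] * len(Mt)  # assumed start; must not be orthogonal to the top singular vector
    u = [0.0] * len(M)
    for _ in range(n_iter):
        mv = matvec(M, v)
        nrm = math.sqrt(sum(a * a for a in mv))
        u_new = [a / nrm for a in mv]
        v = matvec(Mt, u_new)
        done = max(abs(a - b) for a, b in zip(u_new, u)) < tol
        u = u_new
        if done:
            break
    return u, v


def pls_coefficients(X, Y, K):
    n, p, q = len(X), len(X[0]), len(Y[0])
    Xk, Yk = [r[:] for r in X], [r[:] for r in Y]           # step 1
    cF, A = identity(p), [[0.0] * q for _ in range(p)]
    for _ in range(K):                                        # step 2
        M = matmul(transpose(Xk), Yk)                         # step 3
        u, _ = first_singular_pair(M)                         # steps 4-7 (unpenalized)
        xi = matvec(Xk, u)                                    # step 8
        ss = sum(a * a for a in xi)
        c = [s / ss for s in matvec(transpose(Xk), xi)]       # step 9
        d = [s / ss for s in matvec(transpose(Yk), xi)]
        cFu = matvec(cF, u)                                   # step 10: A += cF u d'
        A = [[A[i][j] + cFu[i] * d[j] for j in range(q)] for i in range(p)]
        cF = matmul(cF, [[(1.0 if i == j else 0.0) - u[i] * c[j] for j in range(p)]
                         for i in range(p)])                  # step 11: cF = cF (I - u c')
        Xk = [[Xk[r][j] - xi[r] * c[j] for j in range(p)] for r in range(n)]   # step 12
        Yk = [[Yk[r][j] - xi[r] * d[j] for j in range(q)] for r in range(n)]
    return A


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    Y = [[1.0], [2.0], [3.0], [4.0]]  # Y = X [1, 2]'
    print(pls_coefficients(X, Y, K=2))  # approximately [[1.0], [2.0]]
```

In the toy example $Y$ is built as exactly $X[1, 2]'$, and $K = p = 2$ components are used. The two score vectors then span the column space of $X$, so the accumulated $A$ should recover $[1, 2]'$. This gives a quick check that $\boldsymbol{cF}$ correctly rewrites $\sum_k \boldsymbol{\xi}_k\boldsymbol{d}_k'$ as $\boldsymbol{X}_0\boldsymbol{A}$.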