Figures (5)  Tables (1)
    • Figure 1. 

      Workflow of GFAnno. The locations of the 'seed sequence' file and the 'HMM model' file, together with the parameter values (b_iden, b_qcov, b_tcov, and h_cov), are read from the config file.

    • Figure 2. 

      Parameter filtering workflow. (a) BLASTP seed sequence generation. (b) HMM model selection. (c) Parameter selection. (d) Parameter validation. The species collection used in each step is marked with a single asterisk (*), and output data, which are available on GitHub, are marked with double asterisks (**).

    • Figure 3. 

      Plant flavonoid biosynthesis pathway. Enzymes labeled with '*' are members of the CYP450 superfamily, whereas those labeled with '**' are members of the 2OGD superfamily. Solid arrows indicate enzymatic reactions with well-established mechanisms, and dashed arrows indicate steps whose mechanistic details are yet to be fully determined.

    • Figure 4. 

      Overview of F3H, ANS, LDOX, and FLS in the 2OGD superfamily. (a) Parameter selection for candidate F3H, ANS, LDOX, and FLS genes. The phylogenetic tree, the four parameters, and the conserved-module diagram illustrate the selection process. In the phylogenetic tree, selected genes are shown in blue-gray, while excluded parameters are marked in red. The conserved domains were generated using MEME[21], and the HMM models used in HMMsearch are outlined in black boxes, with the model length given in parentheses after the model name. (b) Neighbor-joining tree showing the distance relationships among the four genes.

    • Figure 5. 

      Overview of F3'H, F3'5'H, FNSII, and C4H in the CYP450 superfamily. (a) Parameter selection for candidate genes. The phylogenetic tree, the four parameters, and the conserved-module diagram illustrate the selection process. In the phylogenetic tree, selected genes are shown in blue-gray, while excluded parameters are marked in red. The conserved domains were generated using MEME[21], and the HMM models used in HMMsearch are outlined in black boxes, with the model length given in parentheses after the model name. (b) Neighbor-joining tree showing the distance relationships among the four genes.

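    • Enzyme, HMM/CDD ID, and parameter settings (b_iden, b_qcov, b_tcov, h_cov):

      | Enzyme | HMM/CDD ID | b_iden | b_qcov | b_tcov | h_cov |
      |---|---|---|---|---|---|
      | 4CL | PF13193.6 (AMP-binding_C); PF00501 (AMP-binding) | 40 | 70 | 60−120 | 90/90 |
      | CHI | PF02431.15 (Chalcone) | 35 | 70 | 70−130 | 90 |
      | CHS | PF02797.15 (Chal_sti_synt_C); PF00195.19 (Chal_sti_synt_N) | 50 | 80 | 80−120 | 90/90 |
      | LAR | PF05368.13 (NmrA) | 30 | 65 | 80−120 | 90 |
      | PAL | PF00221.19 (Lyase_aromatic) | 60 | 80 | 80−120 | 90 |
      | PPO | PF12142.8 (PPO1_DWL); PF12143.8 (PPO1_KFDV) | 35 | 70 | 70−130 | 90/90 |
      | SCPL4* | PF00450.22 (Peptidase_S10) | 70 | 80 | 80−120 | 90 |
      | SCPL5* | PF00450.22 (Peptidase_S10) | 70 | 80 | 80−120 | 90 |
      | UGT84A | cl10013 | 40 | 80 | 80−120 | 90 |
      | DFR | PF01370.21 (Epimerase) | 60 | 80 | 80−120 | 90 |
      | ANR | (as above) | 55 | 80 | 80−120 | 90 |
      | F3H | PF03171.23 (2OG-FeII_Oxy); PF14226.9 (DIOX_N) | 60 | 80 | 80−120 | 90/90 |
      | FLS | (as above) | 60 | 80 | 80−120 | 90/90 |
      | LDOX | (as above) | 50 | 80 | 80−120 | 90/90 |
      | ANS | (as above) | 60 | 80 | 80−120 | 90/90 |
      | F3'H | PF00067.25 (p450) | 50 | 80 | 80−120 | 90 |
      | F3'5'H | (as above) | 60 | 80 | 80−120 | 90 |
      | C4H | (as above) | 70 | 80 | 80−120 | 90 |
      | FNSII | (as above) | 50 | 80 | 80−120 | 90 |
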
      * SCPL4 and SCPL5 are in SCPLIA[24]. 'b_iden' denotes the BLASTP identity; 'b_qcov' and 'b_tcov' represent the BLASTP 'query coverage per HSP' and 'target coverage per HSP', respectively; and 'h_cov' indicates the HMM coverage of HMMsearch.

      Table 1. 

      Files and parameters used for the annotation pipeline.
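
To illustrate how the Table 1 settings might act as a filter, the following is a minimal Python sketch and not GFAnno's actual implementation. It assumes that b_iden, b_qcov, and h_cov are minimum percentages, that b_tcov is an inclusive percentage range, and that a slash-separated h_cov (e.g. 90/90) gives one minimum per HMM model. The `Thresholds` class, the `passes` function, and the example hit values are hypothetical; only the threshold numbers are taken from Table 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Thresholds:
    """Per-enzyme settings (hypothetical container; values come from Table 1)."""
    b_iden: float                 # assumed: minimum BLASTP identity (%)
    b_qcov: float                 # assumed: minimum query coverage per HSP (%)
    b_tcov: Tuple[float, float]   # assumed: inclusive target-coverage range (%)
    h_cov: Tuple[float, ...]      # assumed: minimum HMM coverage (%), one per model


# Threshold values copied from Table 1 for three enzymes.
TABLE1 = {
    "CHS": Thresholds(50, 80, (80, 120), (90, 90)),
    "CHI": Thresholds(35, 70, (70, 130), (90,)),
    "PAL": Thresholds(60, 80, (80, 120), (90,)),
}


def passes(t: Thresholds, iden: float, qcov: float, tcov: float,
           hmm_covs: Tuple[float, ...]) -> bool:
    """Return True if a candidate meets every BLASTP and HMM threshold."""
    blast_ok = (
        iden >= t.b_iden
        and qcov >= t.b_qcov
        and t.b_tcov[0] <= tcov <= t.b_tcov[1]
    )
    # Require one coverage value per HMM model, each at or above its minimum.
    hmm_ok = len(hmm_covs) == len(t.h_cov) and all(
        cov >= minimum for cov, minimum in zip(hmm_covs, t.h_cov)
    )
    return blast_ok and hmm_ok


if __name__ == "__main__":
    # Made-up candidate values, for illustration only.
    print(passes(TABLE1["CHS"], 62.0, 91.0, 98.0, (95.0, 93.0)))
    print(passes(TABLE1["CHI"], 36.0, 75.0, 131.0, (95.0,)))
```

Keeping the thresholds in one immutable record per enzyme mirrors the table layout, so adding another enzyme is a single new entry.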