Updated on 2022/05/17

 
HAMADA, Michiaki
 
Affiliation
Faculty of Science and Engineering, School of Advanced Science and Engineering
Job title
Professor
Profile

Dr. Michiaki Hamada was born in 1977. He graduated from the Mathematical Institute of Tohoku University in March 2002. In April 2002, he joined Fuji Research Institute Corporation (FRIC), now Mizuho Information & Research Institute, Inc. (MHRI), as a researcher and worked on system development for science and technology. While at FRIC, he began research on RNA bioinformatics with support from the “Functional RNA Project” funded by NEDO. He received his PhD from Tokyo Institute of Technology in 2009. Currently, Dr. Hamada is an associate professor in the Faculty of Science and Engineering at Waseda University and the principal investigator of the Bioinformatics Laboratory. He is also a visiting researcher at AIST in Japan. He has been a board member of the Japanese Society for Bioinformatics (JSBi) since 2014. His interests include RNA informatics, sequence analysis, epigenetics, data mining, and machine learning. He aims to develop killer tools for the analysis of biological data.

Concurrent Post

  • Faculty of Science and Engineering   Graduate School of Advanced Science and Engineering

Research Institute

  • 2021
    -
    2022

    Center for Data Science   Concurrent Member

  • 2020
    -
    2022

    Research Institute for Science and Engineering   Concurrent Researcher

Education

  • 2005.10
    -
    2009.03

    Tokyo Institute of Technology   Interdisciplinary Science and Engineering   Intelligent Systems Science  

  • 2000.04
    -
    2002.03

    Tohoku University   Graduate School of Science   Department of Mathematics  

  • 1996.04
    -
    2000.03

    Tohoku University  

Degree

  • 2009.03   Tokyo Institute of Technology   Ph.D. (Science)

Research Experience

  • 2018.04
    -
    Now

    Waseda University   Faculty of Science and Engineering   Professor

  • 2017.04
    -
    Now

    Nippon Medical School

  • 2016.10
    -
    Now

    National Institute of Advanced Industrial Science and Technology (AIST)   Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL)   Team Leader

  • 2014.04
    -
    2018.03

    Waseda University Faculty of Science and Engineering   Associate Professor

  • 2010.10
    -
    2014.03

    The University of Tokyo

  • 2002.04
    -
    2010.09

    Fuji Research Institute Corporation   Researcher

Research Areas

  • Life, health and medical informatics

  • Intelligent informatics

Research Interests

  • RNA drug discovery

  • Artificial intelligence

  • Probabilistic models

  • RNA-protein interactions

  • RNA-RNA interactions

  • interactome

  • long noncoding RNA (lncRNA)

  • epi-transcriptome

  • epi-genome

  • RNA aptamer

  • Sequence analysis

  • RNA

  • Bioinformatics

Papers

  • Clone decomposition based on mutation signatures provides novel insights into mutational processes.

    Taro Matsutani, Michiaki Hamada

    NAR genomics and bioinformatics   3 ( 4 ) lqab093  2021.12  [International journal]

    Intra-tumor heterogeneity is a phenomenon in which mutation profiles differ from cell to cell within the same tumor and is observed in almost all tumors. Understanding intra-tumor heterogeneity is essential from the clinical perspective. Numerous methods have been developed to predict this phenomenon based on variant allele frequency. Among the methods, CloneSig models the variant allele frequency and mutation signatures simultaneously and provides an accurate clone decomposition. However, this method has limitations in terms of clone number selection and modeling. We propose SigTracer, a novel hierarchical Bayesian approach for analyzing intra-tumor heterogeneity based on mutation signatures to tackle these issues. We show that SigTracer predicts more reasonable clone decompositions than the existing methods against artificial data that mimic cancer genomes. We applied SigTracer to whole-genome sequences of blood cancer samples. The results were consistent with past findings that single base substitutions caused by a specific signature (previously reported as SBS9) related to the activation-induced cytidine deaminase intensively lie within immunoglobulin-coding regions for chronic lymphocytic leukemia samples. Furthermore, we showed that this signature mutates regions responsible for cell-cell adhesion. Accurate assignments of mutations to signatures by SigTracer can provide novel insights into signature origins and mutational processes.

  • Multi-resBind: a residual network-based multi-label classifier for in vivo RNA binding prediction and preference visualization.

    Shitao Zhao, Michiaki Hamada

    BMC bioinformatics   22 ( 1 ) 554 - 554  2021.11  [International journal]

    BACKGROUND: Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to the weak neural network architecture. RESULTS: Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements using the same large-scale PAR-CLIP dataset with respect to an increase in the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions. CONCLUSIONS: Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.

  • Impact of human gene annotations on RNA-seq differential expression analysis.

    Yu Hamaguchi, Chao Zeng, Michiaki Hamada

    BMC genomics   22 ( 1 ) 730 - 730  2021.10  [International journal]

    BACKGROUND: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Different sets of gene annotations are available for the human genome and are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear. RESULTS: Using "mappability", a metric of the complexity of gene annotation, we compared three distinct human gene annotations, GENCODE, RefSeq, and NONCODE, and evaluated how mappability affected DE analysis. We found that mappability was significantly different among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically downstream in DE analysis. CONCLUSIONS: We assessed how the complexity of gene annotations affects DE analysis using mappability. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that an approach that excludes unnecessary gene models from gene annotations improves the performance of DE analysis.

  • Binding patterns of RNA-binding proteins to repeat-derived RNA sequences reveal putative functional RNA elements.

    Masahiro Onoguchi, Chao Zeng, Ayako Matsumaru, Michiaki Hamada

    NAR genomics and bioinformatics   3 ( 3 ) lqab055  2021.09  [International journal]

    Recent reports have revealed that repeat-derived sequences embedded in introns or long noncoding RNAs (lncRNAs) are targets of RNA-binding proteins (RBPs) and contribute to biological processes such as RNA splicing or transcriptional regulation. These findings suggest that repeat-derived RNAs are important as scaffolds of RBPs and functional elements. However, the overall functional sequences of the repeat-derived RNAs are not fully understood. Here, we show the putative functional repeat-derived RNAs by analyzing the binding patterns of RBPs based on ENCODE eCLIP data. We mapped all eCLIP reads to repeat sequences and observed that 10.75% and 7.04% of reads on average were enriched (at least 2-fold over control) in the repeats in K562 and HepG2 cells, respectively. Using these data, we predicted functional RNA elements on the sense and antisense strands of long interspersed element 1 (LINE1) sequences. Furthermore, we found several new sets of RBPs on fragments derived from other transposable element (TE) families. Some of these fragments show specific and stable secondary structures and are found to be inserted into the introns of genes or lncRNAs. These results suggest that the repeat-derived RNA sequences are strong candidates for the functional RNA elements of endogenous noncoding RNAs.

  • Umibato: estimation of time-varying microbial interaction using continuous-time regression hidden Markov model.

    Shion Hosoda, Tsukasa Fukunaga, Michiaki Hamada

    Bioinformatics (Oxford, England)   37 ( Suppl_1 ) i16-i24  2021.07  [International journal]

    MOTIVATION: Accumulating evidence has highlighted the importance of microbial interaction networks. Methods have been developed for estimating microbial interaction networks, of which the generalized Lotka-Volterra equation (gLVE)-based method can estimate a directed interaction network. The previous gLVE-based method for estimating microbial interaction networks did not consider time-varying interactions. RESULTS: In this study, we developed the unsupervised learning-based microbial interaction inference method using Bayesian estimation (Umibato), a method for estimating time-varying microbial interactions. The Umibato algorithm comprises Gaussian process regression (GPR) and a new Bayesian probabilistic model, the continuous-time regression hidden Markov model (CTRHMM). Growth rates are estimated by GPR, and interaction networks are estimated by CTRHMM. CTRHMM can estimate time-varying interaction networks using interaction states, which are defined as hidden variables. Umibato outperformed the existing methods on synthetic datasets. In addition, it yielded reasonable estimations in experiments on a mouse gut microbiota dataset, thus providing novel insights into the relationship between consumed diets and the gut microbiota. AVAILABILITY AND IMPLEMENTATION: The C++ and Python source code of the Umibato software is available at https://github.com/shion-h/Umibato. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • Possible roles for the hominoid-specific DSCR4 gene in human cells.

    Morteza M Saber, Marziyeh Karimiavargani, Takanori Uzawa, Nilmini Hettiarachchi, Michiaki Hamada, Yoshihiro Ito, Naruya Saitou

    Genes & genetic systems   96 ( 1 ) 1 - 11  2021.05  [Domestic journal]

    Down syndrome in humans is caused by trisomy of chromosome 21. DSCR4 (Down syndrome critical region 4) is a de novo-originated protein-coding gene present only in human chromosome 21 and its homologous chromosomes in apes. Despite being located in a medically critical genomic region and an abundance of evidence indicating its functionality, the roles of DSCR4 in human cells are unknown. We used a bioinformatic approach to infer the biological importance and cellular roles of this gene. Our analysis indicates that DSCR4 is likely involved in the regulation of interconnected biological pathways related to cell migration, coagulation and the immune system. We also showed that these predicted biological functions are consistent with tissue-specific expression of DSCR4 in migratory immune system leukocyte cells and neural crest cells (NCCs) that shape facial morphology in the human embryo. The immune system and NCCs are known to be affected in Down syndrome individuals, who suffer from DSCR4 misregulation, which further supports our findings. Providing evidence for the critical roles of DSCR4 in human cells, our findings establish the basis for further experimental investigations that will be necessary to confirm the roles of DSCR4 in the etiology of Down syndrome.

  • PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores.

    Yukiteru Ono, Kiyoshi Asai, Michiaki Hamada

    Bioinformatics (Oxford, England)   37 ( 5 ) 589 - 595  2021.05  [International journal]

    MOTIVATION: Recent advances in high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. RESULTS: To capture characteristics of errors in reads for long-read sequencers, here, we introduce a generative model for quality scores, in which a hidden Markov model with a recent model selection method, called factorized information criteria, is utilized. We evaluated our simulator from various perspectives, and the results indicate that it successfully simulates reads that are consistent with real reads. AVAILABILITY AND IMPLEMENTATION: The source code of PBSIM2 is freely available from https://github.com/yukiteruono/pbsim2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • Long Non-Coding RNA CRNDE Is Involved in Resistance to EGFR Tyrosine Kinase Inhibitor in EGFR-Mutant Lung Cancer via eIF4A3/MUC1/EGFR Signaling.

    Satoshi Takahashi, Rintaro Noro, Masahiro Seike, Chao Zeng, Masaru Matsumoto, Akiko Yoshikawa, Shinji Nakamichi, Teppei Sugano, Mariko Hirao, Kuniko Matsuda, Michiaki Hamada, Akihiko Gemma

    International journal of molecular sciences   22 ( 8 )  2021.04  [International journal]

    (1) Background: Acquired resistance to epidermal growth factor receptor-tyrosine kinase inhibitors (EGFR-TKIs) is an intractable problem for many clinical oncologists. The mechanisms of resistance to EGFR-TKIs are complex. Long non-coding RNAs (lncRNAs) may play an important role in cancer development and metastasis. However, the biological process between lncRNAs and drug resistance to EGFR-mutated lung cancer remains largely unknown. (2) Methods: Osimertinib- and afatinib-resistant EGFR-mutated lung cancer cells were established using a stepwise method. A microarray analysis of non-coding and coding RNAs was performed using parental and resistant EGFR-mutant non-small cell lung cancer (NSCLC) cells and evaluated by bioinformatics analysis through medical-industrial collaboration. (3) Results: Colorectal neoplasia differentially expressed (CRNDE) and DiGeorge syndrome critical region gene 5 (DGCR5) lncRNAs were highly expressed in EGFR-TKI-resistant cells by microarray analysis. RNA-protein binding analysis revealed eukaryotic translation initiation factor 4A3 (eIF4A3) bound in an overlapping manner to CRNDE and DGCR5. The CRNDE downregulates the expression of eIF4A3, mucin 1 (MUC1), and phospho-EGFR. Inhibition of CRNDE activated the eIF4A3/MUC1/EGFR signaling pathway and apoptotic activity, and restored sensitivity to EGFR-TKIs. (4) Conclusions: The results showed that CRNDE is associated with the development of resistance to EGFR-TKIs. CRNDE may be a novel therapeutic target to conquer EGFR-mutant NSCLC.

  • Jonckheere-Terpstra-Kendall-based non-parametric analysis of temporal differential gene expression.

    Hitoshi Iuchi, Michiaki Hamada

    NAR genomics and bioinformatics   3 ( 1 ) lqab021  2021.03  [International journal]

    Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degree of freedom) for one dataset. This approach risks modeling of linearly increasing genes with higher-order functions, or fitting of cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere-Terpstra-Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks, using simulation data, show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK in the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern contributes to the TEG identification of JTK, not the difference in expression levels. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.

  • Association analysis of repetitive elements and R-loop formation across species.

    Chao Zeng, Masahiro Onoguchi, Michiaki Hamada

    Mobile DNA   12 ( 1 ) 3 - 3  2021.01  [International journal]

    BACKGROUND: Although recent studies have revealed the genome-wide distribution of R-loops, our understanding of R-loop formation is still limited. Genomes are known to have a large number of repetitive elements. Emerging evidence suggests that these sequences may play an important regulatory role. However, few studies have investigated the effect of repetitive elements on R-loop formation. RESULTS: We found different repetitive elements related to R-loop formation in various species. By controlling length and genomic distributions, we observed that satellite, long interspersed nuclear elements (LINEs), and DNA transposons were each specifically enriched for R-loops in humans, fruit flies, and Arabidopsis thaliana, respectively. R-loops also tended to arise in regions of low-complexity or simple repeats across species. We also found that the repetitive elements associated with R-loop formation differ according to developmental stage. For instance, LINEs and long terminal repeat retrotransposons (LTRs) are more likely to contain R-loops in embryos (fruit fly) and then turn out to be low-complexity and simple repeats in post-developmental S2 cells. CONCLUSIONS: Our results indicate that repetitive elements may have species-specific or development-specific regulatory effects on R-loop formation. This work advances our understanding of repetitive elements and R-loop biology.

  • Representation learning applications in biological sequence analysis.

    Hitoshi Iuchi, Taro Matsutani, Keisuke Yamada, Natsuki Iwano, Shunsuke Sumi, Shion Hosoda, Shitao Zhao, Tsukasa Fukunaga, Michiaki Hamada

    Computational and structural biotechnology journal   19   3198 - 3208  2021  [International journal]

    Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.

  • Corrigendum: Possible roles for the hominoid-specific DSCR4 gene in human cells [Genes Genet. Syst. (2021) 96, p. 1-11].

    Morteza M Saber, Marziyeh Karimiavargani, Takanori Uzawa, Nilmini Hettiarachchi, Michiaki Hamada, Yoshihiro Ito, Naruya Saitou

    Genes & genetic systems   96 ( 2 ) 105 - 105  2021  [Domestic journal]

    Legends to Figures 4 and 5 (p. 7) should be exchanged. Below are the correct legends to Figure 4 and Figure 5. Fig. 4. Interconnection of DSCR4 overexpression-mediated perturbed pathways. KEGG analysis of DSCR4 overexpression-mediated DEGs shows enrichment for the tightly interconnected pathways of the coagulation cascade and the complement cascade (highlighted in red) and further confirm the connection of these cascades with cell adhesion, migration and proliferation (red circle). Fig. 5. Expression profile of DSCR4 across human cell lines and tissues. According to Roadmap Epigenomics Project data, DSCR4 and DSCR8, which share a bidirectional promoter, are highly expressed only in K562 cells, a type of leukemia cell. Analysis of transcriptome data provided by Prescott et al. (2015) showed that DSCR4 and DSCR8 also display high expression in human and chimpanzee neural crest cells, which are critical migratory cells involved in facial morphogenesis in the embryo. (1) Data from Prescott et al. (2015). (2) Samples also include esophagus, lung, spleen and fetal large intestine. (3) Samples also include brain germinal matrix, hippocampus, fetal small intestine, stomach, left ventricle, small intestine, sigmoid colon, HEPG2 cells and HMEC cells. The PDF file for DOI: https://doi.org/10.1266/ggs.20-00012 has been replaced with the corrected version as of June 17, 2021.

  • Identification of m6A-Associated RNA Binding Proteins Using an Integrative Computational Framework.

    Yiqian Zhang, Michiaki Hamada

    Frontiers in genetics   12   625797 - 625797  2021  [International journal]

    N6-methyladenosine (m6A) is an abundant modification on mRNA that plays an important role in regulating essential RNA activities. Several wet lab studies have identified some RNA binding proteins (RBPs) that are related to m6A's regulation. The objective of this study was to identify potential m6A-associated RBPs using an integrative computational framework. The framework was composed of an enrichment analysis and a classification model. Utilizing RBPs' binding data, we analyzed reproducible m6A regions from independent studies using this framework. The enrichment analysis identified known m6A-associated RBPs including YTH domain-containing proteins; it also identified RBM3 as a potential m6A-associated RBP for mouse. Furthermore, a significant correlation for the identified m6A-associated RBPs is observed at the protein expression level rather than the gene expression level. On the other hand, a Random Forest classification model was built for the reproducible m6A regions using RBPs' binding data. The RBP-based predictor demonstrated not only competitive performance when compared with sequence-based predictions but also reflected m6A's action of repelling against RBPs, which suggested that our framework can infer interaction between m6A and m6A-associated RBPs beyond sequence level when utilizing RBPs' binding data. In conclusion, we designed an integrative computational framework for the identification of known and potential m6A-associated RBPs. We hope the analysis will provide more insights on the studies of m6A and RNA modifications.

  • Detection and Characterization of Ribosome-Associated Long Noncoding RNAs.

    Chao Zeng, Michiaki Hamada

    Methods in molecular biology (Clifton, N.J.)   2254   179 - 194  2021  [International journal]

    Ribosome profiling shows potential for studying the function of long noncoding RNAs (lncRNAs). We introduce a bioinformatics pipeline for detecting ribosome-associated lncRNAs (ribo-lncRNAs) from ribosome profiling data. Further, we describe a machine-learning approach for the characterization of ribo-lncRNAs based on their sequence features. Scripts for ribo-lncRNA analysis can be accessed at ( https://ribolnc.hamadalab.com/ ).

  • Parallelized Latent Dirichlet Allocation Provides a Novel Interpretability of Mutation Signatures in Cancer Genomes.

    Taro Matsutani, Michiaki Hamada

    Genes   11 ( 10 )  2020.09  [International journal]

    Mutation signatures are defined as the distribution of specific mutations such as activity of AID/APOBEC family proteins. Previous studies have reported numerous signatures, using matrix factorization methods for mutation catalogs. Different mutation signatures are active in different tumor types; hence, signature activity varies greatly among tumor types and becomes sparse. Because of this, many previous methods require dividing mutation catalogs for each tumor type. Here, we propose parallelized latent Dirichlet allocation (PLDA), a novel Bayesian model to simultaneously predict mutation signatures with all mutation catalogs. PLDA is an extended model of latent Dirichlet allocation (LDA), which is one of the methods used for signature prediction. It has parallelized hyperparameters of Dirichlet distributions for LDA, and they represent the sparsity of signature activities for each tumor type, thus facilitating simultaneous analyses. First, we conducted a simulation experiment to compare PLDA with previous methods (including SigProfiler and SignatureAnalyzer) using artificial data and confirmed that PLDA could predict signature structures as accurately as previous methods without searching for the optimal hyperparameters. Next, we applied PLDA to PCAWG (Pan-Cancer Analysis of Whole Genomes) mutation catalogs and obtained a signature set different from the one predicted by SigProfiler. Further, we have shown that the mutation spectrum represented by the predicted signature with PLDA provides a novel interpretability through post-analyses.

  • Free-Energy Calculation of Ribonucleic Inosines and Its Application to Nearest-Neighbor Parameters.

    Shun Sakuraba, Junichi Iwakiri, Michiaki Hamada, Tomoshi Kameda, Genichiro Tsuji, Yasuaki Kimura, Hiroshi Abe, Kiyoshi Asai

    Journal of chemical theory and computation   16 ( 9 ) 5923 - 5935  2020.09  [Refereed]  [International journal]

    Can current simulations quantitatively predict the stability of ribonucleic acids (RNAs)? In this research, we apply a free-energy perturbation simulation of RNAs containing inosine, a modified ribonucleic base, to the derivation of RNA nearest-neighbor parameters. A parameter set derived solely from 30 simulations was used to predict the free-energy difference of the RNA duplex with a mean unbiased error of 0.70 kcal/mol, which is a level of accuracy comparable to that obtained with parameters derived from 25 experiments. We further show that the error can be lowered to 0.60 kcal/mol by combining the simulation-derived free-energy differences with experimentally measured differences. This protocol can be used as a versatile method for deriving nearest-neighbor parameters of RNAs with various modified bases.

  • RaptRanker: in silico RNA aptamer selection from HT-SELEX experiment based on local sequence and structure information.

    Ryoga Ishida, Tatsuo Adachi, Aya Yokota, Hidehito Yoshihara, Kazuteru Aoki, Yoshikazu Nakamura, Michiaki Hamada

    Nucleic acids research   48 ( 14 ) e82  2020.08  [International journal]

    Aptamers are short single-stranded RNA/DNA molecules that bind to specific target molecules. Aptamers with high binding-affinity and target specificity are identified using an in vitro procedure called high throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). However, the development of aptamer affinity reagents takes a considerable amount of time and is costly because HT-SELEX produces a large dataset of candidate sequences, some of which have insufficient binding-affinity. Here, we present RNA aptamer Ranker (RaptRanker), a novel in silico method for identifying high binding-affinity aptamers from HT-SELEX data by scoring and ranking. RaptRanker analyzes HT-SELEX data by evaluating the nucleotide sequence and secondary structure simultaneously, and by ranking according to scores reflecting local structure and sequence frequencies. To evaluate the performance of RaptRanker, we performed two new HT-SELEX experiments, and evaluated binding affinities of a part of sequences that include aptamers with low binding-affinity. In both datasets, the performance of RaptRanker was superior to Frequency, Enrichment and MPBind. We also confirmed that the consideration of secondary structures is effective in HT-SELEX data analysis, and that RaptRanker successfully predicted the essential subsequence motifs in each identified sequence.

  • RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines.

    Chao Zeng, Michiaki Hamada

    Genes   11 ( 7 )  2020.07  [International journal]

    Alternative splicing, a ubiquitous phenomenon in eukaryotes, is a regulatory mechanism for the biological diversity of individual genes. Most studies have focused on the effects of alternative splicing on protein synthesis. However, the transcriptome-wide influence of alternative splicing on RNA subcellular localization has rarely been studied. By analyzing RNA-seq data obtained from subcellular fractions across 13 human cell lines, we identified 8720 switching genes between the cytoplasm and the nucleus. Consistent with previous reports, intron retention was observed to be enriched in the nuclear transcript variants. Interestingly, we found that short and structurally stable introns were positively correlated with nuclear localization. Motif analysis reveals that fourteen RNA-binding proteins (RBPs) tend to bind preferentially to such introns. To our knowledge, this is the first transcriptome-wide study to analyze and evaluate the effect of alternative splicing on RNA subcellular localization. Our findings reveal that alternative splicing plays a promising role in regulating RNA subcellular localization.

  • Revealing the microbial assemblage structure in the human gut microbiome using latent Dirichlet allocation.

    Shion Hosoda, Suguru Nishijima, Tsukasa Fukunaga, Masahira Hattori, Michiaki Hamada

    Microbiome   8 ( 1 ) 95 - 95  2020.06  [International journal]

    BACKGROUND: The human gut microbiome has been suggested to affect human health and thus has received considerable attention. To clarify the structure of the human gut microbiome, clustering methods are frequently applied to human gut taxonomic profiles. Enterotypes, i.e., clusters of individuals with similar microbiome composition, are well-studied and characterized. However, only a few detailed studies on assemblages, i.e., clusters of co-occurring bacterial taxa, have been conducted. Particularly, the relationship between the enterotype and assemblage is not well-understood. RESULTS: In this study, we detected gut microbiome assemblages using a latent Dirichlet allocation (LDA) method. We applied LDA to a large-scale human gut metagenome dataset and found that a 4-assemblage LDA model could represent relationships between enterotypes and assemblages with high interpretability. This model indicated that each individual tends to have several assemblages, three of which corresponded to the three classically recognized enterotypes. Conversely, the fourth assemblage corresponded to no enterotypes and emerged in all enterotypes. Interestingly, the dominant genera of this assemblage (Clostridium, Eubacterium, Faecalibacterium, Roseburia, Coprococcus, and Butyrivibrio) included butyrate-producing species such as Faecalibacterium prausnitzii. Indeed, the fourth assemblage significantly positively correlated with three butyrate-producing functions. CONCLUSIONS: We conducted an assemblage analysis on a large-scale human gut metagenome dataset using LDA. The present study revealed that there is an enterotype-independent assemblage.

    DOI PubMed

  • MoAIMS: efficient software for detection of enriched regions of MeRIP-Seq.

    Yiqian Zhang, Michiaki Hamada

    BMC bioinformatics   21 ( 1 ) 103 - 103  2020.03  [International journal]

    BACKGROUND: Methylated RNA immunoprecipitation sequencing (MeRIP-Seq) is a popular sequencing method for studying RNA modifications and, in particular, for N6-methyladenosine (m6A), the most abundant RNA methylation modification found in various species. The detection of enriched regions is a main challenge of MeRIP-Seq analysis; however, current tools either require a long time or do not fully utilize features of RNA sequencing, such as strand information, which can cause ambiguous calling. In addition, with growing attention on MeRIP-Seq treatment experiments, biologists need an intuitive way to evaluate treatment effects by comparison. Therefore, efficient and user-friendly software that can solve these tasks must be developed. RESULTS: We developed software named "model-based analysis and inference of MeRIP-Seq (MoAIMS)" to detect enriched regions of MeRIP-Seq and infer signal proportion based on a mixture negative-binomial model. MoAIMS is designed for transcriptome immunoprecipitation sequencing experiments; therefore, it is compatible with different RNA sequencing protocols. MoAIMS offers excellent processing speed and competitive performance when compared with other tools. When MoAIMS is applied to studies of m6A, the detected enriched regions contain known biological features of m6A. Furthermore, signal proportion inferred from MoAIMS for m6A treatment datasets (perturbation of m6A methyltransferases) showed a decreasing trend that is consistent with experimental observations, suggesting that the signal proportion can be used as an intuitive indicator of treatment effect. CONCLUSIONS: MoAIMS is efficient and easy-to-use software implemented in R. MoAIMS can not only detect enriched regions of MeRIP-Seq efficiently but also provide an intuitive evaluation of treatment effect for MeRIP-Seq treatment datasets.

    DOI PubMed

  • Nucleosome destabilization by nuclear non-coding RNAs.

    Risa Fujita, Tatsuro Yamamoto, Yasuhiro Arimura, Saori Fujiwara, Hiroaki Tachiwana, Yuichi Ichikawa, Yuka Sakata, Liying Yang, Reo Maruyama, Michiaki Hamada, Mitsuyoshi Nakao, Noriko Saitoh, Hitoshi Kurumizaka

    Communications biology   3 ( 1 ) 60 - 60  2020.02  [Refereed]  [International journal]

    In the nucleus, genomic DNA is wrapped around histone octamers to form nucleosomes. In principle, nucleosomes are substantial barriers to transcriptional activities. Nuclear non-coding RNAs (ncRNAs) are proposed to function in chromatin conformation modulation and transcriptional regulation. However, it remains unclear how ncRNAs affect the nucleosome structure. Eleanors are clusters of ncRNAs that accumulate around the estrogen receptor-α (ESR1) gene locus in long-term estrogen deprivation (LTED) breast cancer cells, and markedly enhance the transcription of the ESR1 gene. Here we detected nucleosome depletion around the transcription site of Eleanor2, the most highly expressed Eleanor in the LTED cells. We found that the purified Eleanor2 RNA fragment drastically destabilized the nucleosome in vitro. This activity was also exerted by other ncRNAs, but not by poly(U) RNA or DNA. The RNA-mediated nucleosome destabilization may be a common feature among natural nuclear RNAs, and may function in transcription regulation in chromatin.

    DOI PubMed

  • Targeting the TR4 nuclear receptor-mediated lncTASR/AXL signaling with tretinoin increases the sunitinib sensitivity to better suppress the RCC progression.

    Hangchuan Shi, Yin Sun, Miao He, Xiong Yang, Michiaki Hamada, Tsukasa Fukunaga, Xiaoping Zhang, Chawnshang Chang

    Oncogene   39 ( 3 ) 530 - 545  2020.01  [Refereed]  [International journal]

    Renal cell carcinoma (RCC) is one of the most lethal urological tumors. Sunitinib has become the first-line therapy to improve survival in metastatic RCC patients. However, the occurrence of sunitinib resistance in clinical application has curtailed its efficacy. Here we found that the TR4 nuclear receptor might alter sunitinib resistance in RCC via the TR4/lncTASR/AXL signaling. Mechanism dissection revealed that TR4 could modulate lncTASR (ENST00000600671.1) expression via transcriptional regulation, which might then increase AXL protein expression by enhancing the stability of AXL mRNA, thereby increasing sunitinib resistance in RCC. Human clinical surveys also linked the expression of TR4, lncTASR, and AXL to RCC survival. Results from multiple RCC cell lines revealed that targeting this newly identified TR4-mediated signaling with small molecules, including tretinoin and metformin, or with TR4-shRNAs, increased sunitinib sensitivity and better suppressed RCC progression. Our preclinical study using an in vivo mouse model further showed that tretinoin had a better synergistic effect in increasing sunitinib sensitivity to suppress RCC progression. Future successful clinical trials may help in the development of a novel therapy to better suppress RCC progression.

    DOI PubMed

  • Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference.

    Taro Matsutani, Yuki Ueno, Tsukasa Fukunaga, Michiaki Hamada

    Bioinformatics (Oxford, England)   35 ( 22 ) 4543 - 4552  2019.11  [Refereed]  [International journal]

    MOTIVATION: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. It is known that each mutational process leads to characteristic mutations, and when a mutational process has preferences for mutations, this situation is called a 'mutation signature.' Identification of mutation signatures is an important task for elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures is unclear. RESULTS: In this study, we present a novel method for estimating the number of mutation signatures, latent Dirichlet allocation with variational Bayes inference (VB-LDA), in which variational lower bounds are utilized for finding a plausible number of mutation patterns. In addition, we performed cluster analyses for estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures for real mutation data revealed many interesting mutation signatures that have not been previously reported. AVAILABILITY AND IMPLEMENTATION: All the predicted mutation signatures with clustering results are freely available at http://www.f.waseda.jp/mhamada/MS/index.html. All the C++ source code and python scripts utilized in this study can be downloaded on the Internet (https://github.com/qkirikigaku/MS_LDA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    DOI PubMed

  • Stromal fibroblasts induce metastatic tumor cell clusters via epithelial-mesenchymal plasticity.

    Yuko Matsumura, Yasuhiko Ito, Yoshihiro Mezawa, Kaidiliayi Sulidan, Yataro Daigo, Toru Hiraga, Kaoru Mogushi, Nadila Wali, Hiromu Suzuki, Takumi Itoh, Yohei Miyagi, Tomoyuki Yokose, Satoru Shimizu, Atsushi Takano, Yasuhisa Terao, Harumi Saeki, Masayuki Ozawa, Masaaki Abe, Satoru Takeda, Ko Okumura, Sonoko Habu, Okio Hino, Kazuyoshi Takeda, Michiaki Hamada, Akira Orimo

    Life science alliance   2 ( 4 )  2019.08  [Refereed]  [International journal]

    Emerging evidence supports the hypothesis that multicellular tumor clusters invade and seed metastasis. However, whether tumor-associated stroma induces epithelial-mesenchymal plasticity in tumor cell clusters, to promote invasion and metastasis, remains unknown. We demonstrate herein that carcinoma-associated fibroblasts (CAFs) frequently present in tumor stroma drive the formation of tumor cell clusters composed of two distinct cancer cell populations, one in a highly epithelial (E-cadherin(hi)ZEB1(lo/neg): E(hi)) state and another in a hybrid epithelial/mesenchymal (E-cadherin(lo)ZEB1(hi): E/M) state. The E(hi) cells highly express oncogenic cell-cell adhesion molecules, such as carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) and CEACAM6 that associate with E-cadherin, resulting in increased tumor cell cluster formation and metastatic seeding. The E/M cells also retain associations with E(hi) cells, which follow the E/M cells leading to collective invasion. CAF-produced stromal cell-derived factor 1 and transforming growth factor-β confer the E(hi) and E/M states as well as invasive and metastatic traits via Src activation in apposed human breast tumor cells. Taken together, these findings indicate that invasive and metastatic tumor cell clusters are induced by CAFs via epithelial-mesenchymal plasticity.

    DOI PubMed

  • Identification of RNA biomarkers for chemical safety screening in neural cells derived from mouse embryonic stem cells using RNA deep sequencing analysis.

    Hidenori Tani, Taro Matsutani, Hiroshi Aoki, Kaoru Nakamura, Yu Hamaguchi, Tetsuya Nakazato, Michiaki Hamada

    Biochemical and biophysical research communications   512 ( 4 ) 641 - 646  2019.05  [Refereed]  [International journal]

    Chemical safety screening requires the development of more efficient assays that do not involve testing in animals. In vitro cell-based assays are among the most appropriate alternatives to animal testing for screening of chemical toxicity. Most studies performed to date made use of mRNAs as biomarkers. Recent studies have, however, indicated the presence of many unannotated non-coding RNAs (ncRNAs) in the transcriptome that do not appear to encode proteins. In the present study, we performed whole-transcriptome sequencing analysis (RNA-Seq) to identify novel RNA biomarkers, including ncRNAs, which showed marked responses to the toxicity of nine chemicals. Chemical safety screening was performed in cell-based assays using mouse embryonic stem cell (mESC)-derived neural cells. Marked responses in the expression of some ncRNAs to the chemical compounds were observed. The results of the present study suggested that ncRNAs may be useful in chemical safety screening as novel RNA biomarkers.

    DOI PubMed

  • LncRRIsearch: A Web Server for lncRNA-RNA Interaction Prediction Integrated With Tissue-Specific Expression and Subcellular Localization Data.

    Tsukasa Fukunaga, Junichi Iwakiri, Yukiteru Ono, Michiaki Hamada

    Frontiers in genetics   10   462 - 462  2019  [Refereed]  [International journal]

    Long non-coding RNAs (lncRNAs) play critical roles in various biological processes, but the function of the majority of lncRNAs is still unclear. One approach for estimating a function of a lncRNA is the identification of its interaction target because functions of lncRNAs are expressed through interaction with other biomolecules in quite a few cases. In this paper, we developed "LncRRIsearch," which is a web server for comprehensive prediction of human and mouse lncRNA-lncRNA and lncRNA-mRNA interaction. The prediction was conducted using RIblast, which is a fast and accurate RNA-RNA interaction prediction tool. Users can investigate interaction target RNAs of a particular lncRNA through a web interface. In addition, we integrated tissue-specific expression and subcellular localization data for the lncRNAs with the web server. These data enable users to examine tissue-specific or subcellular localized lncRNA interactions. LncRRIsearch is publicly accessible at http://rtools.cbrc.jp/LncRRIsearch/.

    DOI PubMed

  • DeepM6ASeq: prediction and characterization of m6A-containing sequences using deep learning.

    Yiqian Zhang, Michiaki Hamada

    BMC bioinformatics   19 ( Suppl 19 ) 524 - 524  2018.12  [Refereed]  [International journal]

    BACKGROUND: N6-methyladenosine (m6A) is a common and abundant RNA methylation modification found in various species. As a type of post-transcriptional methylation, m6A plays an important role in diverse RNA activities such as alternative splicing, an interplay with microRNAs and translation efficiency. Although existing tools can predict m6A at single-base resolution, it is still challenging to extract the biological information surrounding m6A sites. RESULTS: We implemented a deep learning framework, named DeepM6ASeq, to predict m6A-containing sequences and characterize surrounding biological features based on miCLIP-Seq data, which detects m6A sites at single-base resolution. DeepM6ASeq showed better performance as compared to other machine learning classifiers. Moreover, an independent test on m6A-Seq data, which identifies m6A-containing genomic regions, revealed that our model is competitive in predicting m6A-containing sequences. The learned motifs from DeepM6ASeq correspond to known m6A readers. Notably, DeepM6ASeq also identifies a newly recognized m6A reader: FMR1. Besides, we found that a saliency map in the deep learning model could be utilized to visualize locations of m6A sites. CONCLUSION: We developed a deep-learning-based framework to predict and characterize m6A-containing sequences and hope to help investigators to gain more insights for m6A research. The source code is available at https://github.com/rreybeyb/DeepM6ASeq .

    DOI PubMed

  • Identifying sequence features that drive ribosomal association for lncRNA.

    Chao Zeng, Michiaki Hamada

    BMC genomics   19 ( Suppl 10 ) 906 - 906  2018.12  [Refereed]  [International journal]

    BACKGROUND: With the increasing number of annotated long noncoding RNAs (lncRNAs) from the genome, researchers are continually updating their understanding of lncRNAs. Recently, thousands of lncRNAs have been reported to be associated with ribosomes in mammals. However, their biological functions or mechanisms are still unclear. RESULTS: In this study, we tried to investigate the sequence features involved in the ribosomal association of lncRNA. We have extracted ninety-nine sequence features corresponding to different biological mechanisms (i.e., RNA splicing, putative ORF, k-mer frequency, RNA modification, RNA secondary structure, and repeat element). An L1-regularized logistic regression model was applied to screen these features. Finally, we obtained fifteen and nine important features for the ribosomal association of human and mouse lncRNAs, respectively. CONCLUSION: To our knowledge, this is the first study to characterize ribosome-associated lncRNAs and ribosome-free lncRNAs from the perspective of sequence features. These sequence features that were identified in this study may shed light on the biological mechanism of the ribosomal association and provide important clues for functional analysis of lncRNAs.

    DOI PubMed

  • Nearest-neighbor parameter for inosine-cytosine pairs through a combined experimental and computational approach

    Shun Sakuraba, Junichi Iwakiri, Michiaki Hamada, Tomoshi Kameda, Genichiro Tsuji, Yasuaki Kimura, Hiroshi Abe, Kiyoshi Asai

       2018.10  [Refereed]

    In RNA secondary structure prediction, nearest-neighbor parameters are used to determine the stability of a given structure. We derived the nearest-neighbor parameters for RNAs containing inosine-cytosine pairs. For parameter derivation, we developed a method that combines UV absorption measurement experiments with free-energy calculations using molecular dynamics simulations. The method provides fast drop-in parameters for modified bases. Derived parameters were compared and found to be consistent with existing parameters for canonical RNAs. On average, a duplex with an internal inosine-cytosine pair is 0.9 kcal/mol less stable than the same duplex with an internal guanine-cytosine pair, and is about as stable as the one with an internal adenine-uracil pair (only 0.1 kcal/mol more stable).

    DOI

  • A Novel Method for Assessing the Statistical Significance of RNA-RNA Interactions Between Two Long RNAs.

    Tsukasa Fukunaga, Michiaki Hamada

    Journal of computational biology : a journal of computational molecular cell biology   25 ( 9 ) 976 - 986  2018.09  [Refereed]  [International journal]

    RNA-RNA interactions are key mechanisms through which noncoding RNA (ncRNA) regions exert biological functions. Computational prediction of RNA-RNA interactions is an essential method for detecting novel RNA-RNA interactions because their comprehensive detection by biological experimentation is still quite difficult. Many RNA-RNA interaction prediction tools have been developed, but they tend to produce many false positives. Accordingly, assessment of the statistical significance of computationally predicted interactions is an important task. However, there is no method to evaluate the statistical significance of RNA-RNA interactions that is applicable to interactions between two long RNA sequences. We developed a method to calculate the p-value for the minimal interaction energy between two long RNA sequences. The developed method depends on the fact that minimum interaction energies of RNA-RNA interactions between long RNAs follow a Gumbel distribution when repeat sequences in RNAs are masked. To show the usefulness of the developed method, we applied it to whole human 5'-untranslated region (UTR) and 3'-UTR sequences to detect novel 5'-UTR-3'-UTR interactions. We thus identified two significant 5'-UTR-3'-UTR interactions. Specifically, the human small proline-rich repeat protein 3 shows conserved 5'-UTR-3'-UTR interactions with some nucleotide variations preserving base pairings among primates. Our developed method enables us to detect statistically significant RNA-RNA interactions between long RNAs such as long ncRNAs. Statistical significance estimates help in identification of interactions for experimental validation and provide novel insights into the function of ncRNA regions.

    DOI PubMed
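
    The p-value idea in the summary above can be sketched as follows, assuming the Gumbel distribution for minima, with CDF F(x) = 1 - exp(-exp((x - mu) / beta)). The function name and the location and scale values (mu, beta) are hypothetical placeholders; the summary does not state how the parameters are estimated, so this is an illustration of the calculation only.

```python
import math


def gumbel_min_pvalue(energy, mu, beta):
    """P(E_min <= energy) under a Gumbel distribution for minima.

    CDF: F(x) = 1 - exp(-exp((x - mu) / beta)), with scale beta > 0.
    A lower (more negative) energy gives a smaller p-value.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (energy - mu) / beta
    # -expm1(-t) equals 1 - exp(-t) but stays accurate for very small t.
    return -math.expm1(-math.exp(z))


# Hypothetical location/scale parameters, chosen only for illustration.
mu, beta = -20.0, 3.0
for e in (-20.0, -30.0, -40.0):
    print(e, gumbel_min_pvalue(e, mu, beta))
```

    Using expm1 keeps the result accurate in the far lower tail, where the p-values of interest are very small.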

  • Computational approaches for alternative and transient secondary structures of ribonucleic acids.

    Tsukasa Fukunaga, Michiaki Hamada

    Briefings in functional genomics   18 ( 3 ) 182 - 191  2018.06  [Refereed]  [International journal]

    Transient and alternative structures of ribonucleic acids (RNAs) play essential roles in various regulatory processes, such as translation regulation in living cells. Because experimental analyses for RNA structures are difficult and time-consuming, computational approaches based on RNA secondary structures are promising. In this article, we review computational methods for detecting and analyzing transient/alternative secondary structures of RNAs, including static approaches based on probabilistic distributions of RNA secondary structures and dynamic approaches such as kinetic folding and folding pathway predictions.

    DOI PubMed

  • Identification and analysis of ribosome-associated lncRNAs using ribosome profiling data.

    Chao Zeng, Tsukasa Fukunaga, Michiaki Hamada

    BMC genomics   19 ( 1 ) 414 - 414  2018.05  [Refereed]  [International journal]

    BACKGROUND: Although the number of discovered long non-coding RNAs (lncRNAs) has increased dramatically, their biological roles have not been established. Many recent studies have used ribosome profiling data to assess the protein-coding capacity of lncRNAs. However, very little work has been done to identify ribosome-associated lncRNAs, here defined as lncRNAs interacting with ribosomes related to protein synthesis as well as other unclear biological functions. RESULTS: On average, 39.17% of expressed lncRNAs were observed to interact with ribosomes in human and 48.16% in mouse. We developed the ribosomal association index (RAI), which quantifies the evidence for ribosomal associability of lncRNAs over various tissues and cell types, to catalog 691 and 409 lncRNAs that are robustly associated with ribosomes in human and mouse, respectively. Moreover, we identified 78 and 42 lncRNAs with a high probability of coding peptides in human and mouse, respectively. Compared with ribosome-free lncRNAs, ribosome-associated lncRNAs were observed to be more likely to be located in the cytoplasm and more sensitive to nonsense-mediated decay. CONCLUSION: Our results suggest that RAI can be used as an integrative and evidence-based tool for distinguishing between ribosome-associated and free lncRNAs, providing a valuable resource for the study of lncRNA functions.

    DOI PubMed

  • Estimating energy parameters for RNA secondary structure predictions using both experimental and computational data.

    Nishida S, Sakuraba S, Asai K, Hamada M

    IEEE/ACM transactions on computational biology and bioinformatics   16 ( 5 ) 1645 - 1655  2018.03  [Refereed]

    DOI PubMed

  • Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm.

    Taikai Takeda, Michiaki Hamada, John Hancock

    Bioinformatics (Oxford, England)   34 ( 4 ) 576 - 584  2018.02  [Refereed]  [International journal]

    Motivation: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. Results: We developed a novel method to select superior models (including the number of hidden states) for PHMM. Our method selects models with the highest posterior probability using Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies. Availability and implementation: The software is available at https://github.com/bigsea-t/fab-phmm. Contact: mhamada@waseda.jp. Supplementary information: Supplementary data are available at Bioinformatics online.

    DOI PubMed

  • In silico approaches to RNA aptamer design.

    Michiaki Hamada

    Biochimie   145   8 - 14  2018.02  [Refereed]  [International journal]

    RNA aptamers are ribonucleic acids that bind to specific target molecules. An RNA aptamer for a disease-related protein has great potential for development into a new drug. However, huge time and cost investments are required to develop an RNA aptamer into a pharmaceutical. Recently, SELEX combined with high-throughput sequencers (i.e., HT-SELEX) has been widely used to select candidate RNA aptamers that bind to a target protein with high affinity and specificity. After candidate selection, further optimizations such as shortening and modifying candidate sequences are performed. In these steps, in silico approaches are expected to reduce the time and cost associated with aptamer drug development. In this article, we review existing in silico approaches to RNA aptamer development, including a method for ranking the candidates of RNA aptamers from HT-SELEX data, clustering a huge number of aptamer sequences, and finding motifs amidst a set of significant RNA aptamers. It is expected that further studies in addition to these methods will be utilized for in silico RNA aptamer design, permitting a minimal number of experiments to be performed through the utilization of sophisticated computational methods.

    DOI PubMed

  • Identification of Transposable Elements Contributing to Tissue-Specific Expression of Long Non-Coding RNAs.

    Takafumi Chishima, Junichi Iwakiri, Michiaki Hamada

    Genes   9 ( 1 )  2018.01  [Refereed]  [International journal]

    It has been recently suggested that transposable elements (TEs) are re-used as functional elements of long non-coding RNAs (lncRNAs). This is supported by some examples such as the human endogenous retrovirus subfamily H (HERVH) elements contained within lncRNAs and expressed specifically in human embryonic stem cells (hESCs), as required to maintain hESC identity. There are at least two unanswered questions about all lncRNAs. How many TEs are re-used within lncRNAs? Are there any other TEs that affect tissue specificity of lncRNA expression? To answer these questions, we comprehensively identify TEs that are significantly related to tissue-specific expression levels of lncRNAs. We downloaded lncRNA expression data corresponding to normal human tissue from the Expression Atlas and transformed the data into tissue specificity estimates. Then, Fisher's exact tests were performed to verify whether the presence or absence of TE-derived sequences influences the tissue specificity of lncRNA expression. Many TE-tissue pairs associated with tissue-specific expression of lncRNAs were detected, indicating that multiple TE families can be re-used as functional domains or regulatory sequences of lncRNAs. In particular, we found that the antisense promoter region of L1PA2, a LINE-1 subfamily, appears to act as a promoter for lncRNAs with placenta-specific expression.

    DOI PubMed
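
    The Fisher's exact test mentioned in the summary above can be sketched on a 2x2 table (for example, lncRNAs with or without a given TE versus tissue-specific or not). The one-sided form, the function name, and the counts used here are assumptions for illustration; the summary does not specify the sidedness or the actual tables.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X follows the hypergeometric distribution
    fixed by the table's row and column totals.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # comb(n, k) is 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Made-up counts: rows = TE present / absent, columns = specific / not.
print(fisher_exact_greater(3, 1, 1, 3))
```

    Testing many TE-tissue pairs in this way would normally also call for a multiple-testing correction, which this sketch leaves out.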

  • RIblast: an ultrafast RNA-RNA interaction prediction system based on a seed-and-extension approach.

    Tsukasa Fukunaga, Michiaki Hamada

    Bioinformatics (Oxford, England)   33 ( 17 ) 2666 - 2674  2017.09  [Refereed]  [International journal]

    Motivation: LncRNAs play important roles in various biological processes. Although more than 58 000 human lncRNA genes have been discovered, most known lncRNAs are still poorly characterized. One approach to understanding the functions of lncRNAs is the detection of the interacting RNA target of each lncRNA. Because experimental detections of comprehensive lncRNA-RNA interactions are difficult, computational prediction of lncRNA-RNA interactions is an indispensable technique. However, the high computational costs of existing RNA-RNA interaction prediction tools prevent their application to large-scale lncRNA datasets. Results: Here, we present 'RIblast', an ultrafast RNA-RNA interaction prediction method based on the seed-and-extension approach. RIblast discovers seed regions using suffix arrays and subsequently extends seed regions based on an RNA secondary structure energy model. Computational experiments indicate that RIblast achieves a level of prediction accuracy similar to those of existing programs, but at speeds over 64 times faster than existing programs. Availability and implementation: The source code of RIblast is freely available at https://github.com/fukunagatsu/RIblast . Contact: t.fukunaga@kurenai.waseda.jp or mhamada@waseda.jp. Supplementary information: Supplementary data are available at Bioinformatics online.

    DOI PubMed

  • Computational prediction of lncRNA-mRNA interactions by integrating tissue specificity in human transcriptome.

    Junichi Iwakiri, Goro Terai, Michiaki Hamada

    Biology direct   12 ( 1 ) 15 - 15  2017.06  [Refereed]  [International journal]

    Long noncoding RNAs (lncRNAs) play a key role in normal tissue differentiation and cancer development through their tissue-specific expression in the human transcriptome. Recent investigations of macromolecular interactions have shown that tissue-specific lncRNAs form base-pairing interactions with various mRNAs associated with tissue-differentiation, suggesting that tissue specificity is an important factor controlling human lncRNA-mRNA interactions. Here, we report investigations of the tissue specificities of lncRNAs and mRNAs by using RNA-seq data across various human tissues as well as computational predictions of tissue-specific lncRNA-mRNA interactions inferred by integrating the tissue specificity of lncRNAs and mRNAs into our comprehensive prediction of human lncRNA-RNA interactions. Our predicted lncRNA-mRNA interactions were evaluated by comparisons with experimentally validated lncRNA-mRNA interactions (between the TINCR lncRNA and mRNAs), showing improved prediction accuracy over previous prediction methods that did not account for tissue specificities of lncRNAs and mRNAs. In addition, our predictions suggest potential functions of the TINCR lncRNA not only in epidermal differentiation but also in esophageal development through lncRNA-mRNA interactions. REVIEWERS: This article was reviewed by Dr. Weixiong Zhang and Dr. Bojan Zagrovic.

    DOI PubMed

  • AMAP: A pipeline for whole-genome mutation detection in Arabidopsis thaliana.

    Kotaro Ishii, Yusuke Kazama, Tomonari Hirano, Michiaki Hamada, Yukiteru Ono, Mieko Yamada, Tomoko Abe

    Genes & genetic systems   91 ( 4 ) 229 - 233  2017.03  [Refereed]  [Domestic journal]

    Detection of mutations at the whole-genome level is now possible by the use of high-throughput sequencing. However, determining mutations is a time-consuming process due to the number of false positives provided by mutation-detecting programs. AMAP (automated mutation analysis pipeline) was developed to overcome this issue. AMAP integrates a set of well-validated programs for mapping (BWA), removal of potential PCR duplicates (Picard), realignment (GATK) and detection of mutations (SAMtools, GATK, Pindel, BreakDancer and CNVnator). Thus, all types of mutations such as base substitution, deletion, insertion, translocation and chromosomal rearrangement can be detected by AMAP. In addition, AMAP automatically distinguishes false positives by comparing lists of candidate mutations in sequenced mutants. We tested AMAP by inputting already analyzed read data derived from three individual Arabidopsis thaliana mutants and confirmed that all true mutations were included in the list of candidate mutations. The result showed that the number of false positives was reduced to 12% of that obtained in a previous analysis that lacked a process of reducing false positives. Thus, AMAP will accelerate not only the analysis of mutation induction by individual mutagens but also the process of forward genetics.

    DOI PubMed

  • Training alignment parameters for arbitrary sequencers with LAST-TRAIN.

    Michiaki Hamada, Yukiteru Ono, Kiyoshi Asai, Martin C Frith

    Bioinformatics (Oxford, England)   33 ( 6 ) 926 - 928  2017.03  [Refereed]  [International journal]

    Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. Availability and Implementation: the source code is freely available at http://last.cbrc.jp/. Contact: mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp. Supplementary information: Supplementary data are available at Bioinformatics online.

    DOI PubMed

  • Improved Accuracy in RNA-Protein Rigid Body Docking by Incorporating Force Field for Molecular Dynamics Simulation into the Scoring Function.

    Junichi Iwakiri, Michiaki Hamada, Kiyoshi Asai, Tomoshi Kameda

    Journal of chemical theory and computation   12 ( 9 ) 4688 - 97  2016.09  [Refereed]  [International journal]

    RNA-protein interactions play fundamental roles in many biological processes. To understand these interactions, it is necessary to know the three-dimensional structures of RNA-protein complexes. However, determining the tertiary structure of these complexes is often difficult, suggesting that an accurate rigid body docking for RNA-protein complexes is needed. In general, the rigid body docking process is divided into two steps: generating candidate structures from the individual RNA and protein structures and then narrowing down the candidates. In this study, we focus on the former problem to improve the prediction accuracy in RNA-protein docking. Our method is based on the integration of physicochemical information about RNA into ZDOCK, which is known as one of the most successful computer programs for protein-protein docking. Because recent studies showed that current force fields for molecular dynamics simulation of proteins and nucleic acids are quite accurate, we modeled the physicochemical information about RNA by force fields such as AMBER and CHARMM. A comprehensive benchmark of RNA-protein docking, using three recently developed data sets, reveals the remarkable prediction accuracy of the proposed method compared with existing programs for docking: the highest success rate is 34.7% for the predicted structure of the RNA-protein complex with the best score and 79.2% for 3,600 predicted ones. Three full atomistic force fields for RNA (AMBER94, AMBER99, and CHARMM22) produced almost equally accurate results, which shows that current force fields for nucleic acids are quite accurate. In addition, we found that the electrostatic interaction and the representation of shape complementarity between protein and RNA play important roles in the accurate prediction of the native structures of RNA-protein complexes.

    DOI PubMed J-GLOBAL

  • 長鎖ノンコーディングRNAのためのバイオインフォマティクス

    岩切淳一, 浜田道昭

    生物物理   56 ( 4 ) 217 - 220  2016.08  [Refereed]

    DOI

  • Rtools: a web server for various secondary structural analyses on single RNA sequences

    Michiaki Hamada, Yukiteru Ono, Hisanori Kiryu, Kengo Sato, Yuki Kato, Tsukasa Fukunaga, Ryota Mori, Kiyoshi Asai

    NUCLEIC ACIDS RESEARCH   44 ( W1 ) W302 - W307  2016.07  [Refereed]

    Secondary structures, as well as nucleotide sequences, are important features of RNA molecules for characterizing their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools that compensate for the imperfect predictions by calculating and visualizing secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. By just giving an RNA sequence to the web server, the user can get different types of solutions of the secondary structures, marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases, the energy changes caused by arbitrary base mutations, as well as measures for validating the predicted secondary structures. The web server is available at http://rtools.cbrc.jp, and integrates the software tools CentroidFold, CentroidHomfold, IPKnot, CapR, Raccess, Rchange and RintD.

    DOI PubMed

  • Comprehensive prediction of lncRNA-RNA interactions in human transcriptome

    Goro Terai, Junichi Iwakiri, Tomoshi Kameda, Michiaki Hamada, Kiyoshi Asai

    BMC GENOMICS   17 ( S-1 ) 12  2016  [Refereed]

    Motivation: Recent studies have revealed that large numbers of non-coding RNAs are transcribed in humans, but only a few of them have been identified with their functions. Identification of the interaction target RNAs of the non-coding RNAs is an important step in predicting their functions. The current experimental methods to identify RNA-RNA interactions, however, are not fast enough to apply to a whole human transcriptome. Therefore, computational predictions of RNA-RNA interactions are desirable, but this is a challenging task due to the huge computational costs involved.
    Results: Here, we report comprehensive predictions of the interaction targets of lncRNAs in a whole human transcriptome for the first time. To achieve this, we developed an integrated pipeline for predicting RNA-RNA interactions on the K computer, which is one of the fastest super-computers in the world. Comparisons with experimentally-validated lncRNA-RNA interactions support the quality of the predictions. Additionally, we have developed a database that catalogs the predicted lncRNA-RNA interactions to provide fundamental information about the targets of lncRNAs.

    DOI

  • Bioinformatics tools for lncRNA research

    Junichi Iwakiri, Michiaki Hamada, Kiyoshi Asai

    BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS   1859 ( 1 ) 23 - 30  2016.01  [Refereed]

    Current experimental methods to identify the functions of a large number of the candidates of long non-coding RNAs (lncRNAs) are limited in their throughput. Therefore, it is essential to know which tools are effective for understanding lncRNAs so that reasonable speed and accuracy can be achieved. In this paper, we review the currently available bioinformatics tools and databases that are useful for finding non-coding RNAs and analyzing their structures, conservation, interactions, co-expressions and localization. This article is part of a Special Issue entitled: Clues to long noncoding RNA taxonomy, edited by Dr. Tetsuro Hirose and Dr. Shinichi Nakagawa. (C) 2015 Elsevier B.V. All rights reserved.

    DOI PubMed J-GLOBAL

  • Privacy-preserving search for chemical compound databases

    Kana Shimizu, Koji Nuida, Hiromi Arai, Shigeo Mitsunari, Nuttapong Attrapadung, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Jun Sakuma, Goichiro Hanaoka, Kiyoshi Asai

    BMC BIOINFORMATICS   16 ( 18 ) S6  2015.12  [Refereed]

    Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources.
    Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation.
    Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.

    DOI PubMed J-GLOBAL

  • A semi-supervised learning approach for RNA secondary structure prediction

    Haruka Yonemoto, Kiyoshi Asai, Michiaki Hamada

    COMPUTATIONAL BIOLOGY AND CHEMISTRY   57   72 - 79  2015.08  [Refereed]

    RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. (C) 2015 Elsevier Ltd. All rights reserved.

    DOI PubMed J-GLOBAL

  • Learning chromatin states with factorized information criteria

    Michiaki Hamada, Yukiteru Ono, Ryohei Fujimaki, Kiyoshi Asai

    BIOINFORMATICS   31 ( 15 ) 2426 - 2433  2015.08  [Refereed]

    Motivation: Recent studies have suggested that both the genome and the genome with epigenetic modifications, the so-called epigenome, play important roles in various biological functions, such as transcription and DNA replication, repair, and recombination. It is well known that specific combinations of histone modifications (e.g. methylations and acetylations) of nucleosomes induce chromatin states that correspond to specific functions of chromatin. Although the advent of next-generation sequencing (NGS) technologies enables measurement of epigenetic information for entire genomes at high-resolution, the variety of chromatin states has not been completely characterized.
    Results: In this study, we propose a method to estimate the chromatin states indicated by genome-wide chromatin marks identified by NGS technologies. The proposed method automatically estimates the number of chromatin states and characterizes each state on the basis of a hidden Markov model (HMM) in combination with a recently proposed model selection technique, factorized information criteria. The method is expected to provide an unbiased model because it relies on only two adjustable parameters and avoids heuristic procedures as much as possible. Computational experiments with simulated datasets show that our method automatically learns an appropriate model, even in cases where methods that rely on Bayesian information criteria fail to learn the model structures. In addition, we comprehensively compare our method to ChromHMM on three real datasets and show that our method estimates more chromatin states than ChromHMM for those datasets.

    DOI PubMed J-GLOBAL

  • RNA secondary structure prediction from multi-aligned sequences

    Michiaki Hamada

    RNA Bioinformatics   1269   17 - 38  2015.01  [Refereed]

    It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.

    DOI PubMed J-GLOBAL

  • Efficient calculation of exact probability distributions of integer features on RNA secondary structures

    Ryota Mori, Michiaki Hamada, Kiyoshi Asai

    BMC GENOMICS   15   S6  2014.12  [Refereed]

    Background: Although the need for analyses of RNA secondary structures is increasing, predictions of RNA secondary structures are not always reliable. Because an RNA may have a complicated energy landscape, comprehensive representations of the whole ensemble of secondary structures, such as the probability distributions of various features of RNA secondary structures, are required.
    Results: A general method to efficiently compute the distribution of any integer scalar/vector function on the secondary structure is proposed. We also show two concrete algorithms, for Hamming distance from a reference structure and for 5' - 3' distance, which can be constructed by following our general method. These practical applications of this method show the effectiveness of the proposed method.
    Conclusions: The proposed method provides a clear and comprehensive procedure to construct algorithms for distributions of various integer features. In addition, distributions of integer vectors, that is, combinations of different integer scores, can also be described by applying our 2D expanding technique.

    DOI PubMed J-GLOBAL

  • Reference-free prediction of rearrangement breakpoint reads

    Edward Wijaya, Kana Shimizu, Kiyoshi Asai, Michiaki Hamada

    BIOINFORMATICS   30 ( 18 ) 2559 - 2567  2014.09  [Refereed]

    Motivation: Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules, which are observed in many cancer-related diseases. The detection of rearrangement is typically done by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information.
    Results: In this article, we propose a reference-free method for detecting clusters of breakpoints from the chromosomal rearrangements. This is done by directly comparing a set of NGS normal reads with another set that may be rearranged. Our method SlideSort-BPR (breakpoint reads) is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100x, it finds approximately 88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome.

    DOI PubMed J-GLOBAL

  • Fighting against uncertainty: an essential issue in bioinformatics

    Michiaki Hamada

    BRIEFINGS IN BIOINFORMATICS   15 ( 5 ) 748 - 767  2014.09  [Refereed]

    Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are often affected by the 'uncertainty' of a solution, that is, the probability of the solution is extremely small. This situation arises for estimation problems on high-dimensional discrete spaces in which the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods to combat this uncertainty, presenting a number of examples in bioinformatics. The methods include (i) avoiding point estimation, (ii) maximum expected accuracy (MEA) estimations and (iii) several strategies to design a pipeline involving several prediction methods. I believe that the basic concepts and ideas described in this review will be generally useful for estimation problems in various areas of bioinformatics.

    DOI

  • RNA structural alignments, part II: non-Sankoff approaches for structural alignments.

    Asai K, Hamada M

    Methods in molecular biology (Clifton, N.J.)   1097   291 - 301  2014  [Refereed]

    DOI PubMed J-GLOBAL

  • Analysis of base-pairing probabilities of RNA molecules involved in protein-RNA interactions

    Junichi Iwakiri, Tomoshi Kameda, Kiyoshi Asai, Michiaki Hamada

    BIOINFORMATICS   29 ( 20 ) 2524 - 2528  2013.10  [Refereed]

    Motivation: Understanding the details of protein-RNA interactions is important to reveal the functions of both the RNAs and the proteins. In these interactions, the secondary structures of the RNAs play an important role. Because RNA secondary structures in protein-RNA complexes are variable, considering the ensemble of RNA secondary structures is a useful approach. In particular, recent studies have supported the idea that, in the analysis of RNA secondary structures, the base-pairing probabilities (BPPs) of RNAs (i.e. the probabilities of forming a base pair in the ensemble of RNA secondary structures) provide richer and more robust information about the structures than a single RNA secondary structure, for example, the minimum free energy structure or a snapshot of structures in the Protein Data Bank. However, there has been no investigation of the BPPs in protein-RNA interactions.
    Results: In this study, we analyzed BPPs of RNA molecules involved in known protein-RNA complexes in the Protein Data Bank. Our analysis suggests that, in the tertiary structures, the BPPs (which are computed using only sequence information) for unpaired nucleotides with intermolecular hydrogen bonds (hbonds) to amino acids were significantly lower than those for unpaired nucleotides without hbonds. On the other hand, no difference was found between the BPPs for paired nucleotides with and without intermolecular hbonds. Those findings were commonly supported by three probabilistic models, which provide the ensemble of RNA secondary structures, including the McCaskill model based on Turner's free energy of secondary structures.

    DOI PubMed J-GLOBAL

  • CentroidAlign-Web: A fast and accurate multiple aligner for long non-coding RNAs

    Haruka Yonemoto, Kiyoshi Asai, Michiaki Hamada

    International Journal of Molecular Sciences   14 ( 3 ) 6144 - 6156  2013.03  [Refereed]

     View Summary

    Due to the recent discovery of non-coding RNAs (ncRNAs), multiple sequence alignment (MSA) of those long RNA sequences is becoming increasingly important for classifying and determining the functional motifs in RNAs. However, not only primary (nucleotide) sequences, but also secondary structures of ncRNAs are closely related to their function and are conserved evolutionarily. Hence, information about secondary structures should be considered in the sequence alignment of ncRNAs. Yet, in general, a huge computational time is required in order to compute MSAs, taking secondary structure information into account. In this paper, we describe a fast and accurate web server, called CentroidAlign-Web, which can handle long RNA sequences. The web server also appropriately incorporates information about known secondary structures into MSAs. Computational experiments indicate that our web server is fast and accurate enough to handle long RNA sequences. CentroidAlign-Web is freely available from http://centroidalign.ncrna.org/. © 2013 by the authors; licensee MDPI, Basel, Switzerland.

    DOI PubMed J-GLOBAL

  • Generalized Centroid Estimators in Bioinformatics

    Michiaki Hamada, Hisanori Kiryu, Wataru Iwasaki, Kiyoshi Asai

    CoRR   abs/1305.4339  2013  [Refereed]

  • PBSIM: PacBio reads simulator-toward accurate genome assembly

    Yukiteru Ono, Kiyoshi Asai, Michiaki Hamada

    BIOINFORMATICS   29 ( 1 ) 119 - 121  2013.01  [Refereed]

     View Summary

    Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries.
    Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results.

    DOI PubMed J-GLOBAL

  • Direct updating of an RNA base-pairing probability matrix with marginal probability constraints

    Michiaki Hamada

    Journal of Computational Biology   19 ( 12 ) 1265 - 1276  2012.12  [Refereed]

     View Summary

    A base-pairing probability matrix (BPPM) stores the probabilities for every possible base pair in an RNA sequence and has been used in many algorithms in RNA informatics (e.g., RNA secondary structure prediction and motif search). In this study, we propose a novel algorithm to perform iterative updates of a given BPPM, satisfying marginal probability constraints that are (approximately) given by recently developed biochemical experiments, such as SHAPE, PAR, and FragSeq. The method is easily implemented and is applicable to common models for RNA secondary structures, such as energy-based or machine-learning-based models. In this article, we focus mainly on the details of the algorithms, although preliminary computational experiments will also be presented. © 2012 Mary Ann Liebert, Inc.

    DOI PubMed J-GLOBAL

  • Shape-based alignment of genomic landscapes in multi-scale resolution

    Hiroki Ashida, Kiyoshi Asai, Michiaki Hamada

    NUCLEIC ACIDS RESEARCH   40 ( 14 ) 6435 - 6448  2012.08  [Refereed]

     View Summary

    Due to dramatic advances in DNA technology, quantitative measures of annotation data can now be obtained in continuous coordinates across the entire genome, allowing various heterogeneous 'genomic landscapes' to emerge. Although much effort has been devoted to comparing DNA sequences, not much attention has been given to comparing these large quantities of data comprehensively. In this article, we introduce a method for rapidly detecting local regions that show high correlations between genomic landscapes. We overcame the size problem for genome-wide data by converting the data into series of symbols and then carrying out sequence alignment. We also decomposed the oscillation of the landscape data into different frequency bands before analysis, since the real genomic landscape is a mixture of embedded and confounded biological processes working at different scales in the cell nucleus. To verify the usefulness and generality of our method, we applied our approach to well investigated landscapes from the human genome, including several histone modifications. Furthermore, by applying our method to over 20 genomic landscapes in human and 12 in mouse, we found that DNA replication timing and the density of Alu insertions are highly correlated genome-wide in both species, even though the Alu elements have amplified independently in the two genomes. To our knowledge, this is the first method to align genomic landscapes at multiple scales according to their shape.

    DOI PubMed J-GLOBAL

  • A Classification of Bioinformatics Algorithms from the Viewpoint of Maximizing Expected Accuracy (MEA)

    Michiaki Hamada, Kiyoshi Asai

    JOURNAL OF COMPUTATIONAL BIOLOGY   19 ( 5 ) 532 - 549  2012.05  [Refereed]

     View Summary

    Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution-even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.

    DOI PubMed J-GLOBAL

  • Privacy preservation in information retrieval

    Hiromi Arai, Kana Shimizu, Michiaki Hamada

    Proceedings of the Annual Conference of JSAI (人工知能学会全国大会論文集)   26   1 - 4  2012

    CiNii

  • Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection

    Michiaki Hamada, Edward Wijaya, Martin C. Frith, Kiyoshi Asai

    BIOINFORMATICS   27 ( 22 ) 3085 - 3092  2011.11  [Refereed]

     View Summary

    Motivation: Recent studies have revealed the importance of considering quality scores of reads generated by next-generation sequence (NGS) platforms in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) provide more accurate alignment than conventional maximum score-based alignment. There exists, however, no study about probabilistic alignment that considers quality scores explicitly, although the method is expected to be useful in SNP/indel callers and bisulfite mapping, because accurate estimation of aligned columns or gaps is important in those analyses.
    Results: In this study, we propose methods of probabilistic alignment that consider quality scores of (one of) the sequences as well as a usual score matrix. The method is based on posterior decoding techniques in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores, and can arbitrarily trade-off sensitivity and positive predictive value (PPV) of prediction (aligned columns and gaps). The method is directly applicable to read mapping (alignment) toward accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments can estimate aligned columns and gaps accurately, compared with other mapping algorithms e.g. SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.

    DOI PubMed J-GLOBAL

  • Antagonistic RNA aptamer specific to a heterodimeric form of human interleukin-17A/F

    Hironori Adachi, Akira Ishiguro, Michiaki Hamada, Eri Sakota, Kiyoshi Asai, Yoshikazu Nakamura

    BIOCHIMIE   93 ( 7 ) 1081 - 1088  2011.07  [Refereed]

     View Summary

    Interleukin-17 (IL-17) is a pro-inflammatory cytokine produced primarily by a subset of CD4(+) cells, called Th17 cells, that is involved in host defense, inflammation and autoimmune disorders. The two most structurally related IL-17 family members, IL-17A and IL-17F, form homodimeric (IL-17A/A, IL-17F/F) and heterodimeric (IL-17A/F) complexes. Although the biological significance of IL-17A and IL-17F have been investigated using respective antibodies or gene knockout mice, the functional study of IL-17A/F heterodimeric form has been hampered by the lack of an inhibitory tool specific to IL-17A/F. In this study, we aimed to develop an RNA aptamer that specifically inhibits IL-17A/F. Aptamers are short single-stranded nucleic acid sequences that are selected in vitro based on their high affinity to a target molecule. One selected aptamer against human IL-17A/F, AptAF42, was isolated by repeated cycles of selection and counterselection against heterodimeric and homodimeric complexes, respectively. Thus, AptAF42 bound IL-17A/F but not IL-17A/A or IL-17F/F. The optimized derivative, AptAF42dope1, blocked the binding of IL-17A/F, but not of IL-17A/A or IL-17F/F, to the IL-17 receptor in the surface plasmon resonance assay in vitro. Consistently, AptAF42dope1 blocked cytokine GRO-alpha production induced by IL-17A/F, but not by IL-17A/A or IL-17F/F, in human cells. An RNA footprinting assay using ribonucleases against AptAF42dope1 in the presence or absence of IL-17A/F revealed that part of the predicted secondary structure fluctuates between alternate forms and that AptAF42dope1 is globally protected from ribonuclease cleavage by IL-17A/F. These results suggest that the selected aptamer recognizes a global conformation specified by the heterodimeric surface of IL-17A/F. (C) 2011 Elsevier Masson SAS. All rights reserved.

    DOI PubMed J-GLOBAL

  • CentroidHomfold-LAST: accurate prediction of RNA secondary structure using automatically collected homologous sequences

    Michiaki Hamada, Koichiro Yamada, Kengo Sato, Martin C. Frith, Kiyoshi Asai

    NUCLEIC ACIDS RESEARCH   39 ( Web Server issue ) W100 - W106  2011.07  [Refereed]

     View Summary

    Although secondary structure predictions of an individual RNA sequence have been widely used in a number of sequence analyses of RNAs, accuracy is still limited. Recently, we proposed a method (called 'CentroidHomfold'), which includes information about homologous sequences into the prediction of the secondary structure of the target sequence, and showed that it substantially improved the performance of secondary structure predictions. CentroidHomfold, however, forces users to prepare homologous sequences of the target sequence. We have developed a Web application (CentroidHomfold-LAST) that predicts the secondary structure of the target sequence using automatically collected homologous sequences. LAST, which is a fast and sensitive local aligner, and CentroidHomfold are employed in the Web application. Computational experiments with a commonly-used data set indicated that CentroidHomfold-LAST substantially outperformed conventional secondary structure predictions including CentroidFold and RNAfold.

    DOI PubMed J-GLOBAL

  • IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming

    Kengo Sato, Yuki Kato, Michiaki Hamada, Tatsuya Akutsu, Kiyoshi Asai

    BIOINFORMATICS   27 ( 13 ) I85 - I93  2011.07  [Refereed]

     View Summary

    Motivation: Pseudoknots found in secondary structures of a number of functional RNAs play various roles in biological processes. Recent methods for predicting RNA secondary structures cover certain classes of pseudoknotted structures, but only a few of them achieve satisfying predictions in terms of both speed and accuracy.
    Results: We propose IPknot, a novel computational method for predicting RNA secondary structures with pseudoknots based on maximizing expected accuracy of a predicted structure. IPknot decomposes a pseudoknotted structure into a set of pseudoknot-free substructures and approximates a base-pairing probability distribution that considers pseudoknots, leading to the capability of modeling a wide class of pseudoknots and running quite fast. In addition, we propose a heuristic algorithm for refining base-pairing probabilities to improve the prediction accuracy of IPknot. The problem of maximizing expected accuracy is solved by using integer programming with threshold cut. We also extend IPknot so that it can predict the consensus secondary structure with pseudoknots when a multiple sequence alignment is given. IPknot is validated through extensive experiments on various datasets, showing that IPknot achieves better prediction accuracy and faster running time as compared with several competitive prediction methods.

    DOI PubMed J-GLOBAL

  • Generalized Centroid Estimators in Bioinformatics

    Michiaki Hamada, Hisanori Kiryu, Wataru Iwasaki, Kiyoshi Asai

    PLOS ONE   6 ( 2 ) e16450  2011.02  [Refereed]

     View Summary

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework to design MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics.

    DOI PubMed

  • Improving the accuracy of predicting secondary structure for aligned RNA sequences

    Michiaki Hamada, Kengo Sato, Kiyoshi Asai

    NUCLEIC ACIDS RESEARCH   39 ( 2 ) 393 - 402  2011.01  [Refereed]

     View Summary

    Considerable attention has been focused on predicting the secondary structure for aligned RNA sequences since it is useful not only for improving the limiting accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although there exist many algorithms of predicting secondary structure for aligned RNA sequences, further improvement of the accuracy is still awaited. In this article, toward improving the accuracy, a theoretical classification of state-of-the-art algorithms of predicting secondary structure for aligned RNA sequences is presented. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied in various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms but we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms.

    DOI PubMed CiNii J-GLOBAL

  • Prediction of RNA secondary structure by maximizing pseudo-expected accuracy

    Michiaki Hamada, Kengo Sato, Kiyoshi Asai

    BMC BIOINFORMATICS   11   586  2010.11  [Refereed]

     View Summary

    Background: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, a new type of estimator is proposed including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give the single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear what parameter value we should use, and even the well-trained default parameter value does not, in general, give the best result in popular accuracy measures to each RNA sequence.
    Results: Instead of using the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that well-balanced secondary structures between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the gamma-centroid estimator.
    Conclusions: This study gives not only a method for predicting the secondary structure that balances between sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-) expected accuracy with respect to various evaluation measures including MCC and F-score.

    DOI PubMed J-GLOBAL

  • RactIP: Fast and accurate prediction of RNA-RNA interaction using integer programming

    Yuki Kato, Kengo Sato, Michiaki Hamada, Yoshihide Watanabe, Kiyoshi Asai, Tatsuya Akutsu

    Bioinformatics   26 ( 18 ) i460 - i466  2010.09  [Refereed]

     View Summary

    Motivation: Considerable attention has been focused on predicting RNA-RNA interaction since it is a key to identifying possible targets of non-coding small RNAs that regulate gene expression post-transcriptionally. A number of computational studies have so far been devoted to predicting joint secondary structures or binding sites under a specific class of interactions. In general, there is a trade-off between range of interaction type and efficiency of a prediction algorithm, and thus efficient computational methods for predicting comprehensive type of interaction are still awaited.
    Results: We present RactIP, a fast and accurate prediction method for RNA-RNA interaction of general type using integer programming. RactIP can integrate approximate information on an ensemble of equilibrium joint structures into the objective function of integer programming using posterior internal and external base-pairing probabilities. Experimental results on real interaction data show that prediction accuracy of RactIP is at least comparable to that of several state-of-the-art methods for RNA-RNA interaction prediction. Moreover, we demonstrate that RactIP can run incomparably faster than competitive methods for predicting joint secondary structures. © The Author(s) 2010. Published by Oxford University Press.

    DOI PubMed J-GLOBAL

  • A non-parametric bayesian approach for predicting rna secondary structures

    Kengo Sato, Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai, Yasubumi Sakakibara

    Journal of Bioinformatics and Computational Biology   8 ( 4 ) 727 - 742  2010.08  [Refereed]

     View Summary

    Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models. © 2010 Imperial College Press.

    DOI

  • Parameters for accurate genome alignment

    Martin C. Frith, Michiaki Hamada, Paul Horton

    BMC Bioinformatics   11   80  2010.02  [Refereed]

    Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed.
    Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold-standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases.
    Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours http://last.cbrc.jp/. © 2010 Frith et al.; licensee BioMed Central Ltd.

    DOI PubMed J-GLOBAL

  • CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score

    Michiaki Hamada, Kengo Sato, Hisanori Kiryu, Toutai Mituyama, Kiyoshi Asai

    BIOINFORMATICS   25 ( 24 ) 3236 - 3243  2009.12  [Refereed]

    Motivation: The importance of accurate and fast predictions of multiple alignments for RNA sequences has increased due to recent findings about functional non-coding RNAs. Recent studies suggest that maximizing the expected accuracy of predictions will be useful for many problems in bioinformatics.
    Results: We designed a novel estimator for multiple alignments of structured RNAs, based on maximizing the expected accuracy of predictions. First, we define the maximum expected accuracy (MEA) estimator for pairwise alignment of RNA sequences. This maximizes the expected sum-of-pairs score (SPS) of a predicted alignment under a probability distribution of alignments given by marginalizing the Sankoff model. Then, by approximating the MEA estimator, we obtain an estimator whose time complexity is O(L^3 + c^2 d L^2), where L is the length of input sequences and both c and d are constants independent of L. The proposed estimator can handle uncertainty of secondary structures and alignments that are obstacles in bioinformatics because it considers all the secondary structures and all the pairwise alignments as input sequences. Moreover, we integrate the probabilistic consistency transformation (PCT) on alignments into the proposed estimator. Computational experiments using six benchmark datasets indicate that the proposed method achieved a favorable SPS and was the fastest of many state-of-the-art tools for multiple alignments of structured RNAs.

    DOI PubMed J-GLOBAL

  • CENTROIDFOLD: a web server for RNA secondary structure prediction

    Kengo Sato, Michiaki Hamada, Kiyoshi Asai, Toutai Mituyama

    NUCLEIC ACIDS RESEARCH   37 ( Web Server issue ) W277 - W280  2009.07  [Refereed]

    The CENTROIDFOLD web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engines. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responds with a prediction result shown in a popular base-pair notation and as a graph representation. A PDF version of the graph representation is also available. For a multiple alignment, the server predicts a common secondary structure. Usage of the server is quite simple. You can paste a single RNA sequence (FASTA or plain sequence text) or a multiple alignment (CLUSTAL-W format) into the textarea and then click on the 'execute CentroidFold' button. The server quickly responds with a prediction result. The major advantage of this server is that it employs our original CENTROIDFOLD software as its prediction engine, which scores the best accuracy in our benchmark results. Our web server is freely available with no login requirement.

    DOI PubMed J-GLOBAL

  • Predictions of RNA secondary structure by combining homologous sequence information.

    Hamada M, Sato K, Kiryu H, Mituyama T, Asai K

    Bioinformatics (Oxford, England)   25 ( 12 ) 330 - 338  2009.06  [Refereed]  [International journal]

    MOTIVATION: Secondary structure prediction of RNA sequences is an important problem. Progress has been made in this area, but the accuracy of prediction from an RNA sequence is still limited. In many cases, however, homologous RNA sequences are available with the target RNA sequence whose secondary structure is to be predicted.
    RESULTS: In this article, we propose a new method for secondary structure predictions of individual RNA sequences by taking the information of their homologous sequences into account without assuming the common secondary structure of the entire sequences. The proposed method is based on posterior decoding techniques, which consider all the suboptimal secondary structures of the target and homologous sequences and all the suboptimal alignments between the target sequence and each of the homologous sequences. In our computational experiments, the proposed method provides better predictions than those performed only on the basis of the formation of individual RNA sequences and those performed by using methods for predicting the common secondary structure of the homologous sequences. Remarkably, we found that the common secondary structure predictions sometimes give worse predictions for the secondary structure of a target sequence than the predictions from the individual target sequence, while the proposed method always gives good predictions for the secondary structure of target sequences in all tested cases.
    AVAILABILITY: Supporting information and software are available online at: http://www.ncrna.org/software/centroidfold/ismb2009/.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    DOI PubMed J-GLOBAL

  • Prediction of RNA secondary structure using generalized centroid estimators

    Michiaki Hamada, Hisanori Kiryu, Kengo Sato, Toutai Mituyama, Kiyoshi Asai

    BIOINFORMATICS   25 ( 4 ) 465 - 473  2009.02  [Refereed]

    Motivation: Recent studies have shown that methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities have an advantage with respect to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures.
    Results: We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics.

    DOI PubMed J-GLOBAL

  • A Non-parametric Bayesian Approach for Predicting RNA Secondary Structures

    Kengo Sato, Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai, Yasubumi Sakakibara

    ALGORITHMS IN BIOINFORMATICS, PROCEEDINGS   5724   286 - +  2009  [Refereed]

    Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.

  • Large scale similarity search for locally stable secondary structures among RNA sequences

    Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai

    IPSJ Transactions on Bioinformatics   2   36 - 46  2009  [Refereed]

    Recently, a large number of candidate non-coding RNAs (ncRNAs) have been predicted by experimental or computational approaches. Moreover, in genomic sequences, there are still many interesting regions whose functions are unknown (e.g., indel conserved regions, human accelerated regions, ultraconserved elements and transposon free regions) and some of those regions may be ncRNAs. On the other hand, it is known that many ncRNAs have characteristic secondary structures which are strongly related to their functions. Therefore, detecting clusters which have mutually similar secondary structures is important for revealing new ncRNA families. In this paper, we describe a novel method, called RNAclique, which is able to search for clusters containing mutually similar and locally stable secondary structures among a large number of unaligned RNA sequences. Our problem is formulated as a constraint quasiclique search problem, and we use an approximate combinatorial optimization method, called GRASP, for solving the problem. Several computational experiments show that our method is useful and scalable for detecting ncRNA families from large sequences. We also present two examples of large scale sequence analysis using RNAclique. © 2009 Information Processing Society of Japan.

    DOI CiNii

  • Software.ncrna.org: web servers for analyses of RNA sequences

    Kiyoshi Asai, Hisanori Kiryu, Michiaki Hamada, Yasuo Tabei, Kengo Sato, Hiroshi Matsui, Yasubumi Sakakibara, Goro Terai, Toutai Mituyama

    NUCLEIC ACIDS RESEARCH   36 ( Web Server issue ) W75 - W78  2008.07  [Refereed]

    We present web servers for analysis of non-coding RNA sequences on the basis of their secondary structures. Software tools for structural multiple sequence alignments, structural pairwise sequence alignments and structural motif findings are available from the integrated web server and the individual stand-alone web servers. The servers are located at http://software.ncrna.org, along with the information for the evaluation and downloading. This website is freely available to all users and there is no login requirement.

    DOI PubMed J-GLOBAL

  • Mining frequent stem patterns from unaligned RNA sequences

    Michiaki Hamada, Koji Tsuda, Taku Kudo, Taishin Kin, Kiyoshi Asai

    BIOINFORMATICS   22 ( 20 ) 2480 - 2487  2006.10  [Refereed]

    Motivation: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or few good motifs, but biologists often need to inspect all the possible motifs thoroughly.
    Results: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably, in both accuracy and efficiency, with state-of-the-art methods such as CMFinder.

    DOI PubMed J-GLOBAL

Books and Other Publications

Misc

  • CAFs induce formation of metastatic human breast tumor cell clusters with partial epithelial-mesenchymal transition

    Akira Orimo, Yasuhiko Ito, Yoshihiro Mezawa, Kaidiliavi Sulidan, Yataro Daigo, Nadila Wali, Okio Hino, Kazuyoshi Takeda, Michiaki Hamada, Yuko Matsumura

    CANCER SCIENCE   109   797 - 797  2018.12  [Refereed]

    Research paper, summary (international conference)  

  • Mutational signature analysis of cancer genomes using topic models (Neurocomputing)

    松谷 太郎, 宇恵野 雄貴, 福永 津嵩, 浜田 道昭

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   117 ( 109 ) 159 - 164  2017.06

    CiNii

  • Mutational signature analysis of cancer genomes using topic models (Information-Based Induction Sciences and Machine Learning)

    松谷 太郎, 宇恵野 雄貴, 福永 津嵩, 浜田 道昭

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   117 ( 110 ) 105 - 110  2017.06

    CiNii

  • Privacy-Preserving Search for Chemical Compound Databases

    Kana Shimizu, Koji Nuida, Hiromi Arai, Shigeo Mitsunari, Nuttapong Attrapadung, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Jun Sakuma, Goichiro Hanaoka, Kiyoshi Asai

    bioRxiv   ( 013995 )  2015.01

    Internal/External technical report, pre-print, etc.  

    DOI

  • RNA secondary structure prediction from multi-aligned sequences

    Michiaki Hamada

       2013.07

    Internal/External technical report, pre-print, etc.  

    It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA s...

  • Generalized Centroid Estimators in Bioinformatics

    Michiaki Hamada, Hisanori Kiryu, Wataru Iwasaki, Kiyoshi Asai

    PLoS ONE 6(2):e16450, 2011    2013.05

    Internal/External technical report, pre-print, etc.  

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accura...

    DOI

  • Fighting against uncertainty: An essential issue in bioinformatics

    Michiaki Hamada

       2013.05

    Internal/External technical report, pre-print, etc.  

    Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are often affected by the "uncertainty" of a solution; that is, the probability of the solution is extremely small. This situation arises for estimation problems on high-dimensional discrete spaces in which the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods t...

  • A privacy-preserving search protocol for chemical compound databases using additive homomorphic encryption

    縫田光司, 清水佳奈, 荒井ひろみ, 浜田道昭, 津田宏治, 広川貴次, 花岡悟一郎, 佐久間淳, 浅井潔

    情報処理学会シンポジウムシリーズ(CD-ROM)   2012 ( 3 ) ROMBUNNO.2C2-1  2012.10

    J-GLOBAL

  • A proposal of an RNA secondary structure prediction algorithm using semi-supervised learning

    米本悠, 浜田道昭, 浅井潔

    日本RNA学会年会要旨集   14th   160  2012.07

    J-GLOBAL

  • Development of a method for analyzing RNA secondary structure stability based on the canonical distribution

    森遼太, 浜田道昭, 浅井潔

    日本RNA学会年会要旨集   14th   154  2012.07

    J-GLOBAL

  • Privacy protection in search behavior

    荒井ひろみ, 清水佳奈, 浜田道昭, 津田宏治, 広川貴次, 佐久間淳, 浅井潔

    人工知能学会全国大会論文集(CD-ROM)   26th   ROMBUNNO.3I2-OS-20-1  2012

    J-GLOBAL

  • Development of a method for describing the existence probability distribution of RNA secondary structures based on the canonical distribution

    森遼太, 浜田道昭, 浅井潔

    日本分子生物学会年会プログラム・要旨集(Web)   35th   WEB ONLY 1P-0244  2012

    J-GLOBAL

  • A proposal of an RNA secondary structure prediction algorithm using semi-supervised learning

    米本悠, 浜田道昭, 浅井潔

    日本分子生物学会年会プログラム・要旨集(Web)   35th   WEB ONLY 3P-0071  2012

    J-GLOBAL

  • Maximizing Expected Accuracy in Bioinformatics (Industrial Materials)

    Hamada Michiaki, Asai Kiyoshi

    Bulletin of the Japan Society for Industrial and Applied Mathematics   21 ( 1 ) 34 - 39  2011.03

    CiNii

  • Expected accuracy maximization and bioinformatics

    浜田道昭, 浅井潔

    応用数理   21 ( 1 ) 34 - 39  2011.03

    J-GLOBAL

  • CentroidFold: an RNA secondary structure prediction web server

    佐藤健吾, 浜田道昭, 浅井潔, 光山統泰

    日本RNA学会年会要旨集   11th   96  2009.07

    J-GLOBAL

  • CentroidHomfold: RNA secondary structure prediction using information from homologous sequences

    浜田道昭, 佐藤健吾, 木立尚孝, 光山統泰, 浅井潔

    日本分子生物学会年会講演要旨集   32nd ( Vol.1 ) 48  2009

    J-GLOBAL

  • Development of RNA informatics methods that maximize expected accuracy

    浜田道昭, 木立尚孝, 佐藤健吾, 光山統泰, 浅井潔

    生化学     2P-0776  2008

    J-GLOBAL

  • Classification of functional RNA families using Support Vector Machine

    浜田道昭, 加藤毅, 金大真, 津田宏治, 浅井潔

    RNAミーティング   7th   69  2005

    J-GLOBAL

  • A High Performance Computing Environment for Prediction of Activity and Function of Biomolecules: An Application to Analysis of HIV Protease Inhibitors

    Hamada Michiaki, Feng Cheng, Inagaki Yuichiro, Nagashima Umpei, Murakami Kazuaki, Chuman Hiroshi

    Transactions of the Japan Society for Industrial and Applied Mathematics   14 ( 4 ) 267 - 288  2004.12

    We have developed an object-oriented large-scale scientific simulation system that contains algorithms of molecular scientific computing programs, called Embedded High-Performance Computing (EHPC). As an application of the system, the "EHPC-Drug platform" has been constructed for rational drug design. It can provide a high-performance computing ability for exhaustive conformational analyses of biomolecules, computation of their three-dimensional topological descriptors, and docking calculations with their target receptors. To enhance its computing abilities, we are also planning to ...

    CiNii

  • A High Performance Computing Environment for Prediction of Activity and Function of Biomolecules: An Application to Analysis of HIV Protease Inhibitors

    HAMADA MICHIAKI, FENG C, INAGAKI YUICHIRO, NAGASHIMA UMPEI, MURAKAMI KAZUAKI, CHUMAN HIROSHI

    日本応用数理学会論文誌   14 ( 4 ) 267 - 288  2004.12

    J-GLOBAL

  • Development and application of a platform for drug discovery using grid technology and XML database

    HAMADA MICHIAKI, INAGAKI YUICHIRO, CHUMAN HIROSHI

    構造活性相関シンポジウム講演要旨集   32nd   141 - 144  2004.11

    J-GLOBAL

  • Yakushi (Xsi): development of an integrated virtual screening system for drug discovery

    稲垣祐一郎, 浜田道昭, 山崎一人, 金岡昌治, 中馬寛

    情報計算化学生物学会大会予稿集   2004   205 - 206  2004.07

    J-GLOBAL

  • DrugML and Grid drug discovery

    浜田道昭, 稲垣祐一郎, 中馬寛

    日本コンピュータ化学会年会講演予稿集   2004   51  2004.05

    J-GLOBAL

  • Drug Discovery Using Grid Technologies and DrugML.

    HAMADA MICHIAKI, INAGAKI YUICHIRO, CHUMAN HIROSHI

    構造活性相関シンポジウム講演要旨集   31st   101 - 102  2003.11

    J-GLOBAL

Industrial Property Rights

Awards

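  • Waseda University Next-Generation Core Researcher 2022

    2022.04   Waseda University  

    Winner: Michiaki Hamada

  • Waseda Research Award (High-Impact Publication)

    2021.12   Waseda University  

    Winner: Michiaki Hamada

  • The Young Scientists' Prize, Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, FY2017

    2017.04   Ministry of Education, Culture, Sports, Science and Technology (MEXT)  

    Winner: Michiaki Hamada

  • AIST President's Award (Research)

    2016.04   National Institute of Advanced Industrial Science and Technology (AIST)  

    Winner: Michiaki Hamada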

Research Projects

  • AI aptamer drug discovery project

    Japan Science and Technology Agency (JST)  Strategic Basic Research Programs (CREST)

    Project Year :

    2021.04
    -
    2024.03
     

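  • Elucidating the functions of long non-coding RNAs based on de novo discovery of repeat elements

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)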

    Project Year :

    2020.04
    -
    2023.03
     

    浜田 道昭, 小野口 真広, 福永 津嵩

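  • Elucidating the relationship between developmental dioxin exposure and the decline of higher cognitive function in old age

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)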

    Project Year :

    2019.04
    -
    2022.03
     

    掛山 正心, 浜田 道昭, 久保 健一郎, 皆川 栄子, 前川 文彦

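    Through animal experiments, we have reported that fetal exposure to dioxins and related compounds impairs cognitive function, as shown both by cognitive task performance and by changes in the fine morphology of neurons. The goal of this study is to accumulate scientific evidence that developmental exposure to dioxins and related compounds contributes to the onset and exacerbation of dementia, and to demonstrate the importance of dementia as a toxicity endpoint. The aims are (1) to focus on the decline in cognitive flexibility that dioxins and related compounds cause in old age and to clarify, through human surveys and animal toxicity experiments, the nature and extent of the effects and the underlying toxic mechanisms, and (2) on the basis of those results, to establish phenotyping techniques for higher cognitive function in human surveys and animal toxicity experiments. This fiscal year, in preparation for the human cohort survey and the animal toxicity experiments, we created the task application to be used in the human survey and carried out the procedures for the cohort survey. We also advanced the infrastructure for remote assessment in which tasks are presented on tablet devices. For the animal experiments, we developed tasks and prepared toxicity tests in order to quantitatively evaluate cognitive flexibility and brain activity. In addition to tasks using IntelliCage, we established tasks using a touchscreen operant apparatus. In collaboration with RIKEN, we performed phenotypic analysis of Alzheimer's disease model mice and obtained promising findings on the relationship between dementia and mental schemas (manuscript submitted). We also performed a meta-analysis of existing data in order to model the data to be acquired in this project.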

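  • Development of an entirely new strategy for anticancer drug development based on deciphering the ceRNA network structure

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)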

    Project Year :

    2018.07
    -
    2021.03
     

    秋光 信佳, 浜田 道昭

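    In recent years, gene expression regulatory networks based on RNA-RNA interactions and on interactions between RNAs and RNA-binding proteins have attracted attention. Interestingly, the networks formed by these RNAs and RNA-binding proteins grow into huge networks through their interactions. For example, microRNAs, which are small non-coding RNAs, control mRNA expression levels by binding to mRNAs with complementary bases and degrading them or repressing their translation; a single microRNA targets not one but multiple mRNAs, while a single mRNA is regulated by multiple kinds of microRNAs. RNA-RNA interactions and interactions between RNAs and RNA-binding proteins are thus many-to-many. However, much remains unknown about the structure and physiological roles of networks based on such many-to-many interactions. In this study, we therefore develop technologies for analyzing RNA-RNA interactions and interactions between RNAs and RNA-binding proteins, and elucidate the roles of this huge network in physiology and in disease. So far, we have been developing technology for elucidating interactions between RNAs and RNA-binding proteins and have published a research paper (Yamada T. et al., Cell Rep). In that work, we developed a system that examines correlations in expression levels between RNA-binding proteins and the RNAs they target for degradation, using next-generation sequencing data available in public databases. We verified the effectiveness of the system on multiple RNA-binding proteins and published the results.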

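  • Prediction of RNA-chromatin interactions and its applications

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Research (Exploratory)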

    Project Year :

    2017.06
    -
    2021.03
     

    浜田 道昭, 岩切 淳一

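    Most of the mammalian genome is transcribed into coding or non-coding RNAs. It has been suggested that some of these non-coding RNAs interact with chromatin and exert epigenetic regulation. To elucidate the mechanism of RNA-chromatin interactions, we built a model that predicts interactions between lncRNAs and chromatin and examined which features of the model contribute to the interactions. The features considered were R-loop formation, RNA:DNA triplexes, and scaffolding through RNA binding. R-loop formation was estimated by identifying sequence complementarity through alignment, taking RNA accessibility into account. For RNA:DNA triplexes, we used an existing triplex prediction tool. As the machine learning model, we mainly used random forests, because they make it easy to derive the features that contributed to the classification. As data, we used large-scale experimental data on RNA-chromatin interactions to create positive and negative examples and trained the model. Prediction accuracy was evaluated by cross-validation, but sufficient accuracy has not yet been achieved. We are currently examining both the features and the training data in detail, and we also plan to consider other machine learning models, including deep learning.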

  • Development of an innovative aptamer drug discovery system using artificial intelligence technology

    JST  Strategic Basic Research Programs (CREST)

    Project Year :

    2018.10
    -
    2021.03
     

    浜田道昭

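  • Prediction of RNA-chromatin interactions and its applications

    Ministry of Education, Culture, Sports, Science and Technology  Challenging Research (Exploratory)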

    Project Year :

    2017.03
    -
    2020.04
     

    浜田道昭

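  • Functional classification of long non-coding RNAs based on functional elements and deep learning

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (A)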

    Project Year :

    2016.04
    -
    2020.03
     

    浜田 道昭

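    In higher organisms such as humans, many long non-coding RNAs (lncRNAs), which function as RNAs without being translated into proteins, are thought to exist, but the functions of most of them remain unknown. To identify functional elements of lncRNAs, we carried out the following studies.
    - Identification and sequence analysis of ribosome-binding lncRNAs: using comprehensive experimental data, we identified lncRNAs bound to ribosomal RNA, extracted their sequence features, and examined their biological significance. We published two related papers (BMC Genomics. 2018 Dec 31;19(Suppl 10):906; BMC Genomics. 2018 May 29;19(1):414. doi: 10.1186/s12864-018-4765-z).
    - We released LncRRIsearch, a web server that enables comprehensive prediction of lncRNA-RNA interactions in human and mouse (http://rtools.cbrc.jp/LncRRIsearch/).
    - Comprehensive identification of RBPs that bind to repeats: our previous work showed that repeat elements are associated with tissue-specific expression of lncRNAs; to advance the functional analysis further, we identified lncRNAs that bind to repeats. We are currently examining the results in detail and plan to publish them as a paper within the year.
    - Enhancement of the RNA-RNA interaction tool RIblast: we implemented a method for computing p-values, which is expected to promote use by experimental biologists (J Comput Biol. 2018 Sep;25(9):976-986).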

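  • Functional classification of long non-coding RNAs based on functional elements and deep learning

    Ministry of Education, Culture, Sports, Science and Technology  Young Scientists (A)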

    Project Year :

    2016.04
    -
    2020.03
     

    浜田道昭

  • RNA informatics for epi-transcriptome analysis

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year :

    2016.04
    -
    2019.03
     

    Asai Kiyoshi

    The energy parameters of the important modified bases, inosine and N6 methyladenosine were identified by a combination of thermometric experiments and molecular simulations. The effect of estimation error on structure prediction was evaluated and presented by theoretical analysis and computer experiments. A model of the effect of A-to-I editing on translational repression efficiency by miRNA was constructed and presented in a joint study using the identified inosine parameters.
    We have improved RintD, an analysis tool for secondary structure probability distribution, by developing RintW, which calculates the distribution of base pair probability, and RintC, which speeds up the calculation with maximum base pair constraint. At that time, the effect of the Fourier transform on the numerical error was analyzed using the accuracy guarantee calculation, and it was shown that the large probability was reliable.

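  • Inferring chromatin function based on histone variants

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)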

    Project Year :

    2016.04
    -
    2018.03
     

    浜田 道昭

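    (1) Inference of chromatin states from chromatin marks including histone variants.
    As histone variant data, we used Kujirai+, NAR (2016) 44, 6127-41 for human and Maehara+, Epigenetics Chromatin (2015) 17;8:35 for mouse. Using these data, we inferred chromatin states with a method developed by the principal investigator. We further investigated correlations between the inferred chromatin states and various genome annotations.
    (2) Database lncRRIdb: an lncRNA-RNA interaction database integrating expression and localization information.
    To characterize chromatin function from the viewpoint of long non-coding RNAs (lncRNAs), we constructed a comprehensive database of RNAs that interact with lncRNAs. The database stores the results of comprehensive computational interaction prediction with RIblast, developed by the principal investigator and colleagues, together with experimental information on expression and localization.
    (3) Development of information technology for inferring hierarchical chromatin states.
    We considered that promoters and enhancers also have a hierarchical structure, for example promoter ⇒ strong promoter, weak promoter, bivalent promoter. Because conventional methods for inferring chromatin states could not handle such a hierarchy, we developed our own method. We built a prototype system and verified its effectiveness on small data.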

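  • Inferring chromatin function based on histone variants

    Ministry of Education, Culture, Sports, Science and Technology  Scientific Research on Innovative Areas (Research in a proposed research area)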

    Project Year :

    2016.04
    -
    2018.03
     

    浜田道昭

  • Comprehensive prediction of RNA-protein interactions

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year :

    2013.04
    -
    2016.03
     

    Asai Kiyoshi, YURA Kei

    The aim of the research was to predict RNA-protein interactions for non-coding RNAs and proteins of known function. Our analysis of RNA-protein complexes in the PDB showed that nucleotides that do not form base pairs in RNA 2D structures but form hydrogen bonds with amino acids have lower base-pairing probabilities than nucleotides that form neither base pairs nor hydrogen bonds with amino acids. We developed a new method to understand the landscape of the distribution of RNA 2D structures by efficiently calculating the probabilities of all the structures with specific Hamming distances from the canonical structures. To predict the joint structure of RNA-protein complexes, we performed rigid-body docking simulations. After revising the force field for RNAs, our docking simulations showed better accuracy than previous methods, and we reported the results in a peer-reviewed journal.

  • Development of basic technology for privacy-preserving bioinformatics and its application

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Project Year :

    2013.04
    -
    2016.03
     

    Hamada Michiaki, Shimizu Kana, Hanaoka Goichiro, Tsuda Koji, Frith Martin, Asai Kiyoshi

    There is strong demand for handling personal genome and chemical compound information confidentially, because such sensitive information must not be leaked. On the other hand, from the viewpoint of "open" science, it is important to perform data mining that combines this sensitive information with other data. In this study, we developed several methods for data mining that keep such information secret. Specifically, we developed (i) privacy-preserving search for chemical databases, (ii) privacy-preserving genome sequence search with hidden Markov models (HMMs) and (iii) privacy-preserving sequence alignment, all of which will be useful for open science in biology.

  • Research on structure predictions of RNA with modified nucleotides

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (A)

    Project Year :

    2012.04
    -
    2016.03
     

    Hamada Michiaki

    We have developed bioinformatic methods for predicting secondary structures including modified bases. Due to the limitation of the known structures with modified bases, we employed a semi-supervised learning approach for predicting RNA secondary structures using RNA sequences with and without secondary structures. Moreover, we have developed an integrated web server, Rtools, for performing various analyses based on RNA secondary structures.

  • Platform of large scale and high quality genomics and bioinformatics: Towards the advancement of genome sciences in academia

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

    Project Year: 2010.04 - 2016.03

    KOHARA Yuji, KATO Kazuto, TOYODA Atsushi, KUROKI Yoko, SUGANO Sumio, SUZUKI Yutaka, HAYASHI Tetsuya, YAMAMOTO Ken, TSUJI Shoji, INOUE Ituro, KUROKAWA Ken, MORISHITA Shinichi, NAKAMURA Yasukazu, TABATA Satoshi, KUHARA Satoshi, IWASAKI Wataru, SESE Jun, TAKAHASHI Hiroki, ASAI Kiyoshi, KASAHARA Masahiro, SAKAKIBARA Yasubumi, YADA Tetsushi, YAMAGATA Zentaro, MUTO Kaori, IDA Ryuichi, MASUI Tohru, KURIYAMA Mariko, TAKAGI Toshihisa, FUJIYAMA Asao, HATTORI Masahira, OGURA Yoshitoshi, TOKUNAGA Katsushi, KUWANO Ryozo, OHASHI Jun, ITOH Takehiko, HIRAKAWA Hideki, NOGUCHI Hideki, MATSUOKA Satoshi, OGASAWARA Naotake, NAKAMURA Kensuke, HAMADA Michiaki, KANAYA Shigehiko, ANZAI Yuichiro, OKADA Kiyotaka, SAKAKI Yoshiyuki, TAKAKU Fumimaro, TOYOSHIMA Kumao, NAKAMURA Keiko, HOTTA Yoshiki, YONEZAWA Akinori, YOSHIKAWA Hiroshi, YOSHIDA Mitsuaki, INOKO Hidetoshi, TODA Tatsushi, INAZAWA Johji, GOJOBORI Takashi, URUSHIHARA Hideko, TAKEDA Hiroyuki, SHIROISHI Toshihiko, ITOH Takashi, SATOH Noriyuki, MATSUDA Hideo, GOTO Susumu, TSUDA Masataka

    We have provided large-scale, high-quality genomics and bioinformatics technologies to many KAKENHI projects: 60 to 90 projects every year and 464 in total, chosen through application and selection. This kind of support became possible by concentrating the work in a limited number of DNA sequencing centers at a time when these technologies were advancing unexpectedly fast worldwide. Our activity has led to 363 papers, including the coelacanth genome paper. The KAKENHI projects that we supported cover all KAKENHI categories and almost all divisions of the life science domain. Furthermore, we have developed new methodologies to solve problems that emerged from the support activity; one of them is the genome assembly software PLATANUS, which has become a key method for deciphering difficult genomes. Such a virtuous circle and its outcomes show that the platform is essential and effective in the life sciences.

Presentations

  • AI aptamer drug discovery project

    Michiaki Hamada  [Invited]

    Presentation date: 2022.03

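  • [Frontiers of RNA Research] Life science and medical and pharmaceutical research centered on RNA informatics

    Michiaki Hamada  [Invited]

    Nippon Medical School Lecture 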

    Presentation date: 2022.02

  • AI aptamer drug discovery

    Michiaki Hamada  [Invited]

    Molecular Biology Society of Japan 

    Presentation date: 2021.12

  • The genomic society and bioinformatics

    Michiaki Hamada  [Invited]

    Joint Symposium of the Japanese Society for Bioinformatics and the Japan Omics Medical Society, IIBMP2021 

    Presentation date: 2021.09

  • AI aptamer drug discovery project

    Michiaki Hamada  [Invited]

    Nippon Medical School and Waseda University Joint Symposium 

    Presentation date: 2021.06

  • Frontiers of RNA informatics

    Michiaki Hamada  [Invited]

    Bioinformatics Study Group @ University of Miyazaki 

    Presentation date: 2021.05

  • Frontiers of RNA bioinformatics

    Michiaki Hamada  [Invited]

    Nagoya University Special Lecture 

    Presentation date: 2021.01

  • Drug discovery research centered on RNA

    Michiaki Hamada  [Invited]

    EWE Lecture 

    Presentation date: 2021.01

Specific Research

  • Research on fundamental information technologies for non-coding RNA analysis

    2020  

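    To elucidate the functions of the long non-coding RNAs that have been discovered in large numbers in higher eukaryotes such as humans, we built fundamental information technologies and carried out a range of bioinformatics analyses. Specifically, we conducted (1) a comprehensive analysis of the relationship between localization and alternative splicing; (2) development of MoAIMS, a tool for identifying m6A modification sites with high accuracy from transcriptome-wide m6A measurement data; and (3) genome-wide identification of R-loop structures and extraction of their features.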

  • Research on privacy-preserving analysis methods for biological information using secret sharing

    2019  

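    Using secret sharing, we devised and implemented a method for securely performing sequence comparison with affine gap penalties. A comparison with existing methods confirmed that computation speed improves substantially. [1] 深見 匠, Michiaki Hamada, Secure personal genome comparison with affine gaps, 2019/12/3, 42nd Annual Meeting of the Molecular Biology Society of Japan, Fukuoka International Congress Center and Marine Messe Fukuoka. [2] 深見 匠, Michiaki Hamada, Secure computation of personal genome similarity, 2019 Symposium on Cryptography and Information Security, January 22-25, 2019, Lake Biwa Otsu Prince Hotel.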

  • Mathematical and informatics foundations and practice of integrated omics data-driven biology

    2018  

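    As bioinformatics technology for elucidating the functions of long non-coding RNAs, we developed a deep-learning-based algorithm and tool for predicting m6A modifications. We also developed an algorithm that predicts RNA-RNA interactions quickly and accurately using only sequence information as input. In addition, we developed fundamental information technology that uses model selection techniques to predict mutational signatures in cancer genome data.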

  • Prediction of RNA-chromatin interactions and its applications

    2016  

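    We obtained the following results toward developing a method for inferring RNA-chromatin interactions from sequence information alone. 1. We developed a new method for predicting (docking) RNA-protein complex structures. By incorporating the results of molecular dynamics simulations into the scoring function for complex structures, this method achieves substantially higher accuracy than existing methods. 2. We built and released Rtools, an integrated web server for RNA structure prediction. With this web server, a variety of predicted structural information (secondary structure, base-pairing probability matrix, and the probabilities of forming stems, bulges, loops, and so on) can be obtained from the RNA sequence alone. Such information is also useful for predicting RNA-chromatin interactions.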

  • Mathematical and informatics foundations of integrated omics data-driven biology

    2016  

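    We obtained the following results on methods for the computational analysis of various omics data. (1) We developed a probabilistic model for metagenomic data. The model applies LDA (latent Dirichlet allocation), which is used in natural language processing, to metagenomic data, making it possible to estimate bacterial groups. By examining in detail how the estimated bacterial groups relate to the well-known enterotypes, we gave the groups a biological interpretation. (2) We built a pipeline for identifying mutations in plant genomes from sequencing data and used it for a detailed analysis of plant mutants. This work is a collaboration with RIKEN. (3) For profile HMMs, which are probabilistic models of protein and DNA sequence motifs, we developed a cryptography-based search method that keeps both the model and the query secret. The method relies essentially on additively homomorphic encryption, which allows addition to be performed on values while they remain encrypted.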

  • Comprehensive prediction of lncRNA-RNA interactions and construction of a database integrating experimental information

    2015  

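    In this study, we first built a pipeline system for fast prediction of RNA-RNA interactions and implemented it on the K computer. Second, we used this pipeline to comprehensively predict interaction partners for human lncRNAs and released the results as a database. Hamada gave an oral presentation at APBC2016, and the work was published as a journal article in BMC Genomics.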

  • Development of information technologies toward an integrated understanding of the epigenome and practice of data-driven biology

    2015  

    This fiscal year, we organized and improved the program from the paper published last fiscal year [1] in preparation for the public release of its source code. Specifically, we modified it so that it can output the posterior probability of the chromatin state at each position. [1] Michiaki Hamada*, Yukiteru Ono, Ryohei Fujimaki, Kiyoshi Asai, Learning chromatin states with factorized information criteria, Bioinformatics (2015), doi: 10.1093/bioinformatics/btv163. First published online: March 24, 2015.

  • Research on and applications of methodology for estimating chromatin states from epigenetic data

    2014  

    Motivation: Recent studies have suggested that both the genome and the genome with epigenetic modifications, the so-called epigenome, play important roles in various biological functions, such as transcription and DNA replication, repair, and recombination. It is well known that specific combinations of histone modifications (e.g. methylations and acetylations) of nucleosomes induce chromatin states that correspond to specific functions of chromatin. Although the advent of next-generation sequencing (NGS) technologies enables measurement of epigenetic information for entire genomes at high resolution, the variety of chromatin states has not been completely characterized. Results: In this study, we propose a method to estimate the chromatin states indicated by genome-wide chromatin marks identified by NGS technologies. The proposed method automatically estimates the number of chromatin states and characterizes each state on the basis of a hidden Markov model (HMM) in combination with a recently proposed model selection technique, factorized information criteria. The method is expected to provide an unbiased model because it relies on only two adjustable parameters and avoids heuristic procedures as much as possible. Computational experiments with simulated datasets show that our method automatically learns an appropriate model, even in cases where methods that rely on Bayesian information criteria fail to learn the model structures. In addition, we comprehensively compare our method to ChromHMM on three real datasets and show that our method estimates more chromatin states than ChromHMM for those datasets.

 

 

Committee Memberships

  • 2020.04 - Now

    Japanese Society for Bioinformatics  Secretary

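  • 2021.09 -

    2021 Annual Meeting of the Japanese Society for Bioinformatics and the 10th Joint Conference on Informatics in Biology, Medicine and Pharmacology (IIBMP2021)  Conference Chair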

  • 2015.04 - 2017.03

    Japanese Society for Bioinformatics  Board Member