Updated on 2024/12/24


 
HAMADA, Michiaki
 
Affiliation
Faculty of Science and Engineering, School of Advanced Science and Engineering
Job title
Professor
Degree
Ph.D. ( 2009.03 Tokyo Institute of Technology )
Profile

Dr. Michiaki Hamada was born in 1977. He graduated from the Mathematical Institute of Tohoku University in March 2002. In April 2002, he joined Fuji Research Institute Corporation (FRIC), now Mizuho Information & Research Institute, Inc. (MHRI), as a researcher, and worked on system development for science and technology. While a researcher at FRIC, he began research on RNA bioinformatics with the support of the “Functional RNA Project” funded by NEDO. He received his PhD from Tokyo Institute of Technology in 2009. Currently, Dr. Michiaki Hamada is a professor in the Faculty of Science and Engineering at Waseda University and the principal investigator of the Bioinformatics Laboratory. He is also a visiting researcher at AIST in Japan. He has been a board member of the Japanese Society for Bioinformatics (JSBi) since 2014. His interests include RNA informatics, sequence analysis, epigenetics, data mining, and machine learning. He aims to develop killer tools for the analysis of biological data.

Research Experience

  • 2023.04
    -
    Now

    Japanese Society for Bioinformatics (JSBi)   Vice president

  • 2022.04
    -
    Now

    Waseda University   Next-Generation Core Researcher

  • 2018.04
    -
    Now

    Waseda University   Faculty of Science and Engineering   Professor

  • 2017.04
    -
    Now

    Nippon Medical School

  • 2016.10
    -
    Now

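    National Institute of Advanced Industrial Science and Technology (AIST)   Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL)   Invited Researcher / Team Leader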

  • 2014.04
    -
    2018.03

    Waseda University Faculty of Science and Engineering   Associate Professor

  • 2010.10
    -
    2014.03

    The University of Tokyo

  • 2002.04
    -
    2010.09

    Fuji Research Institute Corporation   Researcher

Education Background

  • 2005.10
    -
    2009.03

    Tokyo Institute of Technology   Interdisciplinary Science and Engineering   Intelligent Systems Science  

  • 2000.04
    -
    2002.03

    Tohoku University   Graduate School of Science   Department of Mathematics  

  • 1996.04
    -
    2000.03

    Tohoku University  

Committee Memberships

  • 2024.09
    -
    Now

    Waseda University  Ethics Review Committee on Research with Human Subjects B  Chair

  • 2023.04
    -
    Now

    Japanese Society for Bioinformatics  Vice president

  • 2022.06
    -
    Now

    mRNA Targeted Drug Discovery Research Organization  board of directors

  • 2020.04
    -
    Now

    Japanese Society for Bioinformatics

  • 2021.09
    -
     

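    2021 Annual Meeting of the Japanese Society for Bioinformatics / 10th Joint Conference on Informatics in Biology, Medicine and Pharmacology (IIBMP2021)  Congress president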

  • 2015.04
    -
    2017.03

    Japanese Society for Bioinformatics  Board member

Research Areas

  • System genome science / Genome biology / Life, health and medical informatics / Intelligent informatics

Research Interests

  • RNA therapeutics

  • Artificial Intelligence

  • probabilistic model

  • RNA-protein interaction

  • RNA-RNA interaction

  • interactome

  • long noncoding RNA (lncRNA)

  • epi-transcriptome

  • epi-genome

  • RNA aptamer

  • sequence analysis

  • RNA

  • Bioinformatics

Awards

  • Okuma Memorial Academic Award (Encouragement Award)

    2024.11   Waseda University  

    Winner: Michiaki Hamada

  • Waseda University Teaching Award

    2023.07   Waseda University   Bioinformatics

    Winner: Michiaki Hamada, Chao Zeng

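  • Waseda University Next-Generation Core Researcher 2022

    2022.04   Waseda University  

    Winner: Michiaki Hamada

  • Waseda Research Award (High-Impact Publication)

    2021.12   Waseda University  

    Winner: Michiaki Hamada

  • Young Scientists' Prize, Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (FY2017)

    2017.04   Ministry of Education, Culture, Sports, Science and Technology (MEXT)  

    Winner: Michiaki Hamada

  • AIST President's Award (Research)

    2016.04   National Institute of Advanced Industrial Science and Technology (AIST)  

    Winner: Michiaki Hamada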

 

Papers

  • Deep generative design of RNA family sequences

    Shunsuke Sumi, Michiaki Hamada, Hirohide Saito

    Nature Methods   21 ( 3 ) 435 - 443  2024.03  [Refereed]  [International journal]

    Authorship:Corresponding author

    RNA engineering has immense potential to drive innovation in biotechnology and medicine. Despite its importance, a versatile platform for the automated design of functional RNA is still lacking. Here, we propose RNA family sequence generator (RfamGen), a deep generative model that designs RNA family sequences in a data-efficient manner by explicitly incorporating alignment and consensus secondary structure information. RfamGen can generate novel and functional RNA family sequences by sampling points from a semantically rich and continuous representation. We have experimentally demonstrated the versatility of RfamGen using diverse RNA families. Furthermore, we confirmed the high success rate of RfamGen in designing functional ribozymes through a quantitative massively parallel assay. Notably, RfamGen successfully generates artificial sequences with higher activity than natural sequences. Overall, RfamGen significantly improves our ability to design functional RNA and opens up new potential for generative RNA engineering in synthetic biology.

    Citations (Scopus): 7

  • Landscape of semi-extractable RNAs across five human cell lines.

    Chao Zeng, Takeshi Chujo, Tetsuro Hirose, Michiaki Hamada

    Nucleic acids research   51 ( 15 ) 7820 - 7831  2023.07  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    Phase-separated membraneless organelles often contain RNAs that exhibit unusual semi-extractability using the conventional RNA extraction method, and can be efficiently retrieved by needle shearing or heating during RNA extraction. Semi-extractable RNAs are promising resources for understanding RNA-centric phase separation. However, limited assessments have been performed to systematically identify and characterize semi-extractable RNAs. In this study, 1074 semi-extractable RNAs, including ASAP1, DANT2, EXT1, FTX, IGF1R, LIMS1, NEAT1, PHF21A, PVT1, SCMH1, STRG.3024.1, TBL1X, TCF7L2, TVP23C-CDRT4, UBE2E2, ZCCHC7, ZFAND3 and ZSWIM6, which exhibited consistent semi-extractability were identified across five human cell lines. By integrating publicly available datasets, we found that semi-extractable RNAs tend to be distributed in the nuclear compartments but are dissociated from the chromatin. Long and repeat-containing semi-extractable RNAs act as hubs to provide global RNA-RNA interactions. Semi-extractable RNAs were divided into four groups based on their k-mer content. The NEAT1 group preferred to interact with paraspeckle proteins, such as FUS and NONO, implying that RNAs in this group are potential candidates of architectural RNAs that constitute nuclear bodies.

    Citations (Scopus): 1

  • PBSIM3: a simulator for all types of PacBio and ONT long reads.

    Yukiteru Ono, Michiaki Hamada, Kiyoshi Asai

    NAR genomics and bioinformatics   4 ( 4 ) lqac092  2022.12  [International journal]

    Long-read sequencers, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencers, have improved their read length and accuracy, thereby opening up unprecedented research. Many tools and algorithms have been developed to analyze long reads, and rapid progress in PacBio and ONT has further accelerated their development. Together with the development of high-throughput sequencing technologies and their analysis tools, many read simulators have been developed and effectively utilized. PBSIM is one of the popular long-read simulators. In this study, we developed PBSIM3 with three new functions: error models for long reads, multi-pass sequencing for high-fidelity read simulation and transcriptome sequencing simulation. Therefore, PBSIM3 is now able to meet a wide range of long-read simulation requirements.

    Citations (Scopus): 20

  • Generative aptamer discovery using RaptGen

    Natsuki Iwano, Tatsuo Adachi, Kazuteru Aoki, Yoshikazu Nakamura, Michiaki Hamada

    Nature Computational Science   2 ( 6 ) 378 - 386  2022.06  [Refereed]

    Authorship:Last author, Corresponding author

    Nucleic acid aptamers are generated by an in vitro molecular evolution method known as systematic evolution of ligands by exponential enrichment (SELEX). Various candidates are limited by actual sequencing data from an experiment. Here we developed RaptGen, which is a variational autoencoder for in silico aptamer generation. RaptGen exploits a profile hidden Markov model decoder to represent motif sequences effectively. We showed that RaptGen embedded simulation sequence data into low-dimensional latent space on the basis of motif information. We also performed sequence embedding using two independent SELEX datasets. RaptGen successfully generated aptamers from the latent space even though they were not included in high-throughput sequencing. RaptGen could also generate a truncated aptamer with a short learning model. We demonstrated that RaptGen could be applied to activity-guided aptamer generation according to Bayesian optimization. We concluded that a generative method by RaptGen and latent representation are useful for aptamer discovery.

    Citations (Scopus): 33

  • RaptRanker: in silico RNA aptamer selection from HT-SELEX experiment based on local sequence and structure information.

    Ryoga Ishida, Tatsuo Adachi, Aya Yokota, Hidehito Yoshihara, Kazuteru Aoki, Yoshikazu Nakamura, Michiaki Hamada

    Nucleic acids research   48 ( 14 ) e82  2020.08  [International journal]

    Aptamers are short single-stranded RNA/DNA molecules that bind to specific target molecules. Aptamers with high binding-affinity and target specificity are identified using an in vitro procedure called high throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). However, the development of aptamer affinity reagents takes a considerable amount of time and is costly because HT-SELEX produces a large dataset of candidate sequences, some of which have insufficient binding-affinity. Here, we present RNA aptamer Ranker (RaptRanker), a novel in silico method for identifying high binding-affinity aptamers from HT-SELEX data by scoring and ranking. RaptRanker analyzes HT-SELEX data by evaluating the nucleotide sequence and secondary structure simultaneously, and by ranking according to scores reflecting local structure and sequence frequencies. To evaluate the performance of RaptRanker, we performed two new HT-SELEX experiments, and evaluated binding affinities of a part of sequences that include aptamers with low binding-affinity. In both datasets, the performance of RaptRanker was superior to Frequency, Enrichment and MPBind. We also confirmed that the consideration of secondary structures is effective in HT-SELEX data analysis, and that RaptRanker successfully predicted the essential subsequence motifs in each identified sequence.

    Citations (Scopus): 39

  • Multi-objective computational optimization of human 5′ UTR sequences

    Keisuke Yamada, Kanta Suga, Naoko Abe, Koji Hashimoto, Susumu Tsutsumi, Masahito Inagaki, Fumitaka Hashiya, Hiroshi Abe, Michiaki Hamada

       2024.11

  • The MTR4/hnRNPK complex surveils aberrant polyadenylated RNAs with multiple exons.

    Kenzui Taniue, Anzu Sugawara, Chao Zeng, Han Han, Xinyue Gao, Yuki Shimoura, Atsuko Nakanishi Ozeki, Rena Onoguchi-Mizutani, Masahide Seki, Yutaka Suzuki, Michiaki Hamada, Nobuyoshi Akimitsu

    Nature communications   15 ( 1 ) 8684 - 8684  2024.10  [Refereed]  [International journal]

    RNA surveillance systems degrade aberrant RNAs that result from defective transcriptional termination, splicing, and polyadenylation. Defective RNAs in the nucleus are recognized by RNA-binding proteins and MTR4, and are degraded by the RNA exosome complex. Here, we detect aberrant RNAs in MTR4-depleted cells using long-read direct RNA sequencing and 3' sequencing. MTR4 destabilizes intronic polyadenylated transcripts generated by transcriptional read-through over one or more exons, termed 3' eXtended Transcripts (3XTs). MTR4 also associates with hnRNPK, which recognizes 3XTs with multiple exons. Moreover, the aberrant protein translated from KCTD13 3XT is a target of the hnRNPK-MTR4-RNA exosome pathway and forms aberrant condensates, which we name KCTD13 3eXtended Transcript-derived protein (KeXT) bodies. Our results suggest that RNA surveillance in human cells inhibits the formation of condensates of a defective polyadenylated transcript-derived protein.

  • Landscape of evolutionary arms races between transposable elements and KRAB-ZFP family.

    Masato Kosuge, Jumpei Ito, Michiaki Hamada

    Scientific reports   14 ( 1 ) 23358 - 23358  2024.10  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    Transposable elements (TEs) are mobile parasitic sequences that have expanded within the host genome. It has been hypothesized that host organisms have expanded the Krüppel-associated box-containing zinc finger proteins (KRAB-ZFPs), which epigenetically suppress TEs, to counteract disorderly TE transpositions. This process is referred to as the evolutionary arms race. However, the extent to which this evolutionary arms race occurred across various TE families remains unclear. In the present study, we systematically explored the evolutionary arms race between TE families and human KRAB-ZFPs using public ChIP-seq data. We discovered and characterized new instances of evolutionary arms races with KRAB-ZFPs in endogenous retroviruses. Furthermore, we found that the regulatory landscape shaped by this arms race contributed to the gene regulatory networks. In summary, our results provide insight into the impact of the evolutionary arms race on TE families, the KRAB-ZFP family, and host gene regulatory networks.

    Citations (Scopus): 1

  • A chimeric RNA consisting of siRNA and aptamer for inhibiting dengue virus replication

    Ryo Amano, Masaki Takahashi, Kazumi Haga, Mizuki Yamamoto, Kaku Goto, Akiko Ichinose, Michiaki Hamada, Jin Gohda, Jun-ichiro Inoue, Yasushi Kawaguchi, Meng Ling Moi, Yoshikazu Nakamura

    NAR Molecular Medicine    2024.10

  • The Lomb-Scargle periodogram-based differentially expressed gene detection along pseudotime

    Hitoshi Iuchi, Michiaki Hamada

       2024.08  [Refereed]

    Authorship:Last author, Corresponding author

    Motivation

    In recent years, single-cell RNA sequencing (scRNA-seq) has provided high-resolution snapshots of biological processes and has contributed to the understanding of cell dynamics. Trajectory inference has the potential to provide a quantitative representation of cell dynamics, and several trajectory inference algorithms have been developed. However, the downstream analysis of trajectory inference, such as the analysis of differentially expressed genes (DEG), remains challenging.

    Results

    In this study, we introduce a Lomb-Scargle (LS) periodogram-based algorithm for identifying DEGs associated with pseudotime in a trajectory analysis. The algorithm is capable of analyzing any inferred trajectory, including tree structures with multiple branching points, leading to diverse cell types. We validated this approach using simulated data and real datasets, and our results showed that our approach was superior when performing DEG analysis on complex structured trajectories. Our approach will contribute to gene characterization in trajectory analysis and help gain deeper biological insights.

    Availability

    All code used in our proposed method can be found at https://github.com/hiuchi/LS.

    Contact

    hitoshi.iuchi@hamadalab.com

    Supplementary information

    Supplementary data are available at Journal Name online.

  • Identification of a novel RNA transcript TISPL upregulated by stressors that stimulate ATF4.

    Yutaro Wakabayashi, Aika Shimono, Yuki Terauchi, Chao Zeng, Michiaki Hamada, Kentaro Semba, Shinya Watanabe, Kosuke Ishikawa

    Gene   917   148464 - 148464  2024.07  [Refereed]  [International journal]

    Cells sense, respond, and adapt to environmental conditions that cause stress. In a previous study using HeLa cells, we isolated reporter cells responding to the endoplasmic reticulum (ER) stress inducers, thapsigargin and tunicamycin, using a highly sensitive promoter trap vector system. Splinkerette PCR and 5' rapid amplification of cDNA ends (5' RACE) identified a novel transcript that is upregulated by ER stress. Its endogenous expression increased approximately 10-fold in response to thapsigargin and tunicamycin within 1 h, but was down-regulated after 4 h. Because the transcript starts from an intron of a long noncoding RNA known as LINC-PINT, we designated the newly identified transcript TISPL (transcript induced by stressors from LINC-PINT locus). TISPL was also expressed under several other stress conditions. It was particularly increased >10-fold upon glucose starvation and 7-fold by arsenite exposure. Furthermore, in silico analyses, including a ChIP-Atlas search, revealed that there is an ATF4-binding region with a C/EBP-ATF response element (CARE) downstream of the transcription start site of TISPL. Based on these results, we hypothesized that TISPL may be induced by the phospho-eIF2α and ATF4 axis of the integrated stress response pathway, which is known to be activated by the stress conditions listed above. As expected, knockout of ATF4 abolished the stress-induced upregulation of TISPL. Our results indicate that TISPL may be a useful biomarker for detecting stress conditions that activate ATF4. Our highly sensitive trap vector system proved beneficial in discovering new biomarkers.

  • Inflammation primes the murine kidney for recovery by activating AZIN1 adenosine-to-inosine editing.

    Segewkal Hawaze Heruye, Jered Myslinski, Chao Zeng, Amy Zollman, Shinichi Makino, Azuma Nanamatsu, Quoseena Mir, Sarath Chandra Janga, Emma H Doud, Michael T Eadon, Bernhard Maier, Michiaki Hamada, Tuan M Tran, Pierre C Dagher, Takashi Hato

    The Journal of clinical investigation   134 ( 17 )  2024.07  [Refereed]  [International journal]

    The progression of kidney disease varies among individuals, but a general methodology to quantify disease timelines is lacking. Particularly challenging is the task of determining the potential for recovery from acute kidney injury following various insults. Here, we report that quantitation of post-transcriptional adenosine-to-inosine (A-to-I) RNA editing offers a distinct genome-wide signature, enabling the delineation of disease trajectories in the kidney. A well-defined murine model of endotoxemia permitted the identification of the origin and extent of A-to-I editing, along with temporally discrete signatures of double-stranded RNA stress and Adenosine Deaminase isoform switching. We found that A-to-I editing of Antizyme Inhibitor 1 (AZIN1), a positive regulator of polyamine biosynthesis, serves as a particularly useful temporal landmark during endotoxemia. Our data indicate that AZIN1 A-to-I editing, triggered by preceding inflammation, primes the kidney and activates endogenous recovery mechanisms. By comparing genetically modified human cell lines and mice locked in either A-to-I edited or uneditable states, we uncovered that AZIN1 A-to-I editing not only enhances polyamine biosynthesis but also engages glycolysis and nicotinamide biosynthesis to drive the recovery phenotype. Our findings implicate that quantifying AZIN1 A-to-I editing could potentially identify individuals who have transitioned to an endogenous recovery phase. This phase would reflect their past inflammation and indicate their potential for future recovery.

    Citations (Scopus): 1

  • Selection and characterization of aptamers targeting the Vif-CBFβ-ELOB-ELOC-CUL5 complex.

    Kazuyuki Kumagai, Keisuke Kamba, Takuya Suzuki, Yuto Sekikawa, Chisato Yuki, Michiaki Hamada, Kayoko Nagata, Akifumi Takaori-Kondo, Li Wan, Masato Katahira, Takashi Nagata, Taiichi Sakamoto

    Journal of biochemistry   176 ( 3 ) 205 - 215  2024.05  [Refereed]  [International journal]

    The viral infectivity factor (Vif) of human immunodeficiency virus 1 forms a complex with host proteins, designated as Vif-CBFβ-ELOB-ELOC-CUL5 (VβBCC), initiating the ubiquitination and subsequent proteasomal degradation of the human antiviral protein APOBEC3G (A3G), thereby negating its antiviral function. While recent cryo-electron microscopy (cryo-EM) studies have implicated RNA molecules in the Vif-A3G interaction that leads to A3G ubiquitination, our findings indicated that the VβBCC complex can also directly impede A3G-mediated DNA deamination, bypassing the proteasomal degradation pathway. Employing the Systematic Evolution of Ligands by EXponential enrichment (SELEX) method, we have identified RNA aptamers with high affinity for the VβBCC complex. These aptamers not only bind to the VβBCC complex but also reinstate A3G's DNA deamination activity by inhibiting the complex's function. Moreover, we delineated the sequences and secondary structures of these aptamers, providing insights into the mechanistic aspects of A3G inhibition by the VβBCC complex. Analysis using selected aptamers will enhance our understanding of the inhibition of A3G by the VβBCC complex, offering potential avenues for therapeutic intervention.

  • Prediction of antibiotic resistance mechanisms using a protein language model

    Yagimoto K, Hosoda S, Sato M, Hamada M

    Bioinformatics (Oxford, England)   40 ( 10 )  2024.05  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    MOTIVATION: Antibiotic resistance has emerged as a major global health threat, with an increasing number of bacterial infections becoming difficult to treat. Predicting the underlying resistance mechanisms of antibiotic resistance genes (ARGs) is crucial for understanding and combating this problem. However, existing methods struggle to accurately predict resistance mechanisms for ARGs with low similarity to known sequences and lack sufficient interpretability of the prediction models. RESULTS: In this study, we present a novel approach for predicting ARG resistance mechanisms using ProteinBERT, a protein language model (pLM) based on deep learning. Our method outperforms state-of-the-art techniques on diverse ARG datasets, including those with low homology to the training data, highlighting its potential for predicting the resistance mechanisms of unknown ARGs. Attention analysis of the model reveals that it considers biologically relevant features, such as conserved amino acid residues and antibiotic target binding sites, when making predictions. These findings provide valuable insights into the molecular basis of antibiotic resistance and demonstrate the interpretability of pLMs, offering a new perspective on their application in bioinformatics. AVAILABILITY AND IMPLEMENTATION: The source code is available for free at https://github.com/hmdlab/ARG-BERT. The output results of the model are published at https://waseda.box.com/v/ARG-BERT-suppl.

  • Hidden Challenges in Evaluating Spillover Risk of Zoonotic Viruses using Machine Learning Models

    Junna Kawasaki, Tadaki Suzuki, Michiaki Hamada

       2024.04

  • RaptGen-Assisted Generation of an RNA/DNA Hybrid Aptamer against SARS-CoV-2 Spike Protein.

    Tatsuo Adachi, Shigetaka Nakamura, Akiya Michishita, Daiki Kawahara, Mizuki Yamamoto, Michiaki Hamada, Yoshikazu Nakamura

    Biochemistry   63 ( 7 ) 906 - 912  2024.03  [Refereed]  [International journal]

    Optimization of aptamers in length and chemistry is crucial for industrial applications. Here, we developed aptamers against the SARS-CoV-2 spike protein and achieved optimization with a deep-learning-based algorithm, RaptGen. We conducted a primer-less SELEX against the receptor binding domain (RBD) of the spike with an RNA/DNA hybrid library, and the resulting sequences were subjected to RaptGen analysis. Based on the sequence profiling by RaptGen, a short truncation aptamer of 26 nucleotides was obtained and further optimized by a chemical modification of relevant nucleotides. The resulting aptamer is bound to RBD not only of SARS-CoV-2 wildtype but also of its variants, SARS-CoV-1, and Middle East respiratory syndrome coronavirus (MERS-CoV). We concluded that the RaptGen-assisted discovery is efficient for developing optimized aptamers.

  • DeepRaccess: high-speed RNA accessibility prediction using deep learning

    Kaisei Hara, Natsuki Iwano, Tsukasa Fukunaga, Michiaki Hamada

    Frontiers in Bioinformatics   3   1275787 - 1275787  2023.10  [Refereed]  [International journal]

    Authorship:Corresponding author

    RNA accessibility is a useful RNA secondary structural feature for predicting RNA-RNA interactions and translation efficiency in prokaryotes. However, conventional accessibility calculation tools, such as Raccess, are computationally expensive and require considerable computational time to perform transcriptome-scale analysis. In this study, we developed DeepRaccess, which predicts RNA accessibility based on deep learning methods. DeepRaccess was trained to take artificial RNA sequences as input and to predict the accessibility of these sequences as calculated by Raccess. Simulation and empirical dataset analyses showed that the accessibility predicted by DeepRaccess was highly correlated with the accessibility calculated by Raccess. In addition, we confirmed that DeepRaccess could predict protein abundance in E. coli with moderate accuracy from the sequences around the start codon. We also demonstrated that DeepRaccess achieved a speed-up of tens to hundreds of times in a GPU environment. The source code and the trained models of DeepRaccess are freely available at https://github.com/hmdlab/DeepRaccess.

    Citations (Scopus): 1

  • Neat1 lncRNA organizes the inflammatory gene expressions in the dorsal root ganglion in neuropathic pain caused by nerve injury

    Motoyo Maruyama, Atsushi Sakai, Tsukasa Fukunaga, Yoshitaka Miyagawa, Takashi Okada, Michiaki Hamada, Hidenori Suzuki

    Frontiers in Immunology   14   1185322 - 1185322  2023.08  [Refereed]  [International journal]

    Primary sensory neurons regulate inflammatory processes in innervated regions through neuro-immune communication. However, how their immune-modulating functions are regulated in concert remains largely unknown. Here, we show that Neat1 long non-coding RNA (lncRNA) organizes the proinflammatory gene expressions in the dorsal root ganglion (DRG) in chronic intractable neuropathic pain in rats. Neat1 was abundantly expressed in the DRG and was upregulated after peripheral nerve injury. Neat1 overexpression in primary sensory neurons caused mechanical and thermal hypersensitivity, whereas its knockdown alleviated neuropathic pain. Bioinformatics analysis of comprehensive transcriptome changes indicated the inflammatory response was the most relevant function of genes upregulated through Neat1. Consistent with this, upregulation of proinflammatory genes in the DRG following nerve injury was suppressed by Neat1 knockdown. Expression changes of these proinflammatory genes were regulated through Neat1-mRNA interaction-dependent and -independent mechanisms. Notably, Neat1 increased proinflammatory genes by stabilizing its interacting mRNAs in neuropathic pain. Finally, Neat1 in primary sensory neurons contributed to spinal inflammatory processes that mediated peripheral neuropathic pain. These findings demonstrate that Neat1 lncRNA is a key regulator of neuro-immune communication in neuropathic pain.

    Citations (Scopus): 1

  • Transposons contribute to the acquisition of cell type-specific cis-elements in the brain

    Kotaro Sekine, Masahiro Onoguchi, Michiaki Hamada

    Communications Biology   6 ( 1 ) 631 - 631  2023.06  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    Mammalian brains have evolved in stages over a long history to acquire higher functions. Recently, several transposable element (TE) families have been shown to evolve into cis-regulatory elements of brain-specific genes. However, it is not fully understood how TEs are important for gene regulatory networks. Here, we performed a single-cell level analysis using public data of scATAC-seq to discover TE-derived cis-elements that are important for specific cell types. Our results suggest that DNA elements derived from TEs, MER130 and MamRep434, can function as transcription factor-binding sites based on their internal motifs for Neurod2 and Lhx2, respectively, especially in glutamatergic neuronal progenitors. Furthermore, MER130- and MamRep434-derived cis-elements were amplified in the ancestors of Amniota and Eutheria, respectively. These results suggest that the acquisition of cis-elements with TEs occurred in different stages during evolution and may contribute to the acquisition of different functions or morphologies in the brain.

    Citations (Scopus): 1

  • Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery

    Kengo Sato, Michiaki Hamada

    Briefings in Bioinformatics   24 ( 4 )  2023.05  [Refereed]  [International journal]

    Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA–protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA–small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.

    Citations (Scopus): 17

  • Mobile element variation contributes to population-specific genome diversification, gene regulation and disease risk.

    Shohei Kojima, Satoshi Koyama, Mirei Ka, Yuka Saito, Erica H Parrish, Mikiko Endo, Sadaaki Takata, Misaki Mizukoshi, Keiko Hikino, Atsushi Takeda, Asami F Gelinas, Steven M Heaton, Rie Koide, Anselmo J Kamada, Michiya Noguchi, Michiaki Hamada, Yoichiro Kamatani, Yasuhiro Murakawa, Kazuyoshi Ishigaki, Yukio Nakamura, Kaoru Ito, Chikashi Terao, Yukihide Momozawa, Nicholas F Parrish

    Nature genetics   55 ( 6 ) 939 - 951  2023.05  [Refereed]  [International journal]

    Mobile genetic elements (MEs) are heritable mutagens that recursively generate structural variants (SVs). ME variants (MEVs) are difficult to genotype and integrate in statistical genetics, obscuring their impact on genome diversification and traits. We developed a tool that accurately genotypes MEVs using short-read whole-genome sequencing (WGS) and applied it to global human populations. We find unexpected population-specific MEV differences, including an Alu insertion distribution distinguishing Japanese from other populations. Integrating MEVs with expression quantitative trait loci (eQTL) maps shows that MEV classes regulate tissue-specific gene expression by shared mechanisms, including creating or attenuating enhancers and recruiting post-transcriptional regulators, supporting class-wide interpretability. MEVs more often associate with gene expression changes than SNVs, thus plausibly impacting traits. Performing genome-wide association study (GWAS) with MEVs pinpoints potential causes of disease risk, including a LINE-1 insertion associated with keloid and fasciitis. This work implicates MEVs as drivers of human divergence and disease risk.

    DOI PubMed

    Scopus

    17 citations (Scopus)
  • Bioinformatics approaches for unveiling virus-host interactions.

    Hitoshi Iuchi, Junna Kawasaki, Kento Kubo, Tsukasa Fukunaga, Koki Hokao, Gentaro Yokoyama, Akiko Ichinose, Kanta Suga, Michiaki Hamada

    Computational and structural biotechnology journal   21   1774 - 1784  2023  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    The coronavirus disease-2019 (COVID-19) pandemic has elucidated major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively surveyed algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.

    DOI PubMed

    Scopus

    9 citations (Scopus)
  • Web Services for RNA-RNA Interaction Prediction

    Tsukasa Fukunaga, Junichi Iwakiri, Michiaki Hamada

    Methods in Molecular Biology   2586   175 - 195  2023  [Refereed]  [International journal]

    Authorship:Last author

    Non-coding RNAs have various biological functions such as translational regulation, and RNA-RNA interactions play essential roles in the mechanisms of action of these RNAs. Therefore, RNA-RNA interaction prediction is an important problem in bioinformatics, and many tools have been developed for the computational prediction of RNA-RNA interactions. In addition to the development of novel algorithms with high accuracy, the development and maintenance of web services is essential for enhancing usability by experimental biologists. In this review, we survey web services for RNA-RNA interaction predictions and introduce how to use primary web services. We present various prediction tools, including general interaction prediction tools, prediction tools for specific RNA classes, and RNA-RNA interaction-based RNA design tools. Additionally, we discuss the future perspectives of the development of RNA-RNA interaction prediction tools and the sustainability of web services.

    DOI PubMed

    Scopus

  • Structure-based screening for functional non-coding RNAs in fission yeast identifies a factor repressing untimely initiation of sexual differentiation.

    Yu Ono, Kenta Katayama, Tomoki Onuma, Kento Kubo, Hayato Tsuyuzaki, Michiaki Hamada, Masamitsu Sato

    Nucleic acids research   50 ( 19 ) 11229 - 11242  2022.10  [Refereed]  [International journal]

    Authorship:Corresponding author

    Non-coding RNAs (ncRNAs) ubiquitously exist in normal and cancer cells. Despite their prevalent distribution, the functions of most long ncRNAs remain uncharacterized. The fission yeast Schizosaccharomyces pombe expresses >1800 ncRNAs annotated to date, but most unconventional ncRNAs (excluding tRNA, rRNA, snRNA and snoRNA) remain uncharacterized. To discover functional ncRNAs, we performed a screen combining computational and biological tests. First, all S. pombe ncRNAs were screened in silico for those showing conservation in sequence as well as in secondary structure with ncRNAs in closely related species. Almost half of the 151 selected conserved ncRNA genes were uncharacterized. Twelve ncRNA genes that did not overlap with protein-coding sequences were next chosen for biological screening that examined defects in growth or sexual differentiation, as well as sensitivities to drugs and stresses. Finally, we highlighted an ncRNA transcribed from SPNCRNA.1669, which inhibited untimely initiation of sexual differentiation. A domain that was predicted as a conserved secondary structure by the computational analysis was essential for the ncRNA to function. Thus, this study demonstrates that in silico selection focusing on conservation of secondary structure across species is a powerful method to pinpoint novel functional ncRNAs.

    DOI PubMed

    Scopus

    2 citations (Scopus)
  • Mobile elements in human population-specific genome and phenotype divergence

    Shohei Kojima, Satoshi Koyama, Mirei Ka, Yuka Saito, Erica H. Parrish, Mikiko Endo, Sadaaki Takata, Misaki Mizukoshi, Keiko Hikino, Atsushi Takeda, Asami F. Gelinas, Steven M. Heaton, Rie Koide, Anselmo J. Kamada, Michiya Noguchi, Michiaki Hamada, Yoichiro Kamatani, Yasuhiro Murakawa, Kazuyoshi Ishigaki, Yukio Nakamura, Kaoru Ito, Chikashi Terao, Yukihide Momozawa, Nicholas F. Parrish

       2022.03

    Abstract

    Mobile genetic elements (MEs) are heritable mutagens that contribute to divergence between lineages by recursively generating structural variants. ME variants (MEVs) are difficult to genotype, obscuring their impact on recent genome and trait diversification. We developed a tool that uses short-read sequence data to accurately genotype MEVs, enabling us to study them using statistical genetics methods in global human genomes. We observe population-specific differences in the distribution of Alu insertions that distinguish Japanese from other populations. We integrated MEVs with epigenomic and expression quantitative trait loci (eQTL) maps to determine how they impact traits. This reveals coherent patterns by which specific MEs regulate tissue-specific gene expression, including creating or attenuating enhancers and recruiting post-transcriptional regulators. We pinpoint MEVs as genetic causes of disease risk, including a LINE-1 insertion linked to keloid and other diseases of fibroblast inflammation, by introducing MEVs into the genome-wide association study (GWAS) framework. In addition to nominating previously-hidden MEVs as causes of human diseases, this work highlights MEs as accelerators of human population divergence and begins to decipher the semantics of MEs.

    DOI

  • Probiotic responder identification in cross-over trials for constipation using a Bayesian statistical model considering lags between intake and effect periods

    Shion Hosoda, Yuichiro Nishimoto, Yohsuke Yamauchi, Takuji Yamada, Michiaki Hamada

    Computational and Structural Biotechnology Journal   21   5350 - 5357  2022.03  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    Recent advances in microbiome research have led to the further development of microbial interventions, such as probiotics and prebiotics, which are potential treatments for constipation. However, the effects of probiotics vary from person to person; therefore, the effectiveness of probiotics needs to be verified for each individual. Individuals showing significant effects of the target probiotic are called responders. A statistical model for the evaluation of responders was proposed in a previous study. However, the previous model does not consider the lag between the intake and effect periods of the probiotic. A lag is expected between when probiotics are administered and when they take effect. In this study, we propose a Bayesian statistical model to estimate the probability that a subject is a responder, by considering the lag between intake and effect periods. In synthetic dataset experiments, the proposed model was found to outperform the base model, which did not factor in the lag. Further, we found that the proposed model could distinguish responders showing large uncertainty in terms of the lag between intake and effect periods.

    DOI PubMed

  • G0S2 regulates innate immunity in Kawasaki disease via lncRNA HSD11B1-AS1.

    Mako Okabe, Shinya Takarada, Nariaki Miyao, Hideyuki Nakaoka, Keijiro Ibuki, Sayaka Ozawa, Kazuhiro Watanabe, Harue Tsuji, Ikuo Hashimoto, Kiyoshi Hatasaki, Shotaro Hayakawa, Yu Hamaguchi, Michiaki Hamada, Fukiko Ichida, Keiichi Hirono

    Pediatric research   92 ( 2 ) 378 - 387  2022.03  [Refereed]  [International journal]

    BACKGROUND: Kawasaki disease (KD) is a systemic vasculitis that is currently the most common cause of acquired heart disease in children. However, its etiology remains unknown. Long non-coding RNAs (lncRNAs) contribute to the pathophysiology of various diseases. Few studies have reported the role of lncRNAs in KD inflammation; thus, we investigated the role of lncRNA in KD inflammation. METHODS: A total of 50 patients with KD (median age, 19 months; 29 males and 21 females) were enrolled. We conducted cap analysis gene expression sequencing to determine differentially expressed genes in monocytes of the peripheral blood of the subjects. RESULTS: About 21 candidate lncRNA transcripts were identified. The analyses of transcriptome and gene ontology revealed that the immune system was involved in KD. Among these genes, G0/G1 switch gene 2 (G0S2) and its antisense lncRNA, HSD11B1-AS1, were upregulated during the acute phase of KD (P < 0.0001 and <0.0001, respectively). Moreover, G0S2 increased when lipopolysaccharides induced inflammation in THP-1 monocytes, and silencing of G0S2 suppressed the expression of HSD11B1-AS1 and tumor necrosis factor-α. CONCLUSIONS: This study uncovered the crucial role of lncRNAs in innate immunity in acute KD. LncRNA may be a novel target for the diagnosis of KD. IMPACT: This study revealed the whole aspect of the gene expression profile of monocytes of patients with Kawasaki disease (KD) using cap analysis gene expression sequencing and identified KD-specific molecules: G0/G1 switch gene 2 (G0S2) and long non-coding RNA (lncRNA) HSD11B1-AS1. We demonstrated that G0S2 and its antisense HSD11B1-AS1 were associated with inflammation of innate immunity in KD. lncRNA may be a novel key target for the diagnosis of patients with KD.

    DOI PubMed

    Scopus

    8 citations (Scopus)
  • HT-SELEX-based identification of binding pre-miRNA hairpin-motif for small molecules.

    Sanjukta Mukherjee, Asako Murata, Ryoga Ishida, Ayako Sugai, Chikara Dohno, Michiaki Hamada, Sudhir Krishna, Kazuhiko Nakatani

    Molecular therapy. Nucleic acids   27   165 - 174  2022.03  [Refereed]  [International journal]

    Selective targeting of biologically relevant RNAs with small molecules is a long-standing challenge due to the lack of a clear understanding of the RNA motifs that bind small molecules. The standard SELEX procedure allows the identification of specific RNA binders (aptamers) for the target of interest. However, more effort is needed to identify and characterize the sequence-structure motifs in the aptamers that are important for binding to the target. Herein, we describe a strategy integrating high-throughput (HT) sequencing with conventional SELEX followed by bioinformatic analysis to identify aptamers with high binding affinity and target specificity and to unravel the sequence-structure motifs of pre-miRNA that are essential for binding to the recently developed water-soluble small molecule CMBL3aL. To confirm the fidelity of this approach, we investigated the binding of CMBL3aL to the identified motifs by surface plasmon resonance (SPR) spectroscopy and its potential regulatory activity on dicer-mediated cleavage of the obtained aptamers and of endogenous pre-miRNAs comprising the identified motif in their hairpin loops. This new approach would significantly accelerate the identification of binding sequence-structure motifs of pre-miRNA for the compound of interest and would help broaden the spectrum of biomedical applications.

    DOI PubMed

    Scopus

    6 citations (Scopus)
  • Prediction of RNA-protein interactions using a nucleotide language model

    Keisuke Yamada, Michiaki Hamada

    Bioinformatics Advances   2 ( 1 ) vbac023  2022  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    MOTIVATION: The accumulation of sequencing data has enabled researchers to predict the interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to sequences. Bidirectional encoder representations from transformers (BERT) is a language-based deep learning model that is highly interpretable. Therefore, a model based on BERT architecture can potentially overcome such limitations. RESULTS: Here, we propose BERT-RBP as a model to predict RNA-RBP interactions by adapting the BERT architecture pretrained on a human reference genome. Our model outperformed state-of-the-art prediction models using the eCLIP-seq data of 154 RBPs. The detailed analysis further revealed that BERT-RBP could recognize both the transcript region type and RNA secondary structure based only on sequence information. Overall, the results provide insights into the fine-tuning mechanism of BERT in biological contexts and provide evidence of the applicability of the model to other RNA-related problems. AVAILABILITY AND IMPLEMENTATION: Python source codes are freely available at https://github.com/kkyamada/bert-rbp. The datasets underlying this article were derived from sources in the public domain: [RBPsuite (http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/), Ensembl Biomart (http://asia.ensembl.org/biomart/martview/)]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.

    DOI PubMed

    Scopus

    25 citations (Scopus)
  • LinAliFold and CentroidLinAliFold: fast RNA consensus secondary structure prediction for aligned sequences using beam search methods

    Tsukasa Fukunaga, Michiaki Hamada

    Bioinformatics Advances   2 ( 1 ) vbac078  2022.01  [International journal]

    Abstract

    Motivation

    RNA consensus secondary structure prediction from aligned sequences is a powerful approach for improving the secondary structure prediction accuracy. However, because the computational complexities of conventional prediction tools scale with the cube of the alignment lengths, their application to long RNA sequences, such as viral RNAs or long non-coding RNAs, requires significant computational time.

    Results

    In this study, we developed LinAliFold and CentroidLinAliFold, fast RNA consensus secondary structure prediction tools based on minimum free energy and maximum expected accuracy principles, respectively. We achieved software acceleration using beam search methods that were successfully used for fast secondary structure prediction from a single RNA sequence. Benchmark analyses showed that LinAliFold and CentroidLinAliFold were much faster than the existing methods while preserving the prediction accuracy. As an empirical application, we predicted the consensus secondary structure of coronaviruses with approximately 30 000 nt in 5 and 79 min by LinAliFold and CentroidLinAliFold, respectively. We confirmed that the predicted consensus secondary structure of coronaviruses was consistent with the experimental results.

    Availability and implementation

    The source codes of LinAliFold and CentroidLinAliFold are freely available at https://github.com/fukunagatsu/LinAliFold-CentroidLinAliFold.

    Supplementary information

    Supplementary data are available at Bioinformatics Advances online.

    DOI PubMed

    Scopus

    2 citations (Scopus)
  • Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs.

    Chao Zeng, Atsushi Takeda, Kotaro Sekine, Naoki Osato, Tsukasa Fukunaga, Michiaki Hamada

    Methods in molecular biology (Clifton, N.J.)   2509   315 - 340  2022  [International journal]

    As large numbers of non-coding RNAs (ncRNAs) have been annotated, repetitive sequences have been found to constitute functional components (termed repetitive elements) in ncRNAs that perform specific biological functions. Bioinformatics analysis is a powerful tool for improving our understanding of the role of repetitive elements in ncRNAs. This chapter summarizes recent findings that reveal the role of repetitive elements in ncRNAs. Furthermore, relevant bioinformatics approaches are systematically reviewed, which promises to provide valuable resources for studying the functional impact of repetitive elements on ncRNAs.

    DOI PubMed

    Scopus

    2 citations (Scopus)
  • Clone decomposition based on mutation signatures provides novel insights into mutational processes.

    Taro Matsutani, Michiaki Hamada

    NAR genomics and bioinformatics   3 ( 4 ) lqab093  2021.12  [International journal]

    Intra-tumor heterogeneity is a phenomenon in which mutation profiles differ from cell to cell within the same tumor and is observed in almost all tumors. Understanding intra-tumor heterogeneity is essential from the clinical perspective. Numerous methods have been developed to predict this phenomenon based on variant allele frequency. Among the methods, CloneSig models the variant allele frequency and mutation signatures simultaneously and provides an accurate clone decomposition. However, this method has limitations in terms of clone number selection and modeling. We propose SigTracer, a novel hierarchical Bayesian approach for analyzing intra-tumor heterogeneity based on mutation signatures to tackle these issues. We show that SigTracer predicts more reasonable clone decompositions than the existing methods against artificial data that mimic cancer genomes. We applied SigTracer to whole-genome sequences of blood cancer samples. The results were consistent with past findings that single base substitutions caused by a specific signature (previously reported as SBS9) related to the activation-induced cytidine deaminase intensively lie within immunoglobulin-coding regions for chronic lymphocytic leukemia samples. Furthermore, we showed that this signature mutates regions responsible for cell-cell adhesion. Accurate assignments of mutations to signatures by SigTracer can provide novel insights into signature origins and mutational processes.

    DOI PubMed

    Scopus

  • Multi-resBind: a residual network-based multi-label classifier for in vivo RNA binding prediction and preference visualization.

    Shitao Zhao, Michiaki Hamada

    BMC bioinformatics   22 ( 1 ) 554 - 554  2021.11  [International journal]

    BACKGROUND: Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to the weak neural network architecture. RESULTS: Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements on the same large-scale PAR-CLIP dataset in both the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions. CONCLUSIONS: Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.

    DOI PubMed

    Scopus

    4 citations (Scopus)
  • Impact of human gene annotations on RNA-seq differential expression analysis.

    Yu Hamaguchi, Chao Zeng, Michiaki Hamada

    BMC genomics   22 ( 1 ) 730 - 730  2021.10  [International journal]

    BACKGROUND: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Different sets of gene annotations are available for the human genome and are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear. RESULTS: Using "mappability", a metric of the complexity of gene annotation, we compared three distinct human gene annotations, GENCODE, RefSeq, and NONCODE, and evaluated how mappability affected DE analysis. We found that mappability was significantly different among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically to the downstream steps of DE analysis. CONCLUSIONS: We assessed how the complexity of gene annotations affects DE analysis using mappability. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that an approach that excludes unnecessary gene models from gene annotations improves the performance of DE analysis.

    DOI PubMed

    Scopus

    7 citations (Scopus)
  • Binding patterns of RNA-binding proteins to repeat-derived RNA sequences reveal putative functional RNA elements.

    Masahiro Onoguchi, Chao Zeng, Ayako Matsumaru, Michiaki Hamada

    NAR genomics and bioinformatics   3 ( 3 ) lqab055  2021.09  [International journal]

    Recent reports have revealed that repeat-derived sequences embedded in introns or long noncoding RNAs (lncRNAs) are targets of RNA-binding proteins (RBPs) and contribute to biological processes such as RNA splicing or transcriptional regulation. These findings suggest that repeat-derived RNAs are important as scaffolds of RBPs and functional elements. However, the overall functional sequences of the repeat-derived RNAs are not fully understood. Here, we show the putative functional repeat-derived RNAs by analyzing the binding patterns of RBPs based on ENCODE eCLIP data. We mapped all eCLIP reads to repeat sequences and observed that 10.75% and 7.04% of reads on average were enriched (at least 2-fold over control) in the repeats in K562 and HepG2 cells, respectively. Using these data, we predicted functional RNA elements on the sense and antisense strands of long interspersed element 1 (LINE1) sequences. Furthermore, we found several new sets of RBPs on fragments derived from other transposable element (TE) families. Some of these fragments show specific and stable secondary structures and are found to be inserted into the introns of genes or lncRNAs. These results suggest that the repeat-derived RNA sequences are strong candidates for the functional RNA elements of endogenous noncoding RNAs.

    DOI PubMed

    Scopus

    4 citations (Scopus)
  • Umibato: estimation of time-varying microbial interaction using continuous-time regression hidden Markov model.

    Shion Hosoda, Tsukasa Fukunaga, Michiaki Hamada

    Bioinformatics (Oxford, England)   37 ( Suppl_1 ) i16-i24  2021.07  [International journal]

    MOTIVATION: Accumulating evidence has highlighted the importance of microbial interaction networks. Methods have been developed for estimating microbial interaction networks, of which the generalized Lotka-Volterra equation (gLVE)-based method can estimate a directed interaction network. The previous gLVE-based method for estimating microbial interaction networks did not consider time-varying interactions. RESULTS: In this study, we developed an unsupervised learning-based microbial interaction inference method using Bayesian estimation (Umibato), a method for estimating time-varying microbial interactions. The Umibato algorithm comprises Gaussian process regression (GPR) and a new Bayesian probabilistic model, the continuous-time regression hidden Markov model (CTRHMM). Growth rates are estimated by GPR, and interaction networks are estimated by CTRHMM. CTRHMM can estimate time-varying interaction networks using interaction states, which are defined as hidden variables. Umibato outperformed the existing methods on synthetic datasets. In addition, it yielded reasonable estimations in experiments on a mouse gut microbiota dataset, thus providing novel insights into the relationship between consumed diets and the gut microbiota. AVAILABILITY AND IMPLEMENTATION: The C++ and Python source codes of the Umibato software are available at https://github.com/shion-h/Umibato. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    DOI PubMed

    Scopus

    3 citations (Scopus)
  • Possible roles for the hominoid-specific DSCR4 gene in human cells.

    Morteza M Saber, Marziyeh Karimiavargani, Takanori Uzawa, Nilmini Hettiarachchi, Michiaki Hamada, Yoshihiro Ito, Naruya Saitou

    Genes & genetic systems   96 ( 1 ) 1 - 11  2021.05  [Domestic journal]

    Down syndrome in humans is caused by trisomy of chromosome 21. DSCR4 (Down syndrome critical region 4) is a de novo-originated protein-coding gene present only in human chromosome 21 and its homologous chromosomes in apes. Despite being located in a medically critical genomic region and an abundance of evidence indicating its functionality, the roles of DSCR4 in human cells are unknown. We used a bioinformatic approach to infer the biological importance and cellular roles of this gene. Our analysis indicates that DSCR4 is likely involved in the regulation of interconnected biological pathways related to cell migration, coagulation and the immune system. We also showed that these predicted biological functions are consistent with tissue-specific expression of DSCR4 in migratory immune system leukocyte cells and neural crest cells (NCCs) that shape facial morphology in the human embryo. The immune system and NCCs are known to be affected in Down syndrome individuals, who suffer from DSCR4 misregulation, which further supports our findings. Providing evidence for the critical roles of DSCR4 in human cells, our findings establish the basis for further experimental investigations that will be necessary to confirm the roles of DSCR4 in the etiology of Down syndrome.

    DOI PubMed

    Scopus

    4 citations (Scopus)
  • PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores.

    Yukiteru Ono, Kiyoshi Asai, Michiaki Hamada

    Bioinformatics (Oxford, England)   37 ( 5 ) 589 - 595  2021.05  [International journal]

    MOTIVATION: Recent high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. RESULTS: To capture the characteristics of errors in reads from long-read sequencers, we introduce a generative model for quality scores that uses a hidden Markov model together with a recent model selection method called factorized information criteria. We evaluated the simulator from several perspectives, and the results indicate that it successfully simulates reads that are consistent with real reads. AVAILABILITY AND IMPLEMENTATION: The source codes of PBSIM2 are freely available from https://github.com/yukiteruono/pbsim2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    DOI PubMed

    Scopus

    75 citations (Scopus)
  • Long Non-Coding RNA CRNDE Is Involved in Resistance to EGFR Tyrosine Kinase Inhibitor in EGFR-Mutant Lung Cancer via eIF4A3/MUC1/EGFR Signaling.

    Satoshi Takahashi, Rintaro Noro, Masahiro Seike, Chao Zeng, Masaru Matsumoto, Akiko Yoshikawa, Shinji Nakamichi, Teppei Sugano, Mariko Hirao, Kuniko Matsuda, Michiaki Hamada, Akihiko Gemma

    International journal of molecular sciences   22 ( 8 )  2021.04  [International journal]

    (1) Background: Acquired resistance to epidermal growth factor receptor-tyrosine kinase inhibitors (EGFR-TKIs) is an intractable problem for many clinical oncologists. The mechanisms of resistance to EGFR-TKIs are complex. Long non-coding RNAs (lncRNAs) may play an important role in cancer development and metastasis. However, the biological process between lncRNAs and drug resistance to EGFR-mutated lung cancer remains largely unknown. (2) Methods: Osimertinib- and afatinib-resistant EGFR-mutated lung cancer cells were established using a stepwise method. A microarray analysis of non-coding and coding RNAs was performed using parental and resistant EGFR-mutant non-small cell lung cancer (NSCLC) cells and evaluated by bioinformatics analysis through medical-industrial collaboration. (3) Results: Colorectal neoplasia differentially expressed (CRNDE) and DiGeorge syndrome critical region gene 5 (DGCR5) lncRNAs were highly expressed in EGFR-TKI-resistant cells by microarray analysis. RNA-protein binding analysis revealed eukaryotic translation initiation factor 4A3 (eIF4A3) bound in an overlapping manner to CRNDE and DGCR5. The CRNDE downregulates the expression of eIF4A3, mucin 1 (MUC1), and phospho-EGFR. Inhibition of CRNDE activated the eIF4A3/MUC1/EGFR signaling pathway and apoptotic activity, and restored sensitivity to EGFR-TKIs. (4) Conclusions: The results showed that CRNDE is associated with the development of resistance to EGFR-TKIs. CRNDE may be a novel therapeutic target to conquer EGFR-mutant NSCLC.

    DOI PubMed

    Scopus

    27
    Citation
    (Scopus)
  • Jonckheere-Terpstra-Kendall-based non-parametric analysis of temporal differential gene expression.

    Hitoshi Iuchi, Michiaki Hamada

    NAR genomics and bioinformatics   3 ( 1 ) lqab021  2021.03  [International journal]

    Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degree of freedom) for one dataset. This approach risks modeling of linearly increasing genes with higher-order functions, or fitting of cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere-Terpstra-Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks, using simulation data, show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK in the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern contributes to the TEG identification of JTK, not the difference in expression levels. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.

    DOI PubMed

    Scopus

    2
    Citation
    (Scopus)
  • RaptGen: A variational autoencoder with profile hidden Markov model for generative aptamer discovery

    Natsuki Iwano, Tatsuo Adachi, Kazuteru Aoki, Yoshikazu Nakamura, Michiaki Hamada

       2021.02

    DOI

  • Association analysis of repetitive elements and R-loop formation across species.

    Chao Zeng, Masahiro Onoguchi, Michiaki Hamada

    Mobile DNA   12 ( 1 ) 3 - 3  2021.01  [International journal]

    BACKGROUND: Although recent studies have revealed the genome-wide distribution of R-loops, our understanding of R-loop formation is still limited. Genomes are known to have a large number of repetitive elements. Emerging evidence suggests that these sequences may play an important regulatory role. However, few studies have investigated the effect of repetitive elements on R-loop formation. RESULTS: We found different repetitive elements related to R-loop formation in various species. By controlling length and genomic distributions, we observed that satellite, long interspersed nuclear elements (LINEs), and DNA transposons were each specifically enriched for R-loops in humans, fruit flies, and Arabidopsis thaliana, respectively. R-loops also tended to arise in regions of low-complexity or simple repeats across species. We also found that the repetitive elements associated with R-loop formation differ according to developmental stage. For instance, LINEs and long terminal repeat retrotransposons (LTRs) are more likely to contain R-loops in embryos (fruit fly) and then turn out to be low-complexity and simple repeats in post-developmental S2 cells. CONCLUSIONS: Our results indicate that repetitive elements may have species-specific or development-specific regulatory effects on R-loop formation. This work advances our understanding of repetitive elements and R-loop biology.

    DOI PubMed

    Scopus

    14
    Citation
    (Scopus)
  • Representation learning applications in biological sequence analysis.

    Hitoshi Iuchi, Taro Matsutani, Keisuke Yamada, Natsuki Iwano, Shunsuke Sumi, Shion Hosoda, Shitao Zhao, Tsukasa Fukunaga, Michiaki Hamada

    Computational and structural biotechnology journal   19   3198 - 3208  2021  [International journal]

    Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.

    DOI PubMed

    Scopus

    48
    Citation
    (Scopus)
  • Corrigendum: Possible roles for the hominoid-specific DSCR4 gene in human cells [Genes Genet. Syst. (2021) 96, p. 1-11].

    Morteza M Saber, Marziyeh Karimiavargani, Takanori Uzawa, Nilmini Hettiarachchi, Michiaki Hamada, Yoshihiro Ito, Naruya Saitou

    Genes & genetic systems   96 ( 2 ) 105 - 105  2021  [Domestic journal]

    Legends to Figures 4 and 5 (p. 7) should be exchanged. Below are the correct legends to Figure 4 and Figure 5. Fig. 4. Interconnection of DSCR4 overexpression-mediated perturbed pathways. KEGG analysis of DSCR4 overexpression-mediated DEGs shows enrichment for the tightly interconnected pathways of the coagulation cascade and the complement cascade (highlighted in red) and further confirm the connection of these cascades with cell adhesion, migration and proliferation (red circle). Fig. 5. Expression profile of DSCR4 across human cell lines and tissues. According to Roadmap Epigenomics Project data, DSCR4 and DSCR8, which share a bidirectional promoter, are highly expressed only in K562 cells, a type of leukemia cell. Analysis of transcriptome data provided by Prescott et al. (2015) showed that DSCR4 and DSCR8 also display high expression in human and chimpanzee neural crest cells, which are critical migratory cells involved in facial morphogenesis in the embryo. (1) Data from Prescott et al. (2015). (2) Samples also include esophagus, lung, spleen and fetal large intestine. (3) Samples also include brain germinal matrix, hippocampus, fetal small intestine, stomach, left ventricle, small intestine, sigmoid colon, HEPG2 cells and HMEC cells. The PDF file for DOI: https://doi.org/10.1266/ggs.20-00012 has been replaced with the corrected version as of June 17, 2021.

    DOI PubMed

    Scopus

  • Identification of m6A-Associated RNA Binding Proteins Using an Integrative Computational Framework.

    Yiqian Zhang, Michiaki Hamada

    Frontiers in genetics   12   625797 - 625797  2021  [International journal]

    N6-methyladenosine (m6A) is an abundant modification on mRNA that plays an important role in regulating essential RNA activities. Several wet lab studies have identified some RNA binding proteins (RBPs) that are related to m6A's regulation. The objective of this study was to identify potential m6A-associated RBPs using an integrative computational framework. The framework was composed of an enrichment analysis and a classification model. Utilizing RBPs' binding data, we analyzed reproducible m6A regions from independent studies using this framework. The enrichment analysis identified known m6A-associated RBPs including YTH domain-containing proteins; it also identified RBM3 as a potential m6A-associated RBP for mouse. Furthermore, a significant correlation for the identified m6A-associated RBPs is observed at the protein expression level rather than the gene expression level. On the other hand, a Random Forest classification model was built for the reproducible m6A regions using RBPs' binding data. The RBP-based predictor demonstrated not only competitive performance when compared with sequence-based predictions but also reflected m6A's action of repelling against RBPs, which suggested that our framework can infer interaction between m6A and m6A-associated RBPs beyond sequence level when utilizing RBPs' binding data. In conclusion, we designed an integrative computational framework for the identification of known and potential m6A-associated RBPs. We hope the analysis will provide more insights on the studies of m6A and RNA modifications.

    DOI PubMed

    Scopus

    7
    Citation
    (Scopus)
  • Detection and Characterization of Ribosome-Associated Long Noncoding RNAs.

    Chao Zeng, Michiaki Hamada

    Methods in molecular biology (Clifton, N.J.)   2254   179 - 194  2021  [International journal]

    Ribosome profiling shows potential for studying the function of long noncoding RNAs (lncRNAs). We introduce a bioinformatics pipeline for detecting ribosome-associated lncRNAs (ribo-lncRNAs) from ribosome profiling data. Further, we describe a machine-learning approach for the characterization of ribo-lncRNAs based on their sequence features. Scripts for ribo-lncRNA analysis can be accessed at ( https://ribolnc.hamadalab.com/ ).

    DOI PubMed

    Scopus

    2
    Citation
    (Scopus)
  • Parallelized Latent Dirichlet Allocation Provides a Novel Interpretability of Mutation Signatures in Cancer Genomes.

    Taro Matsutani, Michiaki Hamada

    Genes   11 ( 10 )  2020.09  [International journal]

    Mutation signatures are defined as the distribution of specific mutations such as activity of AID/APOBEC family proteins. Previous studies have reported numerous signatures, using matrix factorization methods for mutation catalogs. Different mutation signatures are active in different tumor types; hence, signature activity varies greatly among tumor types and becomes sparse. Because of this, many previous methods require dividing mutation catalogs for each tumor type. Here, we propose parallelized latent Dirichlet allocation (PLDA), a novel Bayesian model to simultaneously predict mutation signatures with all mutation catalogs. PLDA is an extended model of latent Dirichlet allocation (LDA), which is one of the methods used for signature prediction. It has parallelized hyperparameters of Dirichlet distributions for LDA, and they represent the sparsity of signature activities for each tumor type, thus facilitating simultaneous analyses. First, we conducted a simulation experiment to compare PLDA with previous methods (including SigProfiler and SignatureAnalyzer) using artificial data and confirmed that PLDA could predict signature structures as accurately as previous methods without searching for the optimal hyperparameters. Next, we applied PLDA to PCAWG (Pan-Cancer Analysis of Whole Genomes) mutation catalogs and obtained a signature set different from the one predicted by SigProfiler. Further, we have shown that the mutation spectrum represented by the predicted signature with PLDA provides a novel interpretability through post-analyses.

    DOI PubMed

    Scopus

    3
    Citation
    (Scopus)
  • Free-Energy Calculation of Ribonucleic Inosines and Its Application to Nearest-Neighbor Parameters.

    Shun Sakuraba, Junichi Iwakiri, Michiaki Hamada, Tomoshi Kameda, Genichiro Tsuji, Yasuaki Kimura, Hiroshi Abe, Kiyoshi Asai

    Journal of chemical theory and computation   16 ( 9 ) 5923 - 5935  2020.09  [Refereed]  [International journal]

    Can current simulations quantitatively predict the stability of ribonucleic acids (RNAs)? In this research, we apply a free-energy perturbation simulation of RNAs containing inosine, a modified ribonucleic base, to the derivation of RNA nearest-neighbor parameters. A parameter set derived solely from 30 simulations was used to predict the free-energy difference of the RNA duplex with a mean unbiased error of 0.70 kcal/mol, which is a level of accuracy comparable to that obtained with parameters derived from 25 experiments. We further show that the error can be lowered to 0.60 kcal/mol by combining the simulation-derived free-energy differences with experimentally measured differences. This protocol can be used as a versatile method for deriving nearest-neighbor parameters of RNAs with various modified bases.

    DOI PubMed

    Scopus

    2
    Citation
    (Scopus)
  • RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines.

    Chao Zeng, Michiaki Hamada

    Genes   11 ( 7 )  2020.07  [International journal]

    Alternative splicing, a ubiquitous phenomenon in eukaryotes, is a regulatory mechanism for the biological diversity of individual genes. Most studies have focused on the effects of alternative splicing on protein synthesis. However, the transcriptome-wide influence of alternative splicing on RNA subcellular localization has rarely been studied. By analyzing RNA-seq data obtained from subcellular fractions across 13 human cell lines, we identified 8720 switching genes between the cytoplasm and the nucleus. Consistent with previous reports, intron retention was observed to be enriched in the nuclear transcript variants. Interestingly, we found that short and structurally stable introns were positively correlated with nuclear localization. Motif analysis revealed that fourteen RNA-binding proteins (RBPs) tend to bind preferentially to such introns. To our knowledge, this is the first transcriptome-wide study to analyze and evaluate the effect of alternative splicing on RNA subcellular localization. Our findings reveal that alternative splicing plays a role in regulating RNA subcellular localization.

    DOI PubMed

    Scopus

    9
    Citation
    (Scopus)
  • Revealing the microbial assemblage structure in the human gut microbiome using latent Dirichlet allocation.

    Shion Hosoda, Suguru Nishijima, Tsukasa Fukunaga, Masahira Hattori, Michiaki Hamada

    Microbiome   8 ( 1 ) 95 - 95  2020.06  [International journal]

    BACKGROUND: The human gut microbiome has been suggested to affect human health and thus has received considerable attention. To clarify the structure of the human gut microbiome, clustering methods are frequently applied to human gut taxonomic profiles. Enterotypes, i.e., clusters of individuals with similar microbiome composition, are well-studied and characterized. However, only a few detailed studies on assemblages, i.e., clusters of co-occurring bacterial taxa, have been conducted. Particularly, the relationship between the enterotype and assemblage is not well-understood. RESULTS: In this study, we detected gut microbiome assemblages using a latent Dirichlet allocation (LDA) method. We applied LDA to a large-scale human gut metagenome dataset and found that a 4-assemblage LDA model could represent relationships between enterotypes and assemblages with high interpretability. This model indicated that each individual tends to have several assemblages, three of which corresponded to the three classically recognized enterotypes. Conversely, the fourth assemblage corresponded to no enterotypes and emerged in all enterotypes. Interestingly, the dominant genera of this assemblage (Clostridium, Eubacterium, Faecalibacterium, Roseburia, Coprococcus, and Butyrivibrio) included butyrate-producing species such as Faecalibacterium prausnitzii. Indeed, the fourth assemblage significantly positively correlated with three butyrate-producing functions. CONCLUSIONS: We conducted an assemblage analysis on a large-scale human gut metagenome dataset using LDA. The present study revealed that there is an enterotype-independent assemblage.

    DOI PubMed

    Scopus

    24
    Citation
    (Scopus)
  • MoAIMS: efficient software for detection of enriched regions of MeRIP-Seq.

    Yiqian Zhang, Michiaki Hamada

    BMC bioinformatics   21 ( 1 ) 103 - 103  2020.03  [International journal]

    BACKGROUND: Methylated RNA immunoprecipitation sequencing (MeRIP-Seq) is a popular sequencing method for studying RNA modifications and, in particular, N6-methyladenosine (m6A), the most abundant RNA methylation modification found in various species. The detection of enriched regions is a main challenge of MeRIP-Seq analysis; however, current tools either require a long time or do not fully utilize features of RNA sequencing, such as strand information, which can cause ambiguous calling. In addition, as treatment experiments using MeRIP-Seq attract more attention, biologists need an intuitive way to evaluate the treatment effect by comparison. Therefore, efficient and user-friendly software that can solve these tasks must be developed. RESULTS: We developed software named "model-based analysis and inference of MeRIP-Seq (MoAIMS)" to detect enriched regions of MeRIP-Seq and infer signal proportion based on a mixture negative-binomial model. MoAIMS is designed for transcriptome immunoprecipitation sequencing experiments; therefore, it is compatible with different RNA sequencing protocols. MoAIMS offers excellent processing speed and competitive performance when compared with other tools. When MoAIMS is applied to studies of m6A, the detected enriched regions contain known biological features of m6A. Furthermore, signal proportion inferred from MoAIMS for m6A treatment datasets (perturbation of m6A methyltransferases) showed a decreasing trend that is consistent with experimental observations, suggesting that the signal proportion can be used as an intuitive indicator of treatment effect. CONCLUSIONS: MoAIMS is efficient and easy-to-use software implemented in R. MoAIMS can not only detect enriched regions of MeRIP-Seq efficiently but also provide an intuitive evaluation of the treatment effect for MeRIP-Seq treatment datasets.

    DOI PubMed

    Scopus

    12
    Citation
    (Scopus)
  • Nucleosome destabilization by nuclear non-coding RNAs.

    Risa Fujita, Tatsuro Yamamoto, Yasuhiro Arimura, Saori Fujiwara, Hiroaki Tachiwana, Yuichi Ichikawa, Yuka Sakata, Liying Yang, Reo Maruyama, Michiaki Hamada, Mitsuyoshi Nakao, Noriko Saitoh, Hitoshi Kurumizaka

    Communications biology   3 ( 1 ) 60 - 60  2020.02  [Refereed]  [International journal]

    In the nucleus, genomic DNA is wrapped around histone octamers to form nucleosomes. In principle, nucleosomes are substantial barriers to transcriptional activities. Nuclear non-coding RNAs (ncRNAs) are proposed to function in chromatin conformation modulation and transcriptional regulation. However, it remains unclear how ncRNAs affect the nucleosome structure. Eleanors are clusters of ncRNAs that accumulate around the estrogen receptor-α (ESR1) gene locus in long-term estrogen deprivation (LTED) breast cancer cells, and markedly enhance the transcription of the ESR1 gene. Here we detected nucleosome depletion around the transcription site of Eleanor2, the most highly expressed Eleanor in the LTED cells. We found that the purified Eleanor2 RNA fragment drastically destabilized the nucleosome in vitro. This activity was also exerted by other ncRNAs, but not by poly(U) RNA or DNA. The RNA-mediated nucleosome destabilization may be a common feature among natural nuclear RNAs, and may function in transcription regulation in chromatin.

    DOI PubMed

    Scopus

    8
    Citation
    (Scopus)
  • Targeting the TR4 nuclear receptor-mediated lncTASR/AXL signaling with tretinoin increases the sunitinib sensitivity to better suppress the RCC progression.

    Hangchuan Shi, Yin Sun, Miao He, Xiong Yang, Michiaki Hamada, Tsukasa Fukunaga, Xiaoping Zhang, Chawnshang Chang

    Oncogene   39 ( 3 ) 530 - 545  2020.01  [Refereed]  [International journal]

    Renal cell carcinoma (RCC) is one of the most lethal urological tumors. Sunitinib has become the first-line therapy to improve survival in metastatic RCC patients. However, the occurrence of sunitinib resistance in clinical application has curtailed its efficacy. Here we found that the TR4 nuclear receptor might alter sunitinib resistance in RCC by altering TR4/lncTASR/AXL signaling. Mechanism dissection revealed that TR4 could modulate lncTASR (ENST00000600671.1) expression via transcriptional regulation, which might then increase AXL protein expression by enhancing the stability of AXL mRNA, thereby increasing sunitinib resistance in RCC. Human clinical surveys also linked the expression of TR4, lncTASR, and AXL to RCC survival. Results from multiple RCC cell lines revealed that targeting this newly identified TR4-mediated signaling with small molecules, including tretinoin and metformin, or with TR4-shRNAs, increased sunitinib sensitivity and better suppressed RCC progression. Our preclinical study using an in vivo mouse model further showed that tretinoin had a better synergistic effect in increasing sunitinib sensitivity to suppress RCC progression. Future successful clinical trials may help in the development of a novel therapy to better suppress RCC progression.

    DOI PubMed

    Scopus

    24
    Citation
    (Scopus)
  • Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference.

    Taro Matsutani, Yuki Ueno, Tsukasa Fukunaga, Michiaki Hamada

    Bioinformatics (Oxford, England)   35 ( 22 ) 4543 - 4552  2019.11  [Refereed]  [International journal]

    MOTIVATION: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. It is known that each mutational process leads to characteristic mutations, and when a mutational process has preferences for mutations, this situation is called a 'mutation signature.' Identification of mutation signatures is an important task for elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures is unclear. RESULTS: In this study, we present a novel method for estimating the number of mutation signatures, latent Dirichlet allocation with variational Bayes inference (VB-LDA), in which variational lower bounds are utilized for finding a plausible number of mutation patterns. In addition, we performed cluster analyses for estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures for real mutation data revealed many interesting mutation signatures that have not been previously reported. AVAILABILITY AND IMPLEMENTATION: All the predicted mutation signatures with clustering results are freely available at http://www.f.waseda.jp/mhamada/MS/index.html. All the C++ source code and Python scripts utilized in this study can be downloaded on the Internet (https://github.com/qkirikigaku/MS_LDA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    DOI PubMed

    Scopus

    11
    Citation
    (Scopus)
  • Stromal fibroblasts induce metastatic tumor cell clusters via epithelial-mesenchymal plasticity.

    Yuko Matsumura, Yasuhiko Ito, Yoshihiro Mezawa, Kaidiliayi Sulidan, Yataro Daigo, Toru Hiraga, Kaoru Mogushi, Nadila Wali, Hiromu Suzuki, Takumi Itoh, Yohei Miyagi, Tomoyuki Yokose, Satoru Shimizu, Atsushi Takano, Yasuhisa Terao, Harumi Saeki, Masayuki Ozawa, Masaaki Abe, Satoru Takeda, Ko Okumura, Sonoko Habu, Okio Hino, Kazuyoshi Takeda, Michiaki Hamada, Akira Orimo

    Life science alliance   2 ( 4 )  2019.08  [Refereed]  [International journal]

    Emerging evidence supports the hypothesis that multicellular tumor clusters invade and seed metastasis. However, whether tumor-associated stroma induces epithelial-mesenchymal plasticity in tumor cell clusters, to promote invasion and metastasis, remains unknown. We demonstrate herein that carcinoma-associated fibroblasts (CAFs) frequently present in tumor stroma drive the formation of tumor cell clusters composed of two distinct cancer cell populations, one in a highly epithelial (E-cadherin^hi ZEB1^lo/neg: E^hi) state and another in a hybrid epithelial/mesenchymal (E-cadherin^lo ZEB1^hi: E/M) state. The E^hi cells highly express oncogenic cell-cell adhesion molecules, such as carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) and CEACAM6 that associate with E-cadherin, resulting in increased tumor cell cluster formation and metastatic seeding. The E/M cells also retain associations with E^hi cells, which follow the E/M cells leading to collective invasion. CAF-produced stromal cell-derived factor 1 and transforming growth factor-β confer the E^hi and E/M states as well as invasive and metastatic traits via Src activation in apposed human breast tumor cells. Taken together, these findings indicate that invasive and metastatic tumor cell clusters are induced by CAFs via epithelial-mesenchymal plasticity.

    DOI PubMed

    Scopus

    48
    Citation
    (Scopus)
  • Identification of RNA biomarkers for chemical safety screening in neural cells derived from mouse embryonic stem cells using RNA deep sequencing analysis.

    Hidenori Tani, Taro Matsutani, Hiroshi Aoki, Kaoru Nakamura, Yu Hamaguchi, Tetsuya Nakazato, Michiaki Hamada

    Biochemical and biophysical research communications   512 ( 4 ) 641 - 646  2019.05  [Refereed]  [International journal]

    Chemical safety screening requires the development of more efficient assays that do not involve testing in animals. In vitro cell-based assays are among the most appropriate alternatives to animal testing for screening of chemical toxicity. Most studies performed to date made use of mRNAs as biomarkers. Recent studies have, however, indicated the presence of many unannotated non-coding RNAs (ncRNAs) in the transcriptome that do not appear to encode proteins. In the present study, we performed whole-transcriptome sequencing analysis (RNA-Seq) to identify novel RNA biomarkers, including ncRNAs, which showed marked responses to the toxicity of nine chemicals. Chemical safety screening was performed in cell-based assays using mouse embryonic stem cell (mESC)-derived neural cells. Marked responses in the expression of some ncRNAs to the chemical compounds were observed. The results of the present study suggested that ncRNAs may be useful in chemical safety screening as novel RNA biomarkers.

    DOI PubMed

    Scopus

    1
    Citation
    (Scopus)
  • LncRRIsearch: A Web Server for lncRNA-RNA Interaction Prediction Integrated With Tissue-Specific Expression and Subcellular Localization Data.

    Tsukasa Fukunaga, Junichi Iwakiri, Yukiteru Ono, Michiaki Hamada

    Frontiers in genetics   10   462 - 462  2019  [Refereed]  [International journal]

    Long non-coding RNAs (lncRNAs) play critical roles in various biological processes, but the function of the majority of lncRNAs is still unclear. One approach for estimating a function of a lncRNA is the identification of its interaction target because functions of lncRNAs are expressed through interaction with other biomolecules in quite a few cases. In this paper, we developed "LncRRIsearch," which is a web server for comprehensive prediction of human and mouse lncRNA-lncRNA and lncRNA-mRNA interaction. The prediction was conducted using RIblast, which is a fast and accurate RNA-RNA interaction prediction tool. Users can investigate interaction target RNAs of a particular lncRNA through a web interface. In addition, we integrated tissue-specific expression and subcellular localization data for the lncRNAs with the web server. These data enable users to examine tissue-specific or subcellular localized lncRNA interactions. LncRRIsearch is publicly accessible at http://rtools.cbrc.jp/LncRRIsearch/.

    DOI PubMed

    Scopus

    90
    Citation
    (Scopus)
  • DeepM6ASeq: prediction and characterization of m6A-containing sequences using deep learning.

    Yiqian Zhang, Michiaki Hamada

    BMC bioinformatics   19 ( Suppl 19 ) 524 - 524  2018.12  [Refereed]  [International journal]

    BACKGROUND: N6-methyladenosine (m6A) is a common and abundant RNA methylation modification found in various species. As a type of post-transcriptional methylation, m6A plays an important role in diverse RNA activities such as alternative splicing, interplay with microRNAs, and translation efficiency. Although existing tools can predict m6A at single-base resolution, it is still challenging to extract the biological information surrounding m6A sites. RESULTS: We implemented a deep learning framework, named DeepM6ASeq, to predict m6A-containing sequences and characterize surrounding biological features based on miCLIP-Seq data, which detects m6A sites at single-base resolution. DeepM6ASeq showed better performance than other machine learning classifiers. Moreover, an independent test on m6A-Seq data, which identifies m6A-containing genomic regions, revealed that our model is competitive in predicting m6A-containing sequences. The learned motifs from DeepM6ASeq correspond to known m6A readers. Notably, DeepM6ASeq also identifies a newly recognized m6A reader: FMR1. In addition, we found that a saliency map in the deep learning model could be utilized to visualize locations of m6A sites. CONCLUSION: We developed a deep-learning-based framework to predict and characterize m6A-containing sequences, and we hope it will help investigators gain more insights into m6A research. The source code is available at https://github.com/rreybeyb/DeepM6ASeq .

    DOI PubMed

    Scopus

    100
    Citation
    (Scopus)
  • Identifying sequence features that drive ribosomal association for lncRNA.

    Chao Zeng, Michiaki Hamada

    BMC genomics   19 ( Suppl 10 ) 906 - 906  2018.12  [Refereed]  [International journal]

    BACKGROUND: With the increasing number of annotated long noncoding RNAs (lncRNAs) from the genome, researchers are continually updating their understanding of lncRNAs. Recently, thousands of lncRNAs have been reported to be associated with ribosomes in mammals. However, their biological functions or mechanisms are still unclear. RESULTS: In this study, we tried to investigate the sequence features involved in the ribosomal association of lncRNA. We have extracted ninety-nine sequence features corresponding to different biological mechanisms (i.e., RNA splicing, putative ORF, k-mer frequency, RNA modification, RNA secondary structure, and repeat element). An L1-regularized logistic regression model was applied to screen these features. Finally, we obtained fifteen and nine important features for the ribosomal association of human and mouse lncRNAs, respectively. CONCLUSION: To our knowledge, this is the first study to characterize ribosome-associated lncRNAs and ribosome-free lncRNAs from the perspective of sequence features. These sequence features that were identified in this study may shed light on the biological mechanism of the ribosomal association and provide important clues for functional analysis of lncRNAs.

    DOI PubMed

    Citations (Scopus): 17
  • Nearest-neighbor parameter for inosine-cytosine pairs through a combined experimental and computational approach

    Shun Sakuraba, Junichi Iwakiri, Michiaki Hamada, Tomoshi Kameda, Genichiro Tsuji, Yasuaki Kimura, Hiroshi Abe, Kiyoshi Asai

       2018.10  [Refereed]

    In RNA secondary structure prediction, nearest-neighbor parameters are used to determine the stability of a given structure. We derived the nearest-neighbor parameters for RNAs containing inosine-cytosine pairs. For parameter derivation, we developed a method that combines UV absorption measurement experiments with free-energy calculations using molecular dynamics simulations. The method provides fast drop-in parameters for modified bases. Derived parameters were compared and found to be consistent with existing parameters for canonical RNAs. On average, a duplex with an internal inosine-cytosine pair is 0.9 kcal/mol less stable than the same duplex with an internal guanine-cytosine pair, and is about as stable as the one with an internal adenine-uracil pair (only 0.1 kcal/mol more stable).

    DOI

  • A Novel Method for Assessing the Statistical Significance of RNA-RNA Interactions Between Two Long RNAs.

    Tsukasa Fukunaga, Michiaki Hamada

    Journal of computational biology : a journal of computational molecular cell biology   25 ( 9 ) 976 - 986  2018.09  [Refereed]  [International journal]

    RNA-RNA interactions are key mechanisms through which noncoding RNA (ncRNA) regions exert biological functions. Computational prediction of RNA-RNA interactions is an essential method for detecting novel RNA-RNA interactions because their comprehensive detection by biological experimentation is still quite difficult. Many RNA-RNA interaction prediction tools have been developed, but they tend to produce many false positives. Accordingly, assessment of the statistical significance of computationally predicted interactions is an important task. However, there is no method to evaluate the statistical significance of RNA-RNA interactions that is applicable to interactions between two long RNA sequences. We developed a method to calculate the p-value for the minimal interaction energy between two long RNA sequences. The developed method depends on the fact that minimum interaction energies of RNA-RNA interactions between long RNAs follow a Gumbel distribution when repeat sequences in RNAs are masked. To show the usefulness of the developed method, we applied it to whole human 5'-untranslated region (UTR) and 3'-UTR sequences to detect novel 5'-UTR-3'-UTR interactions. We thus identified two significant 5'-UTR-3'-UTR interactions. Specifically, the human small proline-rich repeat protein 3 shows conserved 5'-UTR-3'-UTR interactions with some nucleotide variations preserving base pairings among primates. Our developed method enables us to detect statistically significant RNA-RNA interactions between long RNAs such as long ncRNAs. Statistical significance estimates help in identification of interactions for experimental validation and provide novel insights into the function of ncRNA regions.

    DOI PubMed

    Citations (Scopus): 2
  • Computational approaches for alternative and transient secondary structures of ribonucleic acids.

    Tsukasa Fukunaga, Michiaki Hamada

    Briefings in functional genomics   18 ( 3 ) 182 - 191  2018.06  [Refereed]  [International journal]

    Transient and alternative structures of ribonucleic acids (RNAs) play essential roles in various regulatory processes, such as translation regulation in living cells. Because experimental analyses for RNA structures are difficult and time-consuming, computational approaches based on RNA secondary structures are promising. In this article, we review computational methods for detecting and analyzing transient/alternative secondary structures of RNAs, including static approaches based on probabilistic distributions of RNA secondary structures and dynamic approaches such as kinetic folding and folding pathway predictions.

    DOI PubMed

    Citations (Scopus): 1
  • Identification and analysis of ribosome-associated lncRNAs using ribosome profiling data.

    Chao Zeng, Tsukasa Fukunaga, Michiaki Hamada

    BMC genomics   19 ( 1 ) 414 - 414  2018.05  [Refereed]  [International journal]

    BACKGROUND: Although the number of discovered long non-coding RNAs (lncRNAs) has increased dramatically, their biological roles have not been established. Many recent studies have used ribosome profiling data to assess the protein-coding capacity of lncRNAs. However, very little work has been done to identify ribosome-associated lncRNAs, here defined as lncRNAs interacting with ribosomes related to protein synthesis as well as other unclear biological functions. RESULTS: On average, 39.17% of expressed lncRNAs were observed to interact with ribosomes in human and 48.16% in mouse. We developed the ribosomal association index (RAI), which quantifies the evidence for ribosomal associability of lncRNAs over various tissues and cell types, to catalog 691 and 409 lncRNAs that are robustly associated with ribosomes in human and mouse, respectively. Moreover, we identified 78 and 42 lncRNAs with a high probability of coding peptides in human and mouse, respectively. Compared with ribosome-free lncRNAs, ribosome-associated lncRNAs were observed to be more likely to be located in the cytoplasm and more sensitive to nonsense-mediated decay. CONCLUSION: Our results suggest that RAI can be used as an integrative and evidence-based tool for distinguishing between ribosome-associated and free lncRNAs, providing a valuable resource for the study of lncRNA functions.

    DOI PubMed

    Citations (Scopus): 50
  • Estimating energy parameters for RNA secondary structure predictions using both experimental and computational data.

    Nishida S, Sakuraba S, Asai K, Hamada M

    IEEE/ACM transactions on computational biology and bioinformatics   16 ( 5 ) 1645 - 1655  2018.03  [Refereed]

    DOI PubMed

  • Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm.

    Taikai Takeda, Michiaki Hamada, John Hancock

    Bioinformatics (Oxford, England)   34 ( 4 ) 576 - 584  2018.02  [Refereed]  [International journal]

    Motivation: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. Results: We developed a novel method to select superior models (including the number of hidden states) for PHMM. Our method selects models with the highest posterior probability using Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies. Availability and implementation: The software is available at https://github.com/bigsea-t/fab-phmm. Contact: mhamada@waseda.jp. Supplementary information: Supplementary data are available at Bioinformatics online.

    DOI PubMed

    Scopus

  • In silico approaches to RNA aptamer design.

    Michiaki Hamada

    Biochimie   145   8 - 14  2018.02  [Refereed]  [International journal]

    RNA aptamers are ribonucleic acids that bind to specific target molecules. An RNA aptamer for a disease-related protein has great potential for development into a new drug. However, huge time and cost investments are required to develop an RNA aptamer into a pharmaceutical. Recently, SELEX combined with high-throughput sequencers (i.e., HT-SELEX) has been widely used to select candidate RNA aptamers that bind to a target protein with high affinity and specificity. After candidate selection, further optimizations such as shortening and modifying candidate sequences are performed. In these steps, in silico approaches are expected to reduce the time and cost associated with aptamer drug development. In this article, we review existing in silico approaches to RNA aptamer development, including a method for ranking the candidates of RNA aptamers from HT-SELEX data, clustering a huge number of aptamer sequences, and finding motifs amidst a set of significant RNA aptamers. It is expected that further studies in addition to these methods will be utilized for in silico RNA aptamer design, permitting a minimal number of experiments to be performed through the utilization of sophisticated computational methods.

    DOI PubMed

    Citations (Scopus): 40
  • Identification of Transposable Elements Contributing to Tissue-Specific Expression of Long Non-Coding RNAs.

    Takafumi Chishima, Junichi Iwakiri, Michiaki Hamada

    Genes   9 ( 1 )  2018.01  [Refereed]  [International journal]

    It has been recently suggested that transposable elements (TEs) are re-used as functional elements of long non-coding RNAs (lncRNAs). This is supported by some examples such as the human endogenous retrovirus subfamily H (HERVH) elements contained within lncRNAs and expressed specifically in human embryonic stem cells (hESCs), as required to maintain hESC identity. There are at least two unanswered questions about all lncRNAs. How many TEs are re-used within lncRNAs? Are there any other TEs that affect tissue specificity of lncRNA expression? To answer these questions, we comprehensively identify TEs that are significantly related to tissue-specific expression levels of lncRNAs. We downloaded lncRNA expression data corresponding to normal human tissue from the Expression Atlas and transformed the data into tissue specificity estimates. Then, Fisher's exact tests were performed to verify whether the presence or absence of TE-derived sequences influences the tissue specificity of lncRNA expression. Many TE-tissue pairs associated with tissue-specific expression of lncRNAs were detected, indicating that multiple TE families can be re-used as functional domains or regulatory sequences of lncRNAs. In particular, we found that the antisense promoter region of L1PA2, a LINE-1 subfamily, appears to act as a promoter for lncRNAs with placenta-specific expression.

    DOI PubMed

    Citations (Scopus): 42
  • The hominoid-specific gene DSCR4 is involved in regulation of human leukocyte migration

    Saber MM, Karimiavargani M, Hettiarachchi N, Hamada M, Uzawa T, Ito Y, Saitou N

       2017.09

    DOI

  • RIblast: an ultrafast RNA-RNA interaction prediction system based on a seed-and-extension approach.

    Tsukasa Fukunaga, Michiaki Hamada

    Bioinformatics (Oxford, England)   33 ( 17 ) 2666 - 2674  2017.09  [Refereed]  [International journal]

    Motivation: LncRNAs play important roles in various biological processes. Although more than 58 000 human lncRNA genes have been discovered, most known lncRNAs are still poorly characterized. One approach to understanding the functions of lncRNAs is the detection of the interacting RNA target of each lncRNA. Because experimental detections of comprehensive lncRNA-RNA interactions are difficult, computational prediction of lncRNA-RNA interactions is an indispensable technique. However, the high computational costs of existing RNA-RNA interaction prediction tools prevent their application to large-scale lncRNA datasets. Results: Here, we present 'RIblast', an ultrafast RNA-RNA interaction prediction method based on the seed-and-extension approach. RIblast discovers seed regions using suffix arrays and subsequently extends seed regions based on an RNA secondary structure energy model. Computational experiments indicate that RIblast achieves a level of prediction accuracy similar to those of existing programs, but at speeds over 64 times faster than existing programs. Availability and implementation: The source code of RIblast is freely available at https://github.com/fukunagatsu/RIblast . Contact: t.fukunaga@kurenai.waseda.jp or mhamada@waseda.jp. Supplementary information: Supplementary data are available at Bioinformatics online.

    DOI PubMed

    Citations (Scopus): 79
  • Computational prediction of lncRNA-mRNA interactions by integrating tissue specificity in human transcriptome.

    Junichi Iwakiri, Goro Terai, Michiaki Hamada

    Biology direct   12 ( 1 ) 15 - 15  2017.06  [Refereed]  [International journal]

    Long noncoding RNAs (lncRNAs) play a key role in normal tissue differentiation and cancer development through their tissue-specific expression in the human transcriptome. Recent investigations of macromolecular interactions have shown that tissue-specific lncRNAs form base-pairing interactions with various mRNAs associated with tissue differentiation, suggesting that tissue specificity is an important factor controlling human lncRNA-mRNA interactions. Here, we report investigations of the tissue specificities of lncRNAs and mRNAs using RNA-seq data across various human tissues, as well as computational predictions of tissue-specific lncRNA-mRNA interactions inferred by integrating the tissue specificity of lncRNAs and mRNAs into our comprehensive prediction of human lncRNA-RNA interactions. Our predicted lncRNA-mRNA interactions were evaluated by comparison with experimentally validated lncRNA-mRNA interactions (between the TINCR lncRNA and mRNAs), showing improved prediction accuracy over previous prediction methods that did not account for the tissue specificities of lncRNAs and mRNAs. In addition, our predictions suggest potential functions of the TINCR lncRNA not only in epidermal differentiation but also in esophageal development through lncRNA-mRNA interactions. REVIEWERS: This article was reviewed by Dr. Weixiong Zhang and Dr. Bojan Zagrovic.

    DOI PubMed

    Citations (Scopus): 37
  • AMAP: A pipeline for whole-genome mutation detection in Arabidopsis thaliana.

    Kotaro Ishii, Yusuke Kazama, Tomonari Hirano, Michiaki Hamada, Yukiteru Ono, Mieko Yamada, Tomoko Abe

    Genes & genetic systems   91 ( 4 ) 229 - 233  2017.03  [Refereed]  [Domestic journal]

    Detection of mutations at the whole-genome level is now possible by the use of high-throughput sequencing. However, determining mutations is a time-consuming process due to the number of false positives provided by mutation-detecting programs. AMAP (automated mutation analysis pipeline) was developed to overcome this issue. AMAP integrates a set of well-validated programs for mapping (BWA), removal of potential PCR duplicates (Picard), realignment (GATK) and detection of mutations (SAMtools, GATK, Pindel, BreakDancer and CNVnator). Thus, all types of mutations such as base substitution, deletion, insertion, translocation and chromosomal rearrangement can be detected by AMAP. In addition, AMAP automatically distinguishes false positives by comparing lists of candidate mutations in sequenced mutants. We tested AMAP by inputting already analyzed read data derived from three individual Arabidopsis thaliana mutants and confirmed that all true mutations were included in the list of candidate mutations. The result showed that the number of false positives was reduced to 12% of that obtained in a previous analysis that lacked a process of reducing false positives. Thus, AMAP will accelerate not only the analysis of mutation induction by individual mutagens but also the process of forward genetics.

    DOI PubMed

    Citations (Scopus): 4
  • Training alignment parameters for arbitrary sequencers with LAST-TRAIN.

    Michiaki Hamada, Yukiteru Ono, Kiyoshi Asai, Martin C Frith

    Bioinformatics (Oxford, England)   33 ( 6 ) 926 - 928  2017.03  [Refereed]  [International journal]

    Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. Availability and Implementation: the source code is freely available at http://last.cbrc.jp/. Contact: mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp. Supplementary information: Supplementary data are available at Bioinformatics online.

    DOI PubMed

    Citations (Scopus): 52
  • Improved Accuracy in RNA-Protein Rigid Body Docking by Incorporating Force Field for Molecular Dynamics Simulation into the Scoring Function.

    Junichi Iwakiri, Michiaki Hamada, Kiyoshi Asai, Tomoshi Kameda

    Journal of chemical theory and computation   12 ( 9 ) 4688 - 97  2016.09  [Refereed]  [International journal]

    RNA-protein interactions play fundamental roles in many biological processes. To understand these interactions, it is necessary to know the three-dimensional structures of RNA-protein complexes. However, determining the tertiary structure of these complexes is often difficult, suggesting that an accurate rigid body docking for RNA-protein complexes is needed. In general, the rigid body docking process is divided into two steps: generating candidate structures from the individual RNA and protein structures and then narrowing down the candidates. In this study, we focus on the former problem to improve the prediction accuracy in RNA-protein docking. Our method is based on the integration of physicochemical information about RNA into ZDOCK, which is known as one of the most successful computer programs for protein-protein docking. Because recent studies showed that current force fields for molecular dynamics simulation of proteins and nucleic acids are quite accurate, we modeled the physicochemical information about RNA with force fields such as AMBER and CHARMM. A comprehensive benchmark of RNA-protein docking, using three recently developed data sets, reveals the remarkable prediction accuracy of the proposed method compared with existing programs for docking: the highest success rate is 34.7% for the predicted structure of the RNA-protein complex with the best score and 79.2% for 3,600 predicted ones. Three full atomistic force fields for RNA (AMBER94, AMBER99, and CHARMM22) produced almost equally accurate results, which shows that current force fields for nucleic acids are quite accurate. In addition, we found that the electrostatic interaction and the representation of shape complementarity between protein and RNA play important roles in the accurate prediction of the native structures of RNA-protein complexes.

    DOI PubMed J-GLOBAL

    Citations (Scopus): 22
  • Computational Approaches for Long Non-coding RNA Research

    IWAKIRI Junichi, HAMADA Michiaki

    Seibutsu Butsuri   56 ( 4 ) 217 - 220  2016.08  [Refereed]

    Recent advances in high-throughput sequencing technologies have revealed that a large number of long non-coding RNAs (lncRNAs) are transcribed from the human genome. These emerging transcripts now need to be functionally classified and annotated. Here we review several bioinformatic approaches for analyzing the important characteristics of lncRNAs toward discovering their functions: 1) tissue specificities of lncRNA expression, 2) two types of macromolecular interactions (RNA-RNA and RNA-protein interactions), and 3) secondary structures of lncRNAs.

    DOI CiNii

  • Rtools: a web server for various secondary structural analyses on single RNA sequences

    Michiaki Hamada, Yukiteru Ono, Hisanori Kiryu, Kengo Sato, Yuki Kato, Tsukasa Fukunaga, Ryota Mori, Kiyoshi Asai

    NUCLEIC ACIDS RESEARCH   44 ( W1 ) W302 - W307  2016.07  [Refereed]

    Secondary structures, as well as nucleotide sequences, are important features of RNA molecules for characterizing their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools that compensate for the imperfect predictions by calculating and visualizing the secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of the tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. By just giving an RNA sequence to the web server, the user can get the different types of solutions of the secondary structures, the marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases, the energy changes by arbitrary base mutations, as well as the measures for validation of the predicted secondary structures. The web server is available at http://rtools.cbrc.jp, which integrates the software tools CentroidFold, CentroidHomfold, IPKnot, CapR, Raccess, Rchange and RintD.

    DOI PubMed

  • Comprehensive prediction of lncRNA-RNA interactions in human transcriptome

    Goro Terai, Junichi Iwakiri, Tomoshi Kameda, Michiaki Hamada, Kiyoshi Asai

    BMC GENOMICS   17 ( S-1 ) 12  2016  [Refereed]

    Motivation: Recent studies have revealed that large numbers of non-coding RNAs are transcribed in humans, but the functions of only a few of them have been identified. Identification of the interaction target RNAs of the non-coding RNAs is an important step in predicting their functions. The current experimental methods to identify RNA-RNA interactions, however, are not fast enough to apply to a whole human transcriptome. Therefore, computational predictions of RNA-RNA interactions are desirable, but this is a challenging task due to the huge computational costs involved.
    Results: Here, we report comprehensive predictions of the interaction targets of lncRNAs in a whole human transcriptome for the first time. To achieve this, we developed an integrated pipeline for predicting RNA-RNA interactions on the K computer, which is one of the fastest supercomputers in the world. Comparisons with experimentally validated lncRNA-RNA interactions support the quality of the predictions. Additionally, we have developed a database that catalogs the predicted lncRNA-RNA interactions to provide fundamental information about the targets of lncRNAs.

    DOI

    Citations (Scopus): 52
  • Bioinformatics tools for lncRNA research

    Junichi Iwakiri, Michiaki Hamada, Kiyoshi Asai

    BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS   1859 ( 1 ) 23 - 30  2016.01  [Refereed]

    Current experimental methods to identify the functions of a large number of candidate long non-coding RNAs (lncRNAs) are limited in their throughput. Therefore, it is essential to know which tools are effective for understanding lncRNAs so that reasonable speed and accuracy can be achieved. In this paper, we review the currently available bioinformatics tools and databases that are useful for finding non-coding RNAs and analyzing their structures, conservation, interactions, co-expressions and localization. This article is part of a Special Issue entitled: Clues to long noncoding RNA taxonomy, edited by Dr. Tetsuro Hirose and Dr. Shinichi Nakagawa. (C) 2015 Elsevier B.V. All rights reserved.

    DOI PubMed J-GLOBAL

    Citations (Scopus): 49
  • Privacy-preserving search for chemical compound databases

    Kana Shimizu, Koji Nuida, Hiromi Arai, Shigeo Mitsunari, Nuttapong Attrapadung, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Jun Sakuma, Goichiro Hanaoka, Kiyoshi Asai

    BMC BIOINFORMATICS   16 ( 18 ) S6  2015.12  [Refereed]

    Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources.
    Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation.
    Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.

    DOI PubMed J-GLOBAL

    Citations (Scopus): 11
  • Learning chromatin states with factorized information criteria

    Michiaki Hamada, Yukiteru Ono, Ryohei Fujimaki, Kiyoshi Asai

    BIOINFORMATICS   31 ( 15 ) 2426 - 2433  2015.08  [Refereed]

    Motivation: Recent studies have suggested that both the genome and the genome with epigenetic modifications, the so-called epigenome, play important roles in various biological functions, such as transcription and DNA replication, repair, and recombination. It is well known that specific combinations of histone modifications (e.g. methylations and acetylations) of nucleosomes induce chromatin states that correspond to specific functions of chromatin. Although the advent of next-generation sequencing (NGS) technologies enables measurement of epigenetic information for entire genomes at high-resolution, the variety of chromatin states has not been completely characterized.
    Results: In this study, we propose a method to estimate the chromatin states indicated by genome-wide chromatin marks identified by NGS technologies. The proposed method automatically estimates the number of chromatin states and characterizes each state on the basis of a hidden Markov model (HMM) in combination with a recently proposed model selection technique, factorized information criteria. The method is expected to provide an unbiased model because it relies on only two adjustable parameters and avoids heuristic procedures as much as possible. Computational experiments with simulated datasets show that our method automatically learns an appropriate model, even in cases where methods that rely on Bayesian information criteria fail to learn the model structures. In addition, we comprehensively compare our method to ChromHMM on three real datasets and show that our method estimates more chromatin states than ChromHMM for those datasets.

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • RNA secondary structure prediction from multi-aligned sequences

    Michiaki Hamada

    RNA Bioinformatics   1269   17 - 38  2015.01  [Refereed]

     View Summary

    It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.

    DOI PubMed J-GLOBAL

    Scopus

    2
    Citation
    (Scopus)
  • Efficient calculation of exact probability distributions of integer features on RNA secondary structures

    Ryota Mori, Michiaki Hamada, Kiyoshi Asai

    BMC GENOMICS   15   S6  2014.12  [Refereed]

     View Summary

    Background: Although the need for analyses of secondary structures of RNAs is increasing, predictions of the secondary structures of RNAs are not always reliable. Because an RNA may have a complicated energy landscape, comprehensive representations of the whole ensemble of the secondary structures, such as the probability distributions of various features of RNA secondary structures, are required.
    Results: A general method to efficiently compute the distribution of any integer scalar/vector function on the secondary structure is proposed. We also show two concrete algorithms, for Hamming distance from a reference structure and for 5' - 3' distance, which can be constructed by following our general method. These practical applications of this method show the effectiveness of the proposed method.
    Conclusions: The proposed method provides a clear and comprehensive procedure to construct algorithms for distributions of various integer features. In addition, distributions of integer vectors, that is, combinations of different integer scores, can also be described by applying our 2D expanding technique.

    DOI PubMed J-GLOBAL

    Scopus

    10
    Citation
    (Scopus)
  • Reference-free prediction of rearrangement breakpoint reads

    Edward Wijaya, Kana Shimizu, Kiyoshi Asai, Michiaki Hamada

    BIOINFORMATICS   30 ( 18 ) 2559 - 2567  2014.09  [Refereed]

     View Summary

    Motivation: Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules, which are observed in many cancer-related diseases. The detection of rearrangement is typically done by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information.
    Results: In this article, we propose a reference-free method for detecting clusters of breakpoints from the chromosomal rearrangements. This is done by directly comparing a set of NGS normal reads with another set that may be rearranged. Our method SlideSort-BPR (breakpoint reads) is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100x, it finds approximately 88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome.

    DOI PubMed J-GLOBAL

    Scopus

    5
    Citation
    (Scopus)
  • Fighting against uncertainty: an essential issue in bioinformatics

    Michiaki Hamada

    BRIEFINGS IN BIOINFORMATICS   15 ( 5 ) 748 - 767  2014.09  [Refereed]

     View Summary

    Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are often affected by the 'uncertainty' of a solution, that is, the probability of the solution is extremely small. This situation arises for estimation problems on high-dimensional discrete spaces in which the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods to combat this uncertainty, presenting a number of examples in bioinformatics. The methods include (i) avoiding point estimation, (ii) maximum expected accuracy (MEA) estimations and (iii) several strategies to design a pipeline involving several prediction methods. I believe that the basic concepts and ideas described in this review will be generally useful for estimation problems in various areas of bioinformatics.

    DOI

    Scopus

    11
    Citation
    (Scopus)
  • RNA structural alignments, part II: non-Sankoff approaches for structural alignments.

    Asai K, Hamada M

    Methods in molecular biology (Clifton, N.J.)   1097   291 - 301  2014  [Refereed]

    DOI PubMed J-GLOBAL

  • Analysis of base-pairing probabilities of RNA molecules involved in protein-RNA interactions

    Junichi Iwakiri, Tomoshi Kameda, Kiyoshi Asai, Michiaki Hamada

    BIOINFORMATICS   29 ( 20 ) 2524 - 2528  2013.10  [Refereed]

     View Summary

    Motivation: Understanding the details of protein-RNA interactions is important to reveal the functions of both the RNAs and the proteins. In these interactions, the secondary structures of the RNAs play an important role. Because RNA secondary structures in protein-RNA complexes are variable, considering the ensemble of RNA secondary structures is a useful approach. In particular, recent studies have supported the idea that, in the analysis of RNA secondary structures, the base-pairing probabilities (BPPs) of RNAs (i.e. the probabilities of forming a base pair in the ensemble of RNA secondary structures) provide richer and more robust information about the structures than a single RNA secondary structure, for example, the minimum free energy structure or a snapshot of structures in the Protein Data Bank. However, there has been no investigation of the BPPs in protein-RNA interactions.
    Results: In this study, we analyzed BPPs of RNA molecules involved in known protein-RNA complexes in the Protein Data Bank. Our analysis suggests that, in the tertiary structures, the BPPs (which are computed using only sequence information) for unpaired nucleotides with intermolecular hydrogen bonds (hbonds) to amino acids were significantly lower than those for unpaired nucleotides without hbonds. On the other hand, no difference was found between the BPPs for paired nucleotides with and without intermolecular hbonds. Those findings were commonly supported by three probabilistic models, which provide the ensemble of RNA secondary structures, including the McCaskill model based on Turner's free energy of secondary structures.

    DOI

    Scopus

    10
    Citation
    (Scopus)
  • CentroidAlign-Web: A fast and accurate multiple aligner for long non-coding RNAs

    Haruka Yonemoto, Kiyoshi Asai, Michiaki Hamada

    International Journal of Molecular Sciences   14 ( 3 ) 6144 - 6156  2013.03  [Refereed]

     View Summary

    Due to the recent discovery of non-coding RNAs (ncRNAs), multiple sequence alignment (MSA) of those long RNA sequences is becoming increasingly important for classifying and determining the functional motifs in RNAs. However, not only primary (nucleotide) sequences, but also secondary structures of ncRNAs are closely related to their function and are conserved evolutionarily. Hence, information about secondary structures should be considered in the sequence alignment of ncRNAs. Yet, in general, a huge computational time is required in order to compute MSAs, taking secondary structure information into account. In this paper, we describe a fast and accurate web server, called CentroidAlign-Web, which can handle long RNA sequences. The web server also appropriately incorporates information about known secondary structures into MSAs. Computational experiments indicate that our web server is fast and accurate enough to handle long RNA sequences. CentroidAlign-Web is freely available from http://centroidalign.ncrna.org/. © 2013 by the authors; licensee MDPI, Basel, Switzerland.

    DOI PubMed J-GLOBAL

    Scopus

    4
    Citation
    (Scopus)
  • Generalized Centroid Estimators in Bioinformatics

    Michiaki Hamada, Hisanori Kiryu, Wataru Iwasaki, Kiyoshi Asai

    CoRR   abs/1305.4339  2013  [Refereed]

  • PBSIM: PacBio reads simulator-toward accurate genome assembly

    Yukiteru Ono, Kiyoshi Asai, Michiaki Hamada

    BIOINFORMATICS   29 ( 1 ) 119 - 121  2013.01  [Refereed]

     View Summary

    Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries.
    Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results.

    DOI

    Scopus

    213
    Citation
    (Scopus)
  • Direct Updating of an RNA Base-Pairing Probability Matrix with Marginal Probability Constraints

    Michiaki Hamada

    JOURNAL OF COMPUTATIONAL BIOLOGY   19 ( 12 ) 1265 - 1276  2012.12  [Refereed]

     View Summary

    A base-pairing probability matrix (BPPM) stores the probabilities for every possible base pair in an RNA sequence and has been used in many algorithms in RNA informatics (e.g., RNA secondary structure prediction and motif search). In this study, we propose a novel algorithm to perform iterative updates of a given BPPM, satisfying marginal probability constraints that are (approximately) given by recently developed biochemical experiments, such as SHAPE, PAR, and FragSeq. The method is easily implemented and is applicable to common models for RNA secondary structures, such as energy-based or machine-learning-based models. In this article, we focus mainly on the details of the algorithms, although preliminary computational experiments will also be presented.

    DOI

    Scopus

    8
    Citation
    (Scopus)
  • Shape-based alignment of genomic landscapes in multi-scale resolution

    Hiroki Ashida, Kiyoshi Asai, Michiaki Hamada

    NUCLEIC ACIDS RESEARCH   40 ( 14 ) 6435 - 6448  2012.08  [Refereed]

     View Summary

    Due to dramatic advances in DNA technology, quantitative measures of annotation data can now be obtained in continuous coordinates across the entire genome, allowing various heterogeneous 'genomic landscapes' to emerge. Although much effort has been devoted to comparing DNA sequences, not much attention has been given to comparing these large quantities of data comprehensively. In this article, we introduce a method for rapidly detecting local regions that show high correlations between genomic landscapes. We overcame the size problem for genome-wide data by converting the data into series of symbols and then carrying out sequence alignment. We also decomposed the oscillation of the landscape data into different frequency bands before analysis, since the real genomic landscape is a mixture of embedded and confounded biological processes working at different scales in the cell nucleus. To verify the usefulness and generality of our method, we applied our approach to well investigated landscapes from the human genome, including several histone modifications. Furthermore, by applying our method to over 20 genomic landscapes in human and 12 in mouse, we found that DNA replication timing and the density of Alu insertions are highly correlated genome-wide in both species, even though the Alu elements have amplified independently in the two genomes. To our knowledge, this is the first method to align genomic landscapes at multiple scales according to their shape.

    DOI PubMed J-GLOBAL

    Scopus

    5
    Citation
    (Scopus)
  • A Classification of Bioinformatics Algorithms from the Viewpoint of Maximizing Expected Accuracy (MEA)

    Michiaki Hamada, Kiyoshi Asai

    JOURNAL OF COMPUTATIONAL BIOLOGY   19 ( 5 ) 532 - 549  2012.05  [Refereed]

     View Summary

    Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution, even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.

    DOI

    Scopus

    14
    Citation
    (Scopus)
  • Privacy preservation in information retrieval

    Hiromi Arai, Kana Shimizu, Michiaki Hamada

    Proceedings of the Annual Conference of JSAI (人工知能学会全国大会論文集)   26   1 - 4  2012

    CiNii

  • Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection

    Michiaki Hamada, Edward Wijaya, Martin C. Frith, Kiyoshi Asai

    BIOINFORMATICS   27 ( 22 ) 3085 - 3092  2011.11  [Refereed]

     View Summary

    Motivation: Recent studies have revealed the importance of considering quality scores of reads generated by next-generation sequence (NGS) platforms in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) provide more accurate alignment than conventional maximum score-based alignment. There exists, however, no study about probabilistic alignment that considers quality scores explicitly, although the method is expected to be useful in SNP/indel callers and bisulfite mapping, because accurate estimation of aligned columns or gaps is important in those analyses.
    Results: In this study, we propose methods of probabilistic alignment that consider quality scores of (one of) the sequences as well as a usual score matrix. The method is based on posterior decoding techniques in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores, and can arbitrarily trade-off sensitivity and positive predictive value (PPV) of prediction (aligned columns and gaps). The method is directly applicable to read mapping (alignment) toward accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments can estimate aligned columns and gaps accurately, compared with other mapping algorithms e.g. SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.

    DOI

    Scopus

    13
    Citation
    (Scopus)
  • Antagonistic RNA aptamer specific to a heterodimeric form of human interleukin-17A/F

    Hironori Adachi, Akira Ishiguro, Michiaki Hamada, Eri Sakota, Kiyoshi Asai, Yoshikazu Nakamura

    BIOCHIMIE   93 ( 7 ) 1081 - 1088  2011.07  [Refereed]

     View Summary

    Interleukin-17 (IL-17) is a pro-inflammatory cytokine produced primarily by a subset of CD4+ cells, called Th17 cells, that is involved in host defense, inflammation and autoimmune disorders. The two most structurally related IL-17 family members, IL-17A and IL-17F, form homodimeric (IL-17A/A, IL-17F/F) and heterodimeric (IL-17A/F) complexes. Although the biological significance of IL-17A and IL-17F has been investigated using respective antibodies or gene knockout mice, the functional study of the IL-17A/F heterodimeric form has been hampered by the lack of an inhibitory tool specific to IL-17A/F. In this study, we aimed to develop an RNA aptamer that specifically inhibits IL-17A/F. Aptamers are short single-stranded nucleic acid sequences that are selected in vitro based on their high affinity to a target molecule. One selected aptamer against human IL-17A/F, AptAF42, was isolated by repeated cycles of selection and counterselection against heterodimeric and homodimeric complexes, respectively. Thus, AptAF42 bound IL-17A/F but not IL-17A/A or IL-17F/F. The optimized derivative, AptAF42dope1, blocked the binding of IL-17A/F, but not of IL-17A/A or IL-17F/F, to the IL-17 receptor in the surface plasmon resonance assay in vitro. Consistently, AptAF42dope1 blocked cytokine GRO-alpha production induced by IL-17A/F, but not by IL-17A/A or IL-17F/F, in human cells. An RNA footprinting assay using ribonucleases against AptAF42dope1 in the presence or absence of IL-17A/F revealed that part of the predicted secondary structure fluctuates between alternate forms and that AptAF42dope1 is globally protected from ribonuclease cleavage by IL-17A/F. These results suggest that the selected aptamer recognizes a global conformation specified by the heterodimeric surface of IL-17A/F.

    DOI PubMed J-GLOBAL

    Scopus

    19
    Citation
    (Scopus)
  • CentroidHomfold-LAST: accurate prediction of RNA secondary structure using automatically collected homologous sequences

    Michiaki Hamada, Koichiro Yamada, Kengo Sato, Martin C. Frith, Kiyoshi Asai

    NUCLEIC ACIDS RESEARCH   39 ( Web Server issue ) W100 - W106  2011.07  [Refereed]

     View Summary

    Although secondary structure predictions of an individual RNA sequence have been widely used in a number of sequence analyses of RNAs, accuracy is still limited. Recently, we proposed a method (called 'CentroidHomfold'), which incorporates information about homologous sequences into the prediction of the secondary structure of the target sequence, and showed that it substantially improved the performance of secondary structure predictions. CentroidHomfold, however, requires users to prepare homologous sequences of the target sequence. We have developed a Web application (CentroidHomfold-LAST) that predicts the secondary structure of the target sequence using automatically collected homologous sequences. LAST, which is a fast and sensitive local aligner, and CentroidHomfold are employed in the Web application. Computational experiments with a commonly used data set indicated that CentroidHomfold-LAST substantially outperformed conventional secondary structure predictions including CentroidFold and RNAfold.

    DOI PubMed J-GLOBAL

    Scopus

    22
    Citation
    (Scopus)
  • IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming

    Kengo Sato, Yuki Kato, Michiaki Hamada, Tatsuya Akutsu, Kiyoshi Asai

    BIOINFORMATICS   27 ( 13 ) I85 - I93  2011.07  [Refereed]

     View Summary

    Motivation: Pseudoknots found in secondary structures of a number of functional RNAs play various roles in biological processes. Recent methods for predicting RNA secondary structures cover certain classes of pseudoknotted structures, but only a few of them achieve satisfying predictions in terms of both speed and accuracy.
    Results: We propose IPknot, a novel computational method for predicting RNA secondary structures with pseudoknots based on maximizing expected accuracy of a predicted structure. IPknot decomposes a pseudoknotted structure into a set of pseudoknot-free substructures and approximates a base-pairing probability distribution that considers pseudoknots, leading to the capability of modeling a wide class of pseudoknots and running quite fast. In addition, we propose a heuristic algorithm for refining base-pairing probabilities to improve the prediction accuracy of IPknot. The problem of maximizing expected accuracy is solved by using integer programming with threshold cut. We also extend IPknot so that it can predict the consensus secondary structure with pseudoknots when a multiple sequence alignment is given. IPknot is validated through extensive experiments on various datasets, showing that IPknot achieves better prediction accuracy and faster running time as compared with several competitive prediction methods.

    DOI PubMed J-GLOBAL

    Scopus

    210
    Citation
    (Scopus)
  • Generalized Centroid Estimators in Bioinformatics

    Michiaki Hamada, Hisanori Kiryu, Wataru Iwasaki, Kiyoshi Asai

    PLOS ONE   6 ( 2 ) e16450  2011.02  [Refereed]

     View Summary

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit commonly used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework for designing MEA-based estimators but is also highly extensible and sheds new light on many problems in bioinformatics.

    DOI PubMed

    Scopus

    14
    Citation
    (Scopus)
  • Improving the accuracy of predicting secondary structure for aligned RNA sequences

    Michiaki Hamada, Kengo Sato, Kiyoshi Asai

    NUCLEIC ACIDS RESEARCH   39 ( 2 ) 393 - 402  2011.01  [Refereed]

     View Summary

    Considerable attention has been focused on predicting the secondary structure for aligned RNA sequences since it is useful not only for improving the limited accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although there exist many algorithms for predicting secondary structure for aligned RNA sequences, further improvement of the accuracy is still awaited. In this article, toward improving the accuracy, a theoretical classification of state-of-the-art algorithms for predicting secondary structure for aligned RNA sequences is presented. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied in various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms, and we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms.

    DOI PubMed CiNii J-GLOBAL

    Scopus

    49
    Citation
    (Scopus)
  • Prediction of RNA secondary structure by maximizing pseudo-expected accuracy

    Michiaki Hamada, Kengo Sato, Kiyoshi Asai

    BMC BIOINFORMATICS   11   586  2010.11  [Refereed]

     View Summary

    Background: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, a new type of estimator is proposed including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give the single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear what parameter value we should use, and even the well-trained default parameter value does not, in general, give the best result in popular accuracy measures for each RNA sequence.
    Results: Instead of using the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that well-balanced secondary structures between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the gamma-centroid estimator.
    Conclusions: This study gives not only a method for predicting the secondary structure that balances between sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-) expected accuracy with respect to various evaluation measures including MCC and F-score.

    DOI PubMed J-GLOBAL

    Scopus

    22
    Citation
    (Scopus)
  • RactIP: Fast and accurate prediction of RNA-RNA interaction using integer programming

    Yuki Kato, Kengo Sato, Michiaki Hamada, Yoshihide Watanabe, Kiyoshi Asai, Tatsuya Akutsu

    Bioinformatics   26 ( 18 ) i460 - i466  2010.09  [Refereed]

     View Summary

    Motivation: Considerable attention has been focused on predicting RNA-RNA interaction since it is a key to identifying possible targets of non-coding small RNAs that regulate gene expression post-transcriptionally. A number of computational studies have so far been devoted to predicting joint secondary structures or binding sites under a specific class of interactions. In general, there is a trade-off between range of interaction type and efficiency of a prediction algorithm, and thus efficient computational methods for predicting comprehensive type of interaction are still awaited.
    Results: We present RactIP, a fast and accurate prediction method for RNA-RNA interaction of general type using integer programming. RactIP can integrate approximate information on an ensemble of equilibrium joint structures into the objective function of integer programming using posterior internal and external base-pairing probabilities. Experimental results on real interaction data show that prediction accuracy of RactIP is at least comparable to that of several state-of-the-art methods for RNA-RNA interaction prediction. Moreover, we demonstrate that RactIP can run incomparably faster than competitive methods for predicting joint secondary structures.

    DOI PubMed J-GLOBAL

    Scopus

    3
    Citation
    (Scopus)
  • A non-parametric Bayesian approach for predicting RNA secondary structures

    Kengo Sato, Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai, Yasubumi Sakakibara

    Journal of Bioinformatics and Computational Biology   8 ( 4 ) 727 - 742  2010.08  [Refereed]

     View Summary

    Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.

    DOI

    Scopus

    11
    Citation
    (Scopus)
  • Parameters for accurate genome alignment

    Martin C. Frith, Michiaki Hamada, Paul Horton

    BMC Bioinformatics   11   80  2010.02  [Refereed]

     View Summary

    Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed.
    Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold-standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases.
    Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).

    DOI PubMed J-GLOBAL

    Scopus

    160
    Citation
    (Scopus)
  • CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score

    Michiaki Hamada, Kengo Sato, Hisanori Kiryu, Toutai Mituyama, Kiyoshi Asai

    BIOINFORMATICS   25 ( 24 ) 3236 - 3243  2009.12  [Refereed]

     View Summary

    Motivation: The importance of accurate and fast predictions of multiple alignments for RNA sequences has increased due to recent findings about functional non-coding RNAs. Recent studies suggest that maximizing the expected accuracy of predictions will be useful for many problems in bioinformatics.
    Results: We designed a novel estimator for multiple alignments of structured RNAs, based on maximizing the expected accuracy of predictions. First, we define the maximum expected accuracy (MEA) estimator for pairwise alignment of RNA sequences. This maximizes the expected sum-of-pairs score (SPS) of a predicted alignment under a probability distribution of alignments given by marginalizing the Sankoff model. Then, by approximating the MEA estimator, we obtain an estimator whose time complexity is O(L^3 + c^2 d L^2) where L is the length of input sequences and both c and d are constants independent of L. The proposed estimator can handle uncertainty of secondary structures and alignments that are obstacles in bioinformatics because it considers all the secondary structures and all the pairwise alignments as input sequences. Moreover, we integrate the probabilistic consistency transformation (PCT) on alignments into the proposed estimator. Computational experiments using six benchmark datasets indicate that the proposed method achieved a favorable SPS and was the fastest of many state-of-the-art tools for multiple alignments of structured RNAs.

    DOI PubMed J-GLOBAL

    Scopus

    39
    Citation
    (Scopus)
  • CENTROIDFOLD: a web server for RNA secondary structure prediction

    Kengo Sato, Michiaki Hamada, Kiyoshi Asai, Toutai Mituyama

    NUCLEIC ACIDS RESEARCH   37 ( Web Server issue ) W277 - W280  2009.07  [Refereed]

     View Summary

    The CENTROIDFOLD web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engines. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responds with a prediction result shown in a popular base-pair notation and as a graph representation. A PDF version of the graph representation is also available. For a multiple alignment, the server predicts a common secondary structure. Usage of the server is quite simple. You can paste a single RNA sequence (FASTA or plain sequence text) or a multiple alignment (CLUSTAL-W format) into the text area and then click on the 'execute CentroidFold' button. The server quickly responds with a prediction result. The major advantage of this server is that it employs our original CENTROIDFOLD software as its prediction engine, which scores the best accuracy in our benchmark results. Our web server is freely available with no login requirement.

    DOI PubMed J-GLOBAL

    Scopus

    252
    Citation
    (Scopus)
  • Predictions of RNA secondary structure by combining homologous sequence information.

    Hamada M, Sato K, Kiryu H, Mituyama T, Asai K

    Bioinformatics (Oxford, England)   25 ( 12 ) 330 - 338  2009.06  [Refereed]  [International journal]

     View Summary

    Motivation: Secondary structure prediction of RNA sequences is an important problem. There has been progress in this area, but the accuracy of prediction from an RNA sequence is still limited. In many cases, however, homologous RNA sequences are available with the target RNA sequence whose secondary structure is to be predicted.
    Results: In this article, we propose a new method for secondary structure predictions of individual RNA sequences by taking the information of their homologous sequences into account without assuming the common secondary structure of the entire sequences. The proposed method is based on posterior decoding techniques, which consider all the suboptimal secondary structures of the target and homologous sequences and all the suboptimal alignments between the target sequence and each of the homologous sequences. In our computational experiments, the proposed method provides better predictions than those performed only on the basis of the formation of individual RNA sequences and those performed by using methods for predicting the common secondary structure of the homologous sequences. Remarkably, we found that the common secondary predictions sometimes give worse predictions for the secondary structure of a target sequence than the predictions from the individual target sequence, while the proposed method always gives good predictions for the secondary structure of target sequences in all tested cases.
    Availability: Supporting information and software are available online at: http://www.ncrna.org/software/centroidfold/ismb2009/.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    DOI PubMed J-GLOBAL

    Scopus

    42
    Citation
    (Scopus)
  • Prediction of RNA secondary structure using generalized centroid estimators

    Michiaki Hamada, Hisanori Kiryu, Kengo Sato, Toutai Mituyama, Kiyoshi Asai

    BIOINFORMATICS   25 ( 4 ) 465 - 473  2009.02  [Refereed]

     View Summary

    Motivation: Recent studies have shown that the methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities have an advantage with respect to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures.
    Results: We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics.

    DOI PubMed J-GLOBAL

    Scopus

    187
    Citation
    (Scopus)
  • A Non-parametric Bayesian Approach for Predicting RNA Secondary Structures

    Kengo Sato, Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai, Yasubumi Sakakibara

    ALGORITHMS IN BIOINFORMATICS, PROCEEDINGS   5724   286 - +  2009  [Refereed]

     View Summary

    Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.

  • Large scale similarity search for locally stable secondary structures among RNA sequences

    Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai

    IPSJ Transactions on Bioinformatics   2   36 - 46  2009  [Refereed]

     View Summary

    Recently, a large number of candidates of non-coding RNAs (ncRNAs) have been predicted by experimental or computational approaches. Moreover, in genomic sequences, there are still many interesting regions whose functions are unknown (e.g., indel conserved regions, human accelerated regions, ultraconserved elements and transposon free regions) and some of those regions may be ncRNAs. On the other hand, it is known that many ncRNAs have characteristic secondary structures which are strongly related to their functions. Therefore, detecting clusters which have mutually similar secondary structures is important for revealing new ncRNA families. In this paper, we describe a novel method, called RNAclique, which is able to search for clusters containing mutually similar and locally stable secondary structures among a large number of unaligned RNA sequences. Our problem is formulated as a constraint quasiclique search problem, and we use an approximate combinatorial optimization method, called GRASP, for solving the problem. Several computational experiments show that our method is useful and scalable for detecting ncRNA families from large sequences. We also present two examples of large scale sequence analysis using RNAclique.

    Citations (Scopus): 1

  • Software.ncrna.org: web servers for analyses of RNA sequences

    Kiyoshi Asai, Hisanori Kiryu, Michiaki Hamada, Yasuo Tabei, Kengo Sato, Hiroshi Matsui, Yasubumi Sakakibara, Goro Terai, Toutai Mituyama

    NUCLEIC ACIDS RESEARCH   36 ( Web Server issue ) W75 - W78  2008.07  [Refereed]

    We present web servers for analysis of non-coding RNA sequences on the basis of their secondary structures. Software tools for structural multiple sequence alignments, structural pairwise sequence alignments and structural motif findings are available from the integrated web server and the individual stand-alone web servers. The servers are located at http://software.ncrna.org, along with the information for the evaluation and downloading. This website is freely available to all users and there is no login requirement.

    Citations (Scopus): 5

  • Mining frequent stem patterns from unaligned RNA sequences

    Michiaki Hamada, Koji Tsuda, Taku Kudo, Taishin Kin, Kiyoshi Asai

    BIOINFORMATICS   22 ( 20 ) 2480 - 2487  2006.10  [Refereed]

    Motivation: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or few good motifs, but biologists often need to inspect all the possible motifs thoroughly.
    Results: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably in both accuracy and efficiency with state-of-the-art methods such as CMFinder.

    Citations (Scopus): 38

Books and Other Publications

Presentations

  • RNA data science

    浜田道昭  [Invited]

    一般社団法人ゲノムテクノロジー研究会第8回バイオインフォマティクス分科会「データサイエンス」 

    Presentation date: 2024.09

    Event date: 2024.09

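  • Information technology accelerating RNA drug discovery, [S41] New frontiers in RNA drug discovery opened up by elucidating defense mechanisms against exogenous RNA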

    浜田道昭  [Invited]

    日本薬学会第144年会 

    Presentation date: 2024.03

    Event date: 2024.03

  • Development and application of information technology to accelerate nucleic acid therapeutics research

    浜田道昭  [Invited]

    第453回CBI学会講演会「中分子創薬を革新する計算科学、情報科学の最前線」 

    Presentation date: 2024.02

    Event date: 2024.02

  • Information technology toward the total design of mRNA

    浜田道昭  [Invited]

    日本核酸医薬学会第8回年会 mRNAシンポジウム 

    Presentation date: 2023.07

  • Nucleic acid therapeutics research using RNA bioinformatics

    浜田道昭  [Invited]

    日本核酸医薬学会第8回年会 教育セッション(生物) 

    Presentation date: 2023.07

  • AI aptamer drug discovery: accelerating RNA aptamer drug discovery with artificial intelligence technology

    浜田道昭  [Invited]

    日本コンピュータ化学会20周年記念シンポジウム 

    Presentation date: 2022.06

  • AI aptamer drug discovery project

    Michiaki Hamada  [Invited]

    Presentation date: 2022.03

  • Learning diversity-controlled generative models with GFlowNets

    三森隆広, 浜田道昭

    第27回情報論的学習理論ワークショップ(IBIS2024)、電子情報通信学会 情報論的学習理論と機械学習研究会 

    Event date: 2024.11

  • Searching for optimal conditions for Cap2-type mRNA production using Bayesian optimization

    木村祐太, 須賀幹太, 木村康明, 浜田道昭, 阿部洋

    第55回 中部化学関係学協会支部連合秋季大会 

    Event date: 2024.11

  • Deep generative model for functional RNA design

    Shunsuke Sumi, Michiaki Hamada, Hirohide Saito  [Invited]

    BIOINFO 2024 

    Event date: 2024.10

  • Information technology accelerating RNA therapeutics

    Michiaki Hamada  [Invited]

    RNA and Developmental Biology A Symposium Commemorating the End of the 12-Year RNA Medical Science Laboratory 

    Presentation date: 2024.10

  • Host prediction for coronaviruses using protein language models

    川崎千晶, 井内仁志, 浜田道昭

    生命情報科学若手の会 第16回年会 

    Event date: 2024.09

  • Establishing a subfamily classification method for transposons

    山本晃大, 小菅将斗, 武田淳志, 福永津嵩, 浜田道昭

    生命情報科学若手の会 第16回年会 

    Event date: 2024.09

  • Generating novel protein-binding RNA sequences with a prefix-tuned RNA language model

    横山源太朗, 浜田道昭

    生命情報科学若手の会 第16回年会 

    Event date: 2024.09

  • Development of a de novo interspersed repeat detection method applicable to large genomes

    武田淳志, 福永津嵩, 浜田道昭

    生命情報科学若手の会 第16回年会 

    Event date: 2024.09

  • The evolutionary arms race between transposons and KRAB-ZFPs

    小菅将斗, 伊東潤平, 浜田道昭

    生命情報科学若手の会 第16回年会 

    Event date: 2024.09

  • Life science, medical, and pharmaceutical research using RNA bioinformatics

    浜田道昭  [Invited]

    公益財団法人ときわ会先端医学研究所セミナー 

    Presentation date: 2024.09

    Event date: 2024.09

  • De novo interspersed repeat detection using inexact seed

    Atsushi Takeda, Daisuke Nonaka, Yuta Imazu, Tsukasa Fukunaga, Michiaki Hamada

    ISMB2024 

    Event date: 2024.07

  • Identify the effect of R-loop on transcriptional regulatory mechanisms

    Ryotaro Yanoshita, Eito Ichihash, Mai Kubora, Chao Zeng, Michiaki Hamada, Masayuki Sakurai

    第25回日本RNA学会 

    Event date: 2024.06

  • Exploring architectural RNAs associated to cellular senescence

    Saki Fujiwara, Naoko Fujiwara, Takeshi Chujo, Chao Zeng, Michiaki Hamada, Tetsuro Hirose

    第25回日本RNA学会 

    Event date: 2024.06

  • Comprehensive Database for RNA-Targeting Drug Discovery

    Chao Zeng, Michiaki Hamada

    第25回日本RNA学会 

    Event date: 2024.06

  • The MTR4/hnRNPK complex-mediated degradation of aberrant polyadenylated RNAs with multiple exons

    Xinyue Gao, Kenzui Taniue, Anzu Sugawara, Chao Zeng, Han Han, Masahide Seki, Yutaka Suzuki, Michiaki Hamada, Nobuyoshi Akimitsu

    第25回日本RNA学会 

    Event date: 2024.06

  • A universal tool for characterization of RNA discovered by SELEX

    Shunsuke Sumi, Tatsuyuki Yoshii, Tatsuo Adachi, Hirohide Saito, Michiaki Hamada

    第25回日本RNA学会 

    Event date: 2024.06

  • Sequence characterization and prediction of semi-extractable RNAs

    Ryoma Yamawaki, Chao Zeng, Michiaki Hamada

    第25回日本RNA学会 

    Event date: 2024.06

  • Deciphering the relationship between 5'UTR and 3'UTR sequence of mRNA

    Kanta Suga, Michiaki Hamada

    第25回日本RNA学会 

    Event date: 2024.06

  • Systematic discovery of regulatory motifs associated with human insulator sites

    Naoki Osato, Michiaki Hamada

    Human Genome Meeting 2024 

    Event date: 2024.04

  • Comprehensive search for translation-enhancing nascent peptides in Escherichia coli

    加藤晃代, 西河佑馬, 中野秀雄, 横山源太朗, 浜田道昭, 本野千恵

    日本農芸化学会2024年度大会 

    Presentation date: 2024.03

  • Deep generative design of RNA family sequences

    Shunsuke Sumi, Michiaki Hamada, Hirohide Saito

    Winter Q-bio 2025 

    Event date: 2024.02

  • Bioinformatics: bringing breakthroughs to life science, medical, and pharmaceutical research through information science

    浜田道昭  [Invited]

    早稲田大学校友会 稲門医師会 第8回総会 

    Presentation date: 2024.02

    Event date: 2024.02

  • Bioinformatics: bringing breakthroughs to life science, medical, and pharmaceutical research through information science

    浜田道昭  [Invited]

    先進技術研究会 

    Presentation date: 2023.11

    Event date: 2023.11

  • mRNA and nucleic acid therapeutics research using information science

    浜田道昭  [Invited]

    第8回 mRNA薬検討会 

    Presentation date: 2023.09

    Event date: 2023.09

  • Introduction to and comparison of RNA structure prediction software

    浜田道昭, 栗崎以久男  [Invited]

    NPO法人mRNAターゲット創薬研究機構 2023年度 第1回講演会 

    Presentation date: 2023.06

  • Bioinformatics: bringing breakthroughs to life science, medical, and pharmaceutical research through information science

    浜田道昭  [Invited]

    千代田稲門会2023年度定時総会講演会 

    Presentation date: 2023.06

  • AI aptamer drug discovery, Special session invited talk

    Michiaki Hamada  [Invited]

    GIW / ISCB-Asia 2022 

    Presentation date: 2022.12

  • Nucleic acid and mRNA therapeutics research using information science

    浜田道昭  [Invited]

    第31回WAKO Web受託セミナー RNA合成の進展 

    Presentation date: 2022.11

  • AI aptamer drug discovery project

    浜田道昭  [Invited]

    2022年度CREST「バイオDX」領域キックオフシンポジウム 

    Presentation date: 2022.11

  • Frontiers of RNA bioinformatics research

    浜田道昭  [Invited]

    千葉工業大学大学院 最先端生命科学特論 講演会 

    Presentation date: 2022.09

  • Medical and pharmaceutical research using RNA informatics

    浜田道昭  [Invited]

    特定非営利活動法人 mRNAターゲット創薬研究機構 2022年度第2回講演会 

    Presentation date: 2022.08

  • Foundational drug discovery research centered on RNA informatics

    浜田道昭  [Invited]

    RNA情報学を基軸とした創薬基盤研究 

    Presentation date: 2022.05

  • [Frontiers of RNA research] Life science, medical, and pharmaceutical research centered on RNA informatics

    浜田道昭  [Invited]

    日本医科大学 講演会 

    Presentation date: 2022.02

  • Drug discovery research centered on RNA

    浜田道昭  [Invited]

    EWE講演会 

    Presentation date: 2022.01

  • AI aptamer drug discovery

    浜田道昭  [Invited]

    分子生物学会 

    Presentation date: 2021.12

  • The genomic society and bioinformatics

    浜田道昭  [Invited]

    日本バイオインフォマティクス学会・日本オミックス医学会 合同シンポジウム, IIBMP2021 

    Presentation date: 2021.09

  • AI aptamer drug discovery project

    浜田道昭  [Invited]

    日本医科大学・早稲田大学合同シンポジウム 

    Presentation date: 2021.06

  • Frontiers of RNA informatics

    浜田道昭  [Invited]

    生命情報科学勉強会@宮崎大学 

    Presentation date: 2021.05

  • Frontiers of RNA bioinformatics

    浜田道昭  [Invited]

    名古屋大学 特別講演 

    Presentation date: 2021.01

  • Drug discovery research centered on RNA

    浜田道昭  [Invited]

    EWE講演会 

    Presentation date: 2021.01

  • Bioinformatics technology for nucleic acid drug development

    浜田道昭  [Invited]

    第15回理研「バイオものづくり」シンポジウム 

    Presentation date: 2020.12

  • Information technology toward realizing AI aptamer drug discovery

    浜田道昭  [Invited]

    NVIDIA GPU Technology Conference (GTC) 

    Event date: 2020.10

  • AI aptamer drug discovery project

    浜田道昭  [Invited]

    CREST「人工知能」領域 第3回 成果展開シンポジウム 

    Presentation date: 2020.09

  • Bioinformatics for elucidating the functions of long non-coding RNAs

    浜田道昭  [Invited]

    ゲノム創薬・創発フォーラム 第 3 回シンポジウム (主要テーマ:RNA関連の基礎研究とその創薬応用)  (東京大学医科学研究所附属病院 A棟8階 トミーホール) 

    Presentation date: 2020.02

  • Bioinformatics technology for elucidating the functions of long non-coding RNAs

    浜田道昭  [Invited]

    2019年度 RNAフロンティアミーティング  (IBM 天城ホームステッド) 

    Presentation date: 2019.09

  • RNA bioinformatics: technology development and applications

    浜田道昭  [Invited]

    2019年度 第1回 核酸を標的とした低分子創薬研究会  (大阪大学 産業科学研究所) 

    Presentation date: 2019.08

  • Model Learning meets Biology: using information science to reveal the "structure" hidden behind biological data

    浜田道昭  [Invited]

    第7回生命医薬情報学連合大会(IIBMP2018) 

    Event date: 2018.09

  • Bioinformatics technology for elucidating the functions of long non-coding RNAs

    浜田道昭  [Invited]

    EWE 三月会 11 月例会  (日比谷市政会館) 

    Presentation date: 2017.11

  • Bioinformatics and me

    浜田道昭  [Invited]

    第9回生命情報科学若手の会  (西浦温泉ホテルたつき) 

    Presentation date: 2017.10

Research Projects

  • Platform for Advanced Genome Science

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Transformative Research Areas (platforms for Advanced Technologies and Research Resources)

    Project Year: 2022.04 - 2028.03

  • RNA Reincarnation

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2024.06 - 2027.03

    浜田 道昭, 秋光 信佳, 櫻井 雅之

  • The lncRNA landscape of skeletal muscle cell biotransformation with aging.

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2024.04 - 2026.03

  • Overview and Systematic Understanding of Biological Phase Separation Based on RNA-Centric Molecular Networks

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2023.04 - 2026.03

  • AI aptamer drug discovery project

    Japan Science and Technology Agency (JST)  Strategic Basic Research Programs (CREST)

    Project Year: 2021.04 - 2024.03

    浜田 道昭

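    To dramatically shorten the drug discovery period for RNA aptamers, which are attracting attention as next-generation drugs to replace small-molecule compounds, we will establish "AI aptamer drug discovery", which integrates aptamer drug discovery experiments with RNA informatics and artificial intelligence technology.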

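  • Elucidating the functions of long non-coding RNAs based on de novo discovery of repeat elements

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year: 2020.04 - 2023.03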

    浜田 道昭, 小野口 真広, 福永 津嵩

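  • Elucidating the relationship between developmental dioxin exposure and decline in higher cognitive function in old age

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year: 2019.04 - 2022.03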

    掛山 正心, 浜田 道昭, 久保 健一郎, 皆川 栄子, 前川 文彦

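    Through animal experiments, we have reported that fetal exposure to dioxins and related compounds impairs cognitive function, as shown by both cognitive task performance and fine morphological changes in neurons. The goal of this study is to accumulate scientific evidence that developmental exposure to dioxins and related compounds contributes to the onset and exacerbation of dementia, and to demonstrate the importance of dementia as a toxicity endpoint. The aims are (1) to focus on the decline in cognitive flexibility caused in old age by dioxins and related compounds and to clarify, through human surveys and animal toxicity experiments, the nature and extent of the effects and the underlying toxic mechanisms, and (2) based on those results, to establish phenotypic analysis techniques for higher cognitive function in human surveys and animal toxicity experiments. This fiscal year, to carry out the human cohort survey and the animal toxicity experiments, we created a task application for use in the human survey and completed the cohort survey procedures. We also advanced the infrastructure for remote assessment using tasks presented on tablet devices. For the animal experiments, we created tasks and prepared toxicity tests for quantitative evaluation of cognitive flexibility and brain activity. In addition to tasks using IntelliCage, we also established tasks using a touchscreen operant apparatus. In collaboration with RIKEN, we performed phenotypic analysis of Alzheimer's disease model mice and obtained promising findings on the relationship between dementia and mental schemas (manuscript submitted). We also conducted a meta-analysis of existing data in order to model the data to be acquired in this project.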

  • Development of strategy to design anticancer drug based on ceRNA network

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year: 2018.07 - 2021.03

    Nobuyoshi Akimitsu

    Post-transcriptional gene regulation is achieved by RNA-based gene regulatory networks, which are classified into RNA-RNA and RNA-protein networks. In this study, we investigated these RNA-based networks to reveal gene regulation in cancer, with the aim of developing a new strategy for designing anticancer drugs. Our study revealed that the RNA-protein network is remodeled in response to chronic hypoxia, and we identified the target molecules of the RNA-protein network under hypoxia.

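  • Prediction of RNA-chromatin interactions and its applications

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Research (Exploratory)

    Project Year: 2017.06 - 2021.03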

    浜田 道昭, 岩切 淳一

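    Most of the mammalian genome is transcribed into coding or non-coding RNAs. Some of these non-coding RNAs are suggested to interact with chromatin and exert epigenetic regulation. To elucidate the mechanism of RNA-chromatin interactions, we built a model that predicts interactions between lncRNAs and chromatin and examined which features of the model contribute to the interactions. The features considered were R-loop formation, RNA:DNA triplexes, and scaffolding through RNA binding. R-loop formation was estimated by identifying sequence complementarity through alignment, also taking RNA accessibility into account. For RNA:DNA triplexes, we used an existing triplex prediction tool. As the machine learning model, we mainly used random forests, because they make it easy to derive the features that contributed to classification. For the data, we created positive and negative examples from large-scale experimental data on RNA-chromatin interactions and trained the model. Prediction accuracy was evaluated by cross-validation, but sufficient accuracy has not yet been achieved. We are currently examining both the features and the training data in detail, and we also plan to consider other machine learning models, including deep learning.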

  • Development of an innovative aptamer drug discovery system using artificial intelligence technology

    JST  Strategic Basic Research Programs (CREST)

    Project Year: 2018.10 - 2021.03

    浜田道昭

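    This research proposal aims to dramatically shorten the drug discovery process for RNA aptamers, a key class of next-generation drugs, and to improve its success rate, thereby bringing about a breakthrough in pharmaceutical development. To this end, we will research and develop an "AI aptamer drug discovery system" that uses artificial intelligence technology and nucleic acid informatics to automate the steps of the aptamer drug discovery process up to truncation, introduce it at the pharmaceutical company RIBOMIC to verify its generality and effectiveness, and then release it publicly.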

  • Platform for Advanced Genome Science

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2016 - 2021

    KOHARA Yuji, Kato Kazuto, Kawashima Minae, TOYODA Atsushi, Suzuki Yutaka, MITSUI Jun, Hayashi Tetsuya, TOKINO Takashi, Kurokawa Ken, Nakamura Yasukazu, Noguchi Hideki, IWASAKI WATARU, Morishita Shinichi, Asai Kiyoshi, Kasahara Masahiro, Ito Takehiko, Yamada Takuji, KUHARA Satoshi, Takahashi Hiroki, Sakakibara Yasubumi, HAMADA MICHIAKI, Takagi Toshihisa, SESE JUN, Ogura Yoshitoshi, Ida Ryuichi, YAMAGATA Zentaro, Masui Toru, Muto Kaori, Kodama Satoshi, Setoyama Koichi, Kokado Minori, Ohashi Noriko, FUJIYAMA Asao, INOUE Ituro, Nakaoka Hirofumi, Sugano Sumio, Tsuji Shoji, Gotoh Yasuhiro, Nakamura Keiji, Ogura Yoshitoshi, Okuno Miki, Nakase Hiroshi, SASAKI Yasushi, IDOGAWA Masashi, Tange Shoichiro, Mori Hiroshi, OGASAWARA Osamu, Tanizawa Yasuhiro, Kondo Shinji, kiryu hisanori, Kajitani Rei, TASHIRO Kosuke, Frith Martin, HIRAKAWA Hideki, Suzuki Hiromu, NOSHO KATSUHIKO, KAI Masahiro

    Our group has provided state-of-the-art genome technologies, named PAGS Support, including de novo genome sequencing, variation analysis, epigenomics, RNA analysis, metagenome analysis and single cell analysis, to projects selected from proposals based on KAKENHI projects. Thus far, we have provided PAGS Support to a total of 912 proposals selected from 1988 proposals. The proposals cover most fields of the life sciences and extend to the physical sciences, environmental studies and so on. Thus far 556 papers have been published as outcomes, ranging from biology to agriculture, medicine and pharmacy, and from basic to applied sciences. Our group has also developed new technologies and algorithms to overcome problems that emerged during PAGS Support, and these are used in other PAGS Support projects. This is a positive cycle, which makes our system a very effective way to promote the biological sciences.

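  • Prediction of RNA-chromatin interactions and its applications

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Grant-in-Aid for Challenging Research (Exploratory)

    Project Year: 2017.03 - 2020.04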

    浜田道昭

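  • Functional classification of long non-coding RNAs based on functional elements and deep learning

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (A)

    Project Year: 2016.04 - 2020.03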

    浜田 道昭

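    In higher organisms such as humans, many long non-coding RNAs (lncRNAs), which function as RNA without being translated into protein, are suggested to exist, but the functions of most of them remain unknown. To identify functional elements of lncRNAs, we carried out the following studies.
    - Identification and sequence analysis of ribosome-associated lncRNAs: using comprehensive experimental data, we identified lncRNAs bound by ribosomal RNA, extracted their sequence features, and examined their biological significance. Two related papers were published (BMC Genomics. 2018 Dec 31;19(Suppl 10):906, BMC Genomics. 2018 May 29;19(1):414. doi: 10.1186/s12864-018-4765-z.)
    - We released LncRRISearch, a web server that enables comprehensive prediction of lncRNA-RNA interactions in human and mouse (http://rtools.cbrc.jp/LncRRIsearch/)
    - Comprehensive identification of RBPs that bind to repeats: our previous work showed that repeat elements are associated with the tissue-specific expression of lncRNAs, and to advance the functional analysis further, we identified lncRNAs that bind to repeats. We are currently examining the results in detail and plan to publish them as a paper within the year.
    - Enhancement of the RNA-RNA interaction tool RIblast: we implemented a method for computing p-values, which is expected to encourage use by experimental biologists (J Comput Biol. 2018 Sep;25(9):976-986)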

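  • Functional classification of long non-coding RNAs based on functional elements and deep learning

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Grant-in-Aid for Young Scientists (A)

    Project Year: 2016.04 - 2020.03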

    浜田道昭

  • Long non-coding RNA associated with drug resistance in lung cancer with driver mutation

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

    Project Year: 2016.04 - 2019.03

    SEIKE MASAHIRO, HAMADA Michiaki

    We tried to identify a long non-coding RNA (lncRNA) associated with drug resistance to molecular targeted therapy in lung cancer with driver mutations. Using microarray and bioinformatic analysis, we analyzed lncRNA expression profiles of 4 drug-sensitive lung cancer cells and 10 drug-resistant lung cancer cells showing cancer stem cell properties and epithelial-mesenchymal transition. We identified CRNDE and IRX5 as an lncRNA and its target protein associated with drug resistance to molecular targeted therapy in lung cancer with driver mutations. Inhibition of IRX5 using siRNA showed apoptotic activity in drug-resistant lung cancer cells. CRNDE and IRX5 may be promising targets to overcome drug resistance to molecular targeted therapy in lung cancer with driver mutations.

  • RNA informatics for epi-transcriptome analysis

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year: 2016.04 - 2019.03

    Asai Kiyoshi

    The energy parameters of the important modified bases, inosine and N6 methyladenosine were identified by a combination of thermometric experiments and molecular simulations. The effect of estimation error on structure prediction was evaluated and presented by theoretical analysis and computer experiments. A model of the effect of A-to-I editing on translational repression efficiency by miRNA was constructed and presented in a joint study using the identified inosine parameters.
    We have improved RintD, an analysis tool for secondary structure probability distribution, by developing RintW, which calculates the distribution of base pair probability, and RintC, which speeds up the calculation with maximum base pair constraint. At that time, the effect of the Fourier transform on the numerical error was analyzed using the accuracy guarantee calculation, and it was shown that the large probability was reliable.

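  • Inference of chromatin function based on histone variants

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

    Project Year: 2016.04 - 2018.03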

    浜田 道昭

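    (1) Inference of chromatin states for chromatin marks including histone variants.
    As histone variant data, we used Kujirai+, NAR (2016) 44, 6127-41 for human and Maehara+, Epigenetics Chromatin (2015) 17;8:35 for mouse. Using these data, we inferred chromatin states with a method developed by the principal investigator. We then investigated the correlation between the inferred chromatin states and various genome annotations.
    (2) The lncRRIdb database: an lncRNA-RNA interaction database integrating expression and localization information.
    To characterize chromatin function from the viewpoint of long non-coding RNAs (lncRNAs), we built a comprehensive database of RNAs that interact with lncRNAs. The database stores the results of comprehensive computational interaction prediction using RIblast, developed by the principal investigator and colleagues, together with experimental information on expression and localization.
    (3) Development of information technology for inferring hierarchical chromatin states.
    We considered that promoters and enhancers also have a hierarchical structure, for example promoter ⇒ strong promoter, weak promoter, bivalent promoter. Because conventional chromatin state inference methods could not take such hierarchy into account, we developed our own method. We developed a prototype system for this purpose and verified its effectiveness on small data.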

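  • Inference of chromatin function based on histone variants

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

    Project Year: 2016.04 - 2018.03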

    浜田道昭

  • A population genetics analysis of RNA secondary structures

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

    Project Year: 2013.04 - 2017.03

    Kiryu Hisanori, ASAI Kiyoshi, HAMADA Michiaki, SATO Kengo, KATO Yuki, IWASAKI Wataru, ONO Yukiteru, TERAI Goro, OZAKI Haruka, MATSUMOTO Hirotaka, FUKUNAGA Tsukasa, MORI Ryota, KASHIHARA Yuki, KAWAGUCHI Risa

    RNA molecules in a cell play very important roles so that genetic information encoded in the genome is instantiated as proteins and exerts actual functions. Three dimensional structure of an RNA is understood by its arrangement of stem structures, and it is important to investigate the properties of this secondary structure to understand the functions of RNAs. In this study, we have succeeded in developing an algorithm (ParasoR) for computing several properties of RNA secondary structures of very long RNAs such as messenger RNAs and long non-coding RNAs for the first time. We have also succeeded in developing an algorithm (CapR) for computing secondary structural context around the binding regions of RNA binding proteins.

  • Comprehensive prediction of RNA-protein interactions

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year: 2013.04 - 2016.03

    Asai Kiyoshi, YURA Kei

    The aim of the research was to predict RNA-protein interactions for non-coding RNAs and proteins of known function. Our analysis of RNA-protein complexes in PDB showed that nucleotides that do not form base pairs in RNA 2D structures but form hydrogen bonds with amino acids have lower base-pairing probabilities than nucleotides that form neither base pairs nor hydrogen bonds with amino acids. We developed a new method to understand the landscape of the distribution of RNA 2D structures, by efficiently calculating the probabilities of all the structures with specific Hamming distances from the canonical structures. In order to predict the joint structure of RNA-protein complexes, we performed rigid body docking simulations. After revising the force field for RNAs, our docking simulations showed better accuracy than previous methods, and we reported that in a peer-reviewed journal.

  • Development of basic technology for privacy-preserving bioinformatics and its application

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Project Year: 2013.04 - 2016.03

    Hamada Michiaki, Shimizu Kana, Hanaoka Goichiro, Tsuda Koji, Frith Martin, Asai Kiyoshi

    There is strong demand for handling personal genome and chemical compound information securely, because it is sensitive information that should not be leaked. On the other hand, from the viewpoint of "open" science, it is important to perform data mining by combining such sensitive information with other data. In this study, we have developed several methods for performing data mining while keeping that information secret. Specifically, we developed (i) privacy-preserving search for chemical databases, (ii) privacy-preserving genome sequence search with hidden Markov models (HMMs) and (iii) privacy-preserving sequence alignment, all of which will be useful toward open science in biology.

  • Research on structure predictions of RNA with modified nucleotides

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (A)

    Project Year: 2012.04 - 2016.03

    Hamada Michiaki

    We have developed bioinformatic methods for predicting secondary structures including modified bases. Due to the limitation of the known structures with modified bases, we employed a semi-supervised learning approach for predicting RNA secondary structures using RNA sequences with and without secondary structures. Moreover, we have developed an integrated web server, Rtools, for performing various analyses based on RNA secondary structures.

  • Platform of large scale and high quality genomics and bioinformatics: Towards the advancement of genome sciences in academia

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

    Project Year: 2010.04 - 2016.03

    KOHARA Yuji, KATO Kazuto, TOYODA Atsushi, KUROKI Yoko, SUGANO Sumio, SUZUKI Yutaka, HAYASHI Tetsuya, YAMAMOTO Ken, TSUJI Shoji, INOUE Ituro, KUROKAWA Ken, MORISHITA Shinichi, NAKAMURA Yasukazu, TABATA Satoshi, KUHARA Satoshi, IWASAKI Wataru, SESE Jun, TAKAHASHI Hiroki, ASAI Kiyoshi, KASAHARA Masahiro, SAKAKIBARA Yasubumi, YADA Tetsushi, YAMAGATA Zentaro, MUTO Kaori, IDA Ryuichi, MASUI Tohru, KURIYAMA Mariko, TAKAGI Toshihisa, FUJIYAMA Asao, HATTORI Masahira, OGURA Yoshitoshi, TOKUNAGA Katsushi, KUWANO Ryozo, OHASHI Jun, ITOH Takehiko, HIRAKAWA Hideki, NOGUCHI Hideki, MATSUOKA Satoshi, OGASAWARA Naotake, NAKAMURA Kensuke, HAMADA Michiaki, KANAYA Shigehiko, ANZAI Yuichiro, OKADA Kiyotaka, SAKAKI Yoshiyuki, TAKAKU Fumimaro, TOYOSHIMA Kumao, NAKAMURA Keiko, HOTTA Yoshiki, YONEZAWA Akinori, YOSHIKAWA Hiroshi, YOSHIDA Mitsuaki, INOKO Hidetoshi, TODA Tatsushi, INAZAWA Johji, GOJOBORI Takashi, URUSHIHARA Hideko, TAKEDA Hiroyuki, SHIROISHI Toshihiko, ITOH Takashi, SATOH Noriyuki, MATSUDA Hideo, GOTO Susumu, TSUDA Masataka

    We have provided technologies for large scale and high quality genomics and bioinformatics to many KAKENHI projects, 60 to 90 subjects every year and 464 subjects in total, based on application and selection. This kind of support became possible by concentrating on a limited number of DNA sequencing centers at a time when these technologies were advancing unexpectedly fast worldwide. Our activity has led to 363 papers including the Coelacanth genome paper. The KAKENHI subjects that we supported cover all the KAKENHI items and almost all divisions of the life science domain. Furthermore, we have developed new methodologies to solve the problems that emerged from the support activity: one of them is the genome assembly software PLATANUS, which has become a key method for deciphering difficult genomes. Such a virtuous circle and its outcomes show that the platform is essential and effective in the life sciences.

Misc

  • Fast RNA-RNA Interaction Prediction Methods for Interaction Analysis of Transcriptome-Scale Large Datasets

    Tsukasa Fukunaga, Michiaki Hamada

    Methods in molecular biology (Clifton, N.J.)   2586   163 - 173  2023  [International journal]

    The computational prediction of RNA-RNA interactions has long been studied in RNA informatics. Most of the existing approaches focused on the interaction prediction of short RNAs in small datasets. However, in recent years, two fast prediction methods, RIsearch2 and RIblast, have been developed to predict transcriptome-scale interactions or long RNA interactions. The key idea of the software acceleration of these tools was the integration of a seed-and-extend method, which is used in fast sequence alignment tools, into RNA-RNA interaction prediction. As a result, the two software programs were ten to a thousand times faster than the existing tools; because of this acceleration, detection of genome-wide microRNA target sites or interaction partners of function-unknown long noncoding RNAs has become possible. In this review, we describe the basic concept of the algorithm, its applications, and the future perspectives of the fast RNA-RNA interaction prediction tools.

  • Significance of long non-coding RNAs in drug resistance mechanisms of lung cancer with driver gene alterations

    高橋 聡, 野呂 林太郎, 吉川 明子, 中道 真仁, 菅野 哲平, 松本 優, 武内 進, 平尾 真季子, 松田 久仁子, Zeng Chao, 浜田 道昭, 久保田 馨, 清家 正博, 弦間 昭彦

    日本呼吸器学会誌   9 ( Suppl. ) 177 - 177  2020.08

  • CAFs induce formation of metastatic human breast tumor cell clusters with partial epithelial-mesenchymal transition

    Akira Orimo, Yasuhiko Ito, Yoshihiro Mezawa, Kaidiliavi Sulidan, Yataro Daigo, Nadila Wali, Okio Hino, Kazuyoshi Takeda, Michiaki Hamada, Yuko Matsumura

    CANCER SCIENCE   109   797 - 797  2018.12  [Refereed]

    Research paper, summary (international conference)  

  • The non-coding RNA Eleanor promotes histone exchange in nucleosomes

    藤田 理紗, 有村 泰宏, 山本 達郎, 浜田 道昭, 斉藤 典子, 胡桃坂 仁志

    生命科学系学会合同年次大会   FY2017   [3PT18 - 0555)]  2017.12

  • Mutation signature analysis of cancer genomes using topic models (Neurocomputing)

    松谷 太郎, 宇恵野 雄貴, 福永 津嵩, 浜田 道昭

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   117 ( 109 ) 159 - 164  2017.06

  • Mutation signature analysis of cancer genomes using topic models (Information-Based Induction Sciences and Machine Learning)

    松谷 太郎, 宇恵野 雄貴, 福永 津嵩, 浜田 道昭

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   117 ( 110 ) 105 - 110  2017.06

  • Privacy-preserving search for chemical compound databases

    Shimizu K, Nuida K, Arai H, Mitsunari S, Attrapadung N, Hamada M, Tsuda K, Hirokawa T, Sakuma J, Hanaoka G, Asai K

    bioRxiv   ( 013995 )  2015.01

    Internal/External technical report, pre-print, etc.  

  • RNA secondary structure prediction from multi-aligned sequences

    Michiaki Hamada

       2013.07

    Internal/External technical report, pre-print, etc.  

    It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA s...

  • Generalized Centroid Estimators in Bioinformatics

    Michiaki Hamada, Hisanori Kiryu, Wataru Iwasaki, Kiyoshi Asai

    PLoS ONE 6(2):e16450, 2011    2013.05

    Internal/External technical report, pre-print, etc.  

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accura...

  • Fighting against uncertainty: An essential issue in bioinformatics

    Michiaki Hamada

       2013.05

    Internal/External technical report, pre-print, etc.  

     View Summary

    Many bioinformatics problems, such as sequence alignment, gene prediction,
    phylogenetic tree estimation and RNA secondary structure prediction, are often
    affected by the "uncertainty" of a solution; that is, the probability of the
    solution is extremely small. This situation arises for estimation problems on
    high-dimensional discrete spaces in which the number of possible discrete
    solutions is immense. In the analysis of biological data or the development of
    prediction algorithms, this uncertainty should be handled carefully and
    appropriately. In this review, I will explain several methods t...

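  • A privacy-preserving search protocol for chemical compound databases using additively homomorphic encryption

    Koji Nuida, Kana Shimizu, Hiromi Arai, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Goichiro Hanaoka, Jun Sakuma, Kiyoshi Asai

    IPSJ Symposium Series (CD-ROM)   2012 ( 3 ) Paper No. 2C2-1 - 389  2012.10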

    CiNii J-GLOBAL

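  • A proposed RNA secondary structure prediction algorithm using semi-supervised learning

    Haruka Yonemoto, Michiaki Hamada, Kiyoshi Asai

    Abstracts of the Annual Meeting of the RNA Society of Japan   14th   160  2012.07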

    J-GLOBAL

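  • Development of a method for analyzing RNA secondary structure stability based on the canonical distribution

    Ryota Mori, Michiaki Hamada, Kiyoshi Asai

    Abstracts of the Annual Meeting of the RNA Society of Japan   14th   154  2012.07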

    J-GLOBAL

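  • Privacy protection in search behavior

    Hiromi Arai, Kana Shimizu, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Jun Sakuma, Kiyoshi Asai

    Proceedings of the Annual Conference of JSAI (CD-ROM)   26th   Paper No. 3I2-OS-20-1  2012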

    J-GLOBAL

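  • Development of a method for describing the probability distribution of RNA secondary structures based on the canonical distribution

    Ryota Mori, Michiaki Hamada, Kiyoshi Asai

    Program and Abstracts of the Annual Meeting of the Molecular Biology Society of Japan (Web)   35th   WEB ONLY 1P-0244  2012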

    J-GLOBAL

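  • A proposed RNA secondary structure prediction algorithm using semi-supervised learning

    Haruka Yonemoto, Michiaki Hamada, Kiyoshi Asai

    Program and Abstracts of the Annual Meeting of the Molecular Biology Society of Japan (Web)   35th   WEB ONLY 3P-0071  2012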

    J-GLOBAL

  • Maximizing Expected Accuracy in Bioinformatics (Industrial Materials)

    Hamada Michiaki, Asai Kiyoshi

    Bulletin of the Japan Society for Industrial and Applied Mathematics   21 ( 1 ) 34 - 39  2011.03

    DOI CiNii

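  • Expected accuracy maximization and bioinformatics

    Michiaki Hamada, Kiyoshi Asai

    Bulletin of the Japan Society for Industrial and Applied Mathematics   21 ( 1 ) 34 - 39  2011.03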

    DOI CiNii J-GLOBAL

  • RNA-RNA interaction prediction using integer programming with threshold cut (Neurocomputing)

    Kato Yuki, Sato Kengo, Hamada Michiaki

    IEICE Technical Report   110 ( 83 ) 183 - 190  2010.06

    CiNii

  • RNA-RNA Interaction Prediction Using Integer Programming with Threshold Cut

    KATO YUKI, SATO KENGO, HAMADA MICHIAKI, WATANABE YOSHIHIDE, ASAI KIYOSHI, AKUTSU TATSUYA

      2010 ( 32 ) 1 - 8  2010.06

    CiNii

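  • CentroidFold: a web server for RNA secondary structure prediction

    Kengo Sato, Michiaki Hamada, Kiyoshi Asai, Toutai Mituyama

    Abstracts of the Annual Meeting of the RNA Society of Japan   11th   96  2009.07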

    J-GLOBAL

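  • CentroidHomfold: RNA secondary structure prediction using information from homologous sequences

    Michiaki Hamada, Kengo Sato, Hisanori Kiryu, Toutai Mituyama, Kiyoshi Asai

    Abstracts of the Annual Meeting of the Molecular Biology Society of Japan   32nd ( Vol.1 ) 48  2009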

    J-GLOBAL

  • Development of RNA informatics methods that maximize expected accuracy

    Michiaki Hamada, Hisanori Kiryu, Kengo Sato, Toutai Mituyama, Kiyoshi Asai

    Seikagaku     2P-0776  2008

    J-GLOBAL

  • Classification of functional RNA families using Support Vector Machines

    Michiaki Hamada, Tsuyoshi Kato, Taishin Kin, Koji Tsuda, Kiyoshi Asai

    RNA Meeting   7th   69  2005

    J-GLOBAL

  • A High Performance Computing Environments for Prediction of Activity and function of Biomolecules : An Application to Analysis of HIV Protease Inhibitors

    Hamada Michiaki, Feng Cheng, Inagaki Yuichiro, Nagashima Umpei, Murakami Kazuaki, Chuman Hiroshi

    Transactions of the Japan Society for Industrial and Applied Mathematics   14 ( 4 ) 267 - 288  2004.12

     View Summary

    We have developed an object oriented large-scale scientific simulations system that contains algorithms of molecular scientific computing programs, called Embedded High-Performance Computing (EHPC). As an application of the system, "EHPC-Drug platform" has been constructed for rational drug design. It can provide a high-performance computing ability for exhaustive conformational analyses of biomolecules, generating computation of their three-dimensional topological descriptors, and docking calculations with their target receptors. To enhance its computing abilities, we are also planning to ...

    DOI CiNii

  • A High Performance Computing Environments for Prediction of Activity and Function of Biomolecules:-An Application to Analysis of HIV Protease Inhibitors

    HAMADA MICHIAKI, FENG C, INAGAKI YUICHIRO, NAGASHIMA UMPEI, MURAKAMI KAZUAKI, CHUMAN HIROSHI

    Transactions of the Japan Society for Industrial and Applied Mathematics   14 ( 4 ) 267 - 288  2004.12

     View Summary

    We have developed an object oriented large-scale scientific simulations system that contains algorithms of molecular scientific computing programs, called Embedded High-Performance Computing (EHPC). As an application of the system, "EHPC-Drug platform" has been constructed for rational drug design. It can provide a high-performance computing ability for exhaustive conformational analyses of biomolecules, generating computation of their three-dimensional topological descriptors, and docking calculations with their target receptors. To enhance its computing abilities, we are also planning to apply Grid computing technology to this system for parallel and distributed computing and Grid Data processing. As a critical test of our approach, we applied it to a prediction of bound conformation of several HIV protease inhibitors with the protease.

    DOI CiNii J-GLOBAL

  • Development and application of a platform for drug discovery using grid technology and XML database

    HAMADA MICHIAKI, INAGAKI YUICHIRO, CHUMAN HIROSHI

    Abstracts of the Symposium on Structure-Activity Relationships   32nd   141 - 144  2004.11

    J-GLOBAL

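  • Yakushi (Xsi): development of an integrated virtual screening system for drug discovery

    Yuichiro Inagaki, Michiaki Hamada, Kazuto Yamazaki, Masaharu Kanaoka, Hiroshi Chuman

    Proceedings of the Chem-Bio Informatics (CBI) Society Annual Meeting   2004   205 - 206  2004.07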

    J-GLOBAL

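  • DrugML and Grid-based drug discovery

    Michiaki Hamada, Yuichiro Inagaki, Hiroshi Chuman

    Proceedings of the Annual Meeting of the Society of Computer Chemistry, Japan   2004   51  2004.05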

    J-GLOBAL

  • Drug Discovery Using Grid Technologies and DrugML.

    HAMADA MICHIAKI, INAGAKI YUICHIRO, CHUMAN HIROSHI

    Abstracts of the Symposium on Structure-Activity Relationships   31st   101 - 102  2003.11

    J-GLOBAL


Industrial Property Rights

 

Syllabus


 

Sub-affiliation

  • Faculty of Science and Engineering   Graduate School of Advanced Science and Engineering

  • Affiliated organization   Global Education Center

Research Institute

  • 2023
    -
    2024

    Center for Data Science   Concurrent Researcher

  • 2022
    -
    2024

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

  • 2022
    -
    2024

    Waseda Center for a Carbon Neutral Society   Concurrent Researcher

Internal Special Research Projects

  • Development of RNA structure analysis techniques using simulation technology

    2023  

     View Summary

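    We carried out the following research and development on RNA structure analysis techniques using simulation technology. 1. We developed a deep learning technique for predicting the three-dimensional structures of RNA-protein complexes; a paper is currently being written. 2. We developed a deep learning technique for projecting multiple three-dimensional structures, obtained from methods such as molecular dynamics, onto a low-dimensional latent space; a paper is currently being written.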

  • RNA Reincarnation

    2022  

     View Summary

    We developed RNA bioinformatics techniques that are expected to contribute to the elucidation of RNA reincarnation. For example, the following tool makes it possible to rapidly discover structures common to RNAs: Tsukasa Fukunaga*, Michiaki Hamada, LinAliFold and CentroidLinAliFold: Fast RNA consensus secondary structure prediction for aligned sequences using beam search methods, Bioinformatics Advances, vbac078, https://doi.org/10.1093/bioadv/vbac078 Published: 22 October 2022. Preliminary experiments are also continuing.

  • Development of an informatics platform for long non-coding RNA analysis

    2021  

     View Summary

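    Long non-coding RNAs (lncRNAs) do not function alone in vivo; they realize a variety of functions by interacting with other functional molecules. This fiscal year we conducted several studies that use informatics to analyze RNA-binding proteins (RBPs) that interact with lncRNAs. First, we developed RBP-BERT, which predicts RNA sequences that bind to RBPs using a pre-trained BERT model. By analyzing what the model learned, we also extracted biological features of RBP binding. Second, we performed a comprehensive analysis of RBPs that bind to repeat elements such as transposons. This revealed that repeat elements serve as functional sequences for RBP binding.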

  • Research on fundamental information technologies for non-coding RNA analysis

    2020  

     View Summary

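    To elucidate the functions of long non-coding RNAs, which have been discovered in large numbers in higher eukaryotes such as humans, we built fundamental information technologies and performed various bioinformatics analyses. Specifically, we carried out the following: (1) a comprehensive analysis of the relationship between localization and alternative splicing; (2) development of MoAIMS, a tool for identifying m6A modification sites with high accuracy from transcriptome-wide m6A measurement data; (3) genome-wide identification of R-loop structures and extraction of their features.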

  • Research on privacy-preserving analysis methods for biological information using secret sharing

    2019  

     View Summary

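    Using secret sharing, we devised and implemented a method for securely performing sequence comparison with affine gaps. A comparison with existing methods confirmed that computation speed is greatly improved. [1] Takumi Fukami (深見匠), Michiaki Hamada, Secure personal genome comparison with affine gaps, 2019/12/3, The 42nd Annual Meeting of the Molecular Biology Society of Japan, Fukuoka International Congress Center and Marine Messe Fukuoka. [2] Takumi Fukami (深見匠), Michiaki Hamada, Secure personal genome similarity computation, 2019 Symposium on Cryptography and Information Security, January 22-25, 2019, Lake Biwa Otsu Prince Hotel.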

  • Mathematical and informatics foundations, and practice, of integrated omics data-driven biology

    2018  

     View Summary

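    As bioinformatics techniques toward elucidating the functions of long non-coding RNAs, we developed a deep learning-based algorithm and tool for predicting m6A modifications. We also developed an algorithm that predicts RNA-RNA interactions quickly and accurately from sequence information alone. In addition, we developed fundamental information technology that uses model selection techniques to predict mutational signatures in cancer genome data.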

  • Mathematical and informatics foundations of integrated omics data-driven biology

    2016  

     View Summary

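    We obtained the following results on methods for the informatics analysis of various omics data. (1) We developed a probabilistic model for metagenomic data. The model applies LDA, which is used in natural language processing, to metagenomic data, making it possible to estimate bacterial groups. By closely examining the relationship between the estimated bacterial groups and the well-known enterotypes, we gave the groups a biological interpretation. (2) We built a pipeline for identifying mutations in plant genomes from sequencing data, and used it to analyze plant mutants in detail. This work is a collaboration with RIKEN. (3) We developed a method for searching profile HMMs, which are probabilistic models of protein and DNA sequence motifs, while using cryptographic techniques to keep both the model and the query secret. The method relies essentially on the fact that additively homomorphic encryption allows addition to be carried out on encrypted values.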

  • Prediction of RNA-chromatin interactions and its applications

    2016  

     View Summary

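    We obtained the following results toward developing a method for inferring RNA-chromatin interactions from sequence information alone. 1. We developed a new method for predicting (docking) the structures of RNA-protein complexes. By incorporating the results of molecular dynamics simulations into the scoring function for complex structures, the method achieves substantially higher accuracy than existing methods. 2. We built and released Rtools, an integrated web server for RNA structure prediction. With this web server, various structural predictions (secondary structure, base-pairing probability matrix, and the formation probabilities of stems, bulges, loops, and so on) can be obtained from the RNA sequence alone. Such information is also useful for predicting RNA-chromatin interactions.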

  • Construction of a database integrating comprehensive lncRNA-RNA interaction predictions and experimental information

    2015  

     View Summary

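    In this study, we first built a pipeline system for fast prediction of RNA-RNA interactions and implemented it on the K computer. Second, we used this pipeline to comprehensively predict interaction partners for human lncRNAs and released the results as a database. Hamada gave an oral presentation at APBC2016, and the paper was published in a journal (BMC Genomics).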

  • Development of information technologies toward an integrated understanding of the epigenome, and practice of data-driven biology

    2015  

     View Summary

    This fiscal year, we organized and improved the program from the paper published last fiscal year [1] in preparation for the public release of its source code. Specifically, we modified it so that the posterior probability of the chromatin state can be output at each position. [1] Michiaki Hamada*, Yukiteru Ono, Ryohei Fujimaki, Kiyoshi Asai, Learning chromatin states with factorized information criteria, Bioinformatics (2015) doi: 10.1093/bioinformatics/btv163 First published online: March 24, 2015

  • Research on and application of a methodology for estimating chromatin states from epigenetic data

    2014  

     View Summary

    Motivation: Recent studies have suggested that both the genome and the genome with epigenetic modifications, the so-called epigenome, play important roles in various biological functions, such as transcription and DNA replication, repair, and recombination. It is well known that specific combinations of histone modifications (e.g. methylations and acetylations) of nucleosomes induce chromatin states that correspond to specific functions of chromatin. Although the advent of next-generation sequencing (NGS) technologies enables measurement of epigenetic information for entire genomes at high-resolution, the variety of chromatin states has not been completely characterized. Results: In this study, we propose a method to estimate the chromatin states indicated by genome-wide chromatin marks identified by NGS technologies. The proposed method automatically estimates the number of chromatin states and characterizes each state on the basis of a hidden Markov model (HMM) in combination with a recently proposed model selection technique, factorized information criteria. The method is expected to provide an unbiased model because it relies on only two adjustable parameters and avoids heuristic procedures as much as possible. Computational experiments with simulated datasets show that our method automatically learns an appropriate model, even in cases where methods that rely on Bayesian information criteria fail to learn the model structures. In addition, we comprehensively compare our method to ChromHMM on three real datasets and show that our method estimates more chromatin states than ChromHMM for those datasets.
