Researcher Details - Kana Shimizu

Kana Shimizu (清水 佳奈)

Scopus Publication Information

Papers: 34 Citations: 812 h-index: 13

The data was downloaded from the Scopus API on July 15, 2026, via http://api.elsevier.com and http://www.scopus.com.

Google Scholar Information (Citations per year)

Citations: 1311 h-index: 18 i10-index: 22


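Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job Title

Professor

Degree

Doctor of Engineering (Waseda University)

Homepage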

https://iskana.github.io/works.html

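Career

April 2018 - present: Professor, Faculty of Science and Engineering, Waseda University
April 2016 - March 2018: Associate Professor, Faculty of Science and Engineering, Waseda University
March 2013 - March 2016: Senior Researcher, Biotechnology Research Institute for Drug Discovery / Genome Informatics Research Center / Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)
December 2013 - April 2015: Visiting Investigator, The Sloan-Kettering Institute at Memorial Sloan-Kettering Cancer Center
April 2009 - February 2013: Researcher, Computational Biology Research Center, AIST
November 2006 - March 2009: AIST Postdoctoral Researcher, Computational Biology Research Center, AIST
January 2006 - October 2006: Technical Staff, Computational Biology Research Center, AIST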

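Education

April 2003 - March 2006: Major in Information Networks, Graduate School of Science and Engineering, Waseda University
April 2001 - March 2003: Major in Information Science, Graduate School of Science and Engineering, Waseda University
April 1997 - March 2001: Department of Informatics, School of Science and Engineering, Waseda University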

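Committee Memberships

April 2026 - present: Director, Japanese Society for Bioinformatics
April 2025 - present: Secretary, Japanese Society for Bioinformatics
October 2023 - present: Advisor, Fusion Oriented Research for Disruptive Science and Technology program (創発的研究支援事業), Japan Science and Technology Agency (JST)
January 2023 - present: Project Evaluation Committee Member, Japan Agency for Medical Research and Development (AMED)
2021 - present: Research Area Advisor, JST PRESTO "Strengthening ICT Foundations for Social Transformation" (社会変革に向けたICT基盤強化)
2021 - present: Committee Member, EDI Committee, International Society for Computational Biology (ISCB)
July 2020 - present: Research Advisor, JST NBDC "Database Integration Coordination Program" (統合化推進プログラム)
2020 - present: Expert Commissioner, Tokyo High Court
April 2023 - 2026: R053 Committee on a Collaborative Platform for Design, Measurement, and Analysis, University-Industry Cooperative Research Committees, Japan Society for the Promotion of Science
April 2021 - March 2025: Director, Japanese Society for Bioinformatics
2025: Programme Committee, 34th International Conference on Genome Informatics (GIW) and the 8th ISCB-Asia Conference
2024 - 2025: Program Committee Area Chair, Intelligent Systems for Molecular Biology (ISMB) 2024, 2025
2024: Program Committee, 35th Annual Symposium on Combinatorial Pattern Matching
2024: Technical Program Committee, Asia&Pacific Bioinformatics Joint Conference 2024
2022: Program Committee, GIW/ISCB Asia 2022
June 2019 - June 2021: Director, Information Processing Society of Japan
2018 - 2019: Program Committee, The International Conference on Genome Informatics (GIW) 2018, 2019
2018: Program Committee, Asia Pacific Bioinformatics Conference (APBC) 2018
2017 - 2018: Auditor, Japanese Society for Bioinformatics
2016 - 2018: Program Committee, The Workshop on Algorithms in Bioinformatics (WABI) 2016, 2017, 2018
2011 - 2018: The affiliates committee, International Society for Computational Biology (ISCB)
2010 - 2011: Secretary, Japanese Society for Bioinformatics
2010 - 2011: Director, Japanese Society for Bioinformatics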

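Research Fields

Life, health, and medical informatics

Research Keywords

Algorithms
Privacy-preserving data mining
Data mining
Genome information analysis
Biological information science
Bioinformatics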

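Awards

September 2022: KDDI Foundation Award 2022, Achievement Award (業績賞), KDDI Foundation, for "Research on privacy-preserving technologies supporting life science in the Society 5.0 era"
2021: Excellent Poster Award, 2021 Annual Meeting of the Japanese Society for Bioinformatics and the 10th Joint Conference on Informatics in Biology, Medicine and Pharmacology (IIBMP2021)
2019: Encouragement Award, Computer Security Symposium 2019 (CSS2019)
April 2018: FY2018 Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Prize for Science and Technology (Research Category)
October 2016: Research Encouragement Award, Joint Conference on Informatics in Biology, Medicine and Pharmacology 2016
April 2016: FY2015 AIST President's Award (Research)
October 2015: Research Encouragement Award, Joint Conference on Informatics in Biology, Medicine and Pharmacology 2015
October 2015: Best Oral Presentation Award, Joint Conference on Informatics in Biology, Medicine and Pharmacology 2015
October 2014: Excellent Demonstration Award, Computer Security Symposium 2014 (CSS2014)
October 2013: Excellent Demonstration Award, Computer Security Symposium 2013 (CSS2013)
October 2012: Best Poster Award, Joint Conference on Informatics in Biology, Medicine and Pharmacology 2012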

Papers

Accurate SPARQL Generation via In-Context Learning and Schema-based Query Construction

Hikaru Nagazumi, Yuki Moriya, Shuichi Kawashima, Toshiaki Katayama, Kana Shimizu

Bioinformatics, April 2026 [Refereed]

Secure Full-Text Search Using Function Secret Sharing.

Tomoki Uchiyama, Kana Shimizu

Proceedings of the 23rd Workshop on Privacy in the Electronic Society (WPES@CCS) 59-72, 2024 [Refereed]

Efficient Privacy Preserving Range Query Using Segment Tree.

Shusuke Shirotake, Kana Shimizu

58th Annual Conference on Information Sciences and Systems (CISS) 1-6, 2024 [Refereed]

Citations (Scopus): 2

Enhancing Privacy Protection for Human Genome Synthesis Using Gradient Clipping.

Kohei Hashimoto, Kana Shimizu

BCB 37 - 6, 2024 [Refereed]

Efficient Colored de Bruijn Graph for Indexing Reads.

Nozomi Hasegawa, Kana Shimizu

J. Comput. Biol. 30 (6) 648-662, May 2023 [Refereed] [International journal]

The colored de Bruijn graph is a variation of the de Bruijn graph that has recently been utilized for indexing sequencing reads. Although state-of-the-art methods have achieved small index sizes, they produce many read-incoherent paths that tend to cover the same regions in the source genome sequence. To solve this problem, we propose an accurate coloring method that can reduce the generation of read-incoherent paths by utilizing different colors for a single read depending on the position in the read, which reduces ambiguous coloring in cases where a node has two successors, and both of the successors have the same color. To avoid having to memorize the order of the colors, we utilize a hash function to generate and reproduce the series of colors from the initial color and then apply a Bloom filter for storing the colors to reduce the index size. Experimental results using simulated data and real data demonstrate that our method reduces the occurrence of read-incoherent paths from 149,556 to only 2 and 5596 to 0 respectively. Moreover, the depths of coverage for the reconstructed reads are equal to those for the input reads for the simulated data, whereas the previous method decreases the depth of coverage at many positions in the source genome. Our method achieves quite a high accuracy with a comparable construction time, peak memory size, and index size to the previous method.

Efficient privacy-preserving variable-length substring match for genome sequence. (Extended version)

Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu

Algorithms for molecular biology : AMB 17 (1) 9, April 2022 [Refereed] [International journal]

The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that [Formula: see text] is computed for a given depth of recursion where [Formula: see text] is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.

Citations (Scopus): 5

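The abstract above describes a table lookup that is applied recursively to a vector V, starting from an initial position. As a plaintext illustration only, and assuming the elided formula denotes a nested lookup of the form V[V[...V[p]...]], the access pattern can be sketched as follows. The function name `iterated_lookup` and the example table are hypothetical. Nothing in this sketch is secret-shared or privacy-preserving, and it is not the authors' protocol.

```python
from typing import Sequence


def iterated_lookup(table: Sequence[int], start: int, depth: int) -> int:
    """Apply the lookup position -> table[position] `depth` times.

    Plaintext illustration only: in the setting described by the abstract,
    the table and the positions would be hidden from the parties.
    """
    position = start
    for _ in range(depth):
        position = table[position]
    return position


# Hypothetical example table: every entry is itself a valid index.
example_table = [2, 0, 3, 1]
result = iterated_lookup(example_table, start=0, depth=3)
```

With the hypothetical table above, the lookup visits positions 0, 2, 3, and 1 in turn. The cost of this plain loop grows only with the recursion depth, which is the property the abstract claims for the secure version after the query input.
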
Private Evaluation of a Decision Tree Based on Secret Sharing.

Mohammad Nabil Ahmed, Kana Shimizu

ICISC 171-194, 2022 [Refereed]

Citations (Scopus): 1

Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence.

Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu

21st International Workshop on Algorithms in Bioinformatics (WABI) 2-23, 2021 [Refereed]

Citations (Scopus): 1

Discovery of cryoprotective activity in human genome-derived intrinsically disordered proteins

Naoki Matsuo, Natsuko Goda, Kana Shimizu, Satoshi Fukuchi, Motonori Ota, Hidekazu Hiroaki

International Journal of Molecular Sciences 19 (2) E401, February 2018 [Refereed]

Intrinsically disordered proteins (IDPs) are an emerging phenomenon. They may have a high degree of flexibility in their polypeptide chains, which lack a stable 3D structure. Although several biological functions of IDPs have been proposed, their general function is not known. The only finding related to their function is the genetically conserved YSK2 motif present in plant dehydrins. These proteins were shown to be IDPs with the YSK2 motif serving as a core region for the dehydrins’ cryoprotective activity. Here we examined the cryoprotective activity of randomly selected IDPs toward the model enzyme lactate dehydrogenase (LDH). All five IDPs that were examined were in the range of 35–45 amino acid residues in length and were equally potent at a concentration of 50 µg/mL, whereas folded proteins, the PSD-95/Dlg/ZO-1 (PDZ) domain, and lysozymes had no potency. We further examined their cryoprotective activity toward glutathione S-transferase as an example of the other enzyme, and toward enhanced green fluorescent protein as a non-enzyme protein example. We further examined the lyophilization protective activity of the peptides toward LDH, which revealed that some IDPs showed a higher activity than that of bovine serum albumin (BSA). Based on these observations, we propose that cryoprotection is a general feature of IDPs. Our findings may become a clue to various industrial applications of IDPs in the future.

Citations (Scopus): 12

An Efficient Private Evaluation of a Decision Graph.

Hiroki Sudo, Koji Nuida, Kana Shimizu

Information Security and Cryptology - ICISC 2018 - 21st International Conference (ICISC) 143-160, 2018 [Refereed]

Citations (Scopus): 7

Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and A Fast Implementation in WebAssembly

Attrapadung, Nuttapong, Hanaoka, Goichiro, Mitsunari, Shigeo, Sakai, Yusuke, Shimizu, Kana, Teruya, Tadanori

Proceedings of the ACM Asia Conference on Computer and Communications Security 2018 (AsiaCCS 2018) 685-697, 2018 [Refereed]

Secure Wavelet Matrix: Alphabet-Friendly Privacy-Preserving String Search for Bioinformatics

Sudo, Hiroki, Jimbo, Masanobu, Nuida, Koji, Shimizu, Kana

IEEE/ACM Transactions on Computational Biology and Bioinformatics 16 (5) 1675-1684, 2018 [Refereed]

Citations (Scopus): 16

Secure Division Protocol and Applications to Privacy-preserving Chi-squared Tests.

Hiraku Morita, Nuttapong Attrapadung, Satsuya Ohata, Koji Nuida, Shota Yamada, Kana Shimizu, Goichiro Hanaoka, Kiyoshi Asai

International Symposium on Information Theory and Its Applications (ISITA) 530-534, 2018 [Refereed]

Citations (Scopus): 12

Differentially private Bayesian learning on distributed data

Heikkila, Mikko, Lagerspetz, Eemil, Kaski, Samuel, Shimizu, Kana, Tarkoma, Sasu, Honkela, Antti

Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 (NIPS 2017) 3226-3235, 2017 [Refereed]

Efficient privacy-preserving string search and an application in genomics

Kana Shimizu, Koji Nuida, Gunnar Ratsch

Bioinformatics 32 (11) 1652-1661, June 2016 [Refereed]

Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database.
Approach: We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries.
Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within approximately 4.6 s and approximately 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm.

Citations (Scopus): 56

Privacy-Preserving String Search for Genome Sequences with FHE bootstrapping optimization

Yu Ishimaki, Hiroki Imabayashi, Kana Shimizu, Hayato Yamana

2016 IEEE International Conference on Big Data (Big Data) 3989-3991, 2016 [Refereed]

Privacy-preserving string search is a crucial task for analyzing genomics-driven big data. In this work, we propose a cryptographic protocol that uses Fully Homomorphic Encryption (FHE) to enable a client to search on a genome sequence database without leaking his/her query to the server. Though FHE supports both addition and multiplication over encrypted data, random noise inside ciphertexts grows with every arithmetic operation especially multiplication, which results in incorrect decryption when the noise amount exceeds its threshold called level. There are two approaches to avoid the incorrect decryption: one is setting the sufficient level that assures correct decryption within the limited number of operations, and the other is resetting the noise by the method called bootstrapping. It is important to find an optimal balance between overhead caused by the level and overhead caused by the bootstrapping, since using higher level deteriorates the performance of all the arithmetic operations, while the more number of bootstrappings causes more expensive overhead. In this study, we propose an efficient approach to minimize the number of bootstrappings while reducing the level as much as possible. Our experimental result shows that it runs at most 10 times faster than a naive approach.

Citations (Scopus): 16

Privacy-preserving search for chemical compound databases

Kana Shimizu, Koji Nuida, Hiromi Arai, Shigeo Mitsunari, Nuttapong Attrapadung, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Jun Sakuma, Goichiro Hanaoka, Kiyoshi Asai

BMC Bioinformatics 16 (18) S6, December 2015 [Refereed]

Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources.
Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation.
Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.

Citations (Scopus): 12

A Method for Systematic Assessment of Intrinsically Disordered Protein Regions by NMR

Natsuko Goda, Kana Shimizu, Yohta Kuwahara, Takeshi Tenno, Tamotsu Noguchi, Takahisa Ikegami, Motonori Ota, Hidekazu Hiroaki

International Journal of Molecular Sciences 16 (7) 15743-15760, July 2015 [Refereed]

Intrinsically disordered proteins (IDPs) that lack stable conformations and are highly flexible have attracted the attention of biologists. Therefore, the development of a systematic method to identify polypeptide regions that are unstructured in solution is important. We have designed an indirect/reflected detection system for evaluating the physicochemical properties of IDPs using nuclear magnetic resonance (NMR). This approach employs a chimeric membrane protein-based method using the thermostable membrane protein PH0471. This protein contains two domains, a transmembrane helical region and a C-terminal OB (oligonucleotide/oligosaccharide binding)-fold domain (named NfeDC domain), connected by a flexible linker. NMR signals of the OB-fold domain of detergent-solubilized PH0471 are observed because of the flexibility of the linker region. In this study, the linker region was substituted with target IDPs. Fifty-three candidates were selected using the prediction tool POODLE and 35 expression vectors were constructed. Subsequently, we obtained N-15-labeled chimeric PH0471 proteins with 25 IDPs as linkers. The NMR spectra allowed us to classify IDPs into three categories: flexible, moderately flexible, and inflexible. The inflexible IDPs contain membrane-associating or aggregation-prone sequences. This is the first attempt to use an indirect/reflected NMR method to evaluate IDPs and can verify the predictions derived from our computational tools.

Citations (Scopus): 8

On Limitations and Alternatives of Privacy-Preserving Cryptographic Protocols for Genomic Data

Tadanori Teruya, Koji Nuida, Kana Shimizu, Goichiro Hanaoka

Advances in Information and Computer Security (IWSEC 2015) 9241 242-261, 2015 [Refereed]

The human genome can identify an individual and determine the individual's biological characteristics, and hence has to be securely protected in order to prevent privacy issues. In this paper we point out, however, that current standard privacy-preserving cryptographic protocols may be insufficient to protect genome privacy. This is mainly due to typical characteristics of genome information; it is immutable, and an individual's genome has correlations to those of the individual's progeny. Then, as an alternative, we propose to protect genome privacy by cryptographic protocols with everlasting security, which provides an appropriate mixture of computational and information-theoretic security. We construct a concrete example of a protocol with everlasting security, and discuss its practical efficiency.

Citations (Scopus): 1

Reference-free prediction of rearrangement breakpoint reads

Edward Wijaya, Kana Shimizu, Kiyoshi Asai, Michiaki Hamada

Bioinformatics 30 (18) 2559-2567, September 2014 [Refereed]

Motivation: Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules, which are observed in many cancer-related diseases. The detection of rearrangement is typically done by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information.
Results: In this article, we propose a reference-free method for detecting clusters of breakpoints from the chromosomal rearrangements. This is done by directly comparing a set of NGS normal reads with another set that may be rearranged. Our method SlideSort-BPR (breakpoint reads) is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100x, it finds approximately 88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome.

Citations (Scopus): 5

POODLE: Tools Predicting Intrinsically Disordered Regions of Amino Acid Sequence

Shimizu, K.

Methods in Molecular Biology 1137, 2014

Citations (Scopus): 6

PDB-scale analysis of known and putative ligand-binding sites with structural sketches

Jun-Ichi Ito, Yasuo Tabei, Kana Shimizu, Kentaro Tomii, Koji Tsuda

Proteins: Structure, Function, and Bioinformatics 80 (3) 747-763, March 2012 [Refereed]

Computational investigation of protein functions is one of the most urgent and demanding tasks in the field of structural bioinformatics. Exhaustive pairwise comparison of known and putative ligand-binding sites, across protein families and folds, is essential in elucidating the biological functions and evolutionary relationships of proteins. Given the vast amounts of data available now, existing 3D structural comparison methods are not adequate due to their computation time complexity. In this article, we propose a new bit string representation of binding sites called structural sketches, which is obtained by random projections of triplet descriptors. It allows us to use ultra-fast all-pair similarity search methods for strings with strictly controlled error rates. Exhaustive comparison of 1.2 million known and putative binding sites finished in approximately 30 h on a single core to yield 88 million similar binding site pairs. Careful investigation of 3.5 million pairs verified by TM-align revealed several notable analogous sites across distinct protein families or folds. In particular, we succeeded in finding highly plausible functions of several pockets via strong structural analogies. These results indicate that our method is a promising tool for functional annotation of binding sites derived from structural genomics projects.

Citations (Scopus): 22

PoSSuM: a database of similar protein-ligand binding and putative pockets

Jun-Ichi Ito, Yasuo Tabei, Kana Shimizu, Koji Tsuda, Kentaro Tomii

Nucleic Acids Research 40 (D1) D541-D548, January 2012 [Refereed]

Numerous potential ligand-binding sites are available today, along with hundreds of thousands of known binding sites observed in the PDB. Exhaustive similarity search for such vastly numerous binding site pairs is useful to predict protein functions and to enable rapid screening of target proteins for drug design. Existing databases of ligand-binding sites offer databases of limited scale. For example, SitesBase covers only approximately 33,000 known binding sites. Inferring protein function and drug discovery purposes, however, demands a much more comprehensive database including known and putative-binding sites. Using a novel algorithm, we conducted a large-scale all-pairs similarity search for 1.8 million known and potential binding sites in the PDB, and discovered over 14 million similar pairs of binding sites. Here, we present the results as a relational database Pocket Similarity Search using Multiple-sketches (PoSSuM) including all the discovered pairs with annotations of various types. PoSSuM enables rapid exploration of similar binding sites among structures with different global folds as well as similar ones. Moreover, PoSSuM is useful for predicting the binding ligand for unbound structures, which provides important clues for characterizing protein structures with unclear functions. The PoSSuM database is freely available at http://possum.cbrc.jp/PoSSuM/.

Citations (Scopus): 51

SlideSort: all pairs similarity search for short reads

Kana Shimizu, Koji Tsuda

Bioinformatics 27 (4) 464-470, February 2011 [Refereed]

Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge amount of short reads. Searching similar pairs from a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other important analyses.
Results: In this study, we designed and implemented an exact algorithm SlideSort that finds all similar pairs from a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Compared to existing methods based on single k-mers, our method is more effective in reducing the number of edit distance calculations. In comparison to backtracking methods such as BWA, our method is much faster in finding remote matches, scaling easily to tens of millions of sequences. Our software has an additional function of single link clustering, which is useful in summarizing short reads for further processing.

Citations (Scopus): 19

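The problem stated in the SlideSort abstract above, finding all pairs in a string pool within a given edit distance, can be made concrete with a naive baseline. The sketch below is a hypothetical brute-force version that computes the edit distance for every pair. It does not implement SlideSort's pattern growth over chains of common k-mers, and the function names and the `max_dist` parameter are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similar_pairs(reads: Sequence[str], max_dist: int) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose reads are within max_dist edits.

    Naive baseline: one edit distance calculation per pair of reads.
    """
    return [(i, j)
            for (i, x), (j, y) in combinations(enumerate(reads), 2)
            if edit_distance(x, y) <= max_dist]
```

This baseline performs a quadratic number of edit distance calculations in the number of reads. The abstract identifies exactly that cost as what the k-mer chain filtering is meant to reduce.
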
SAHG, a comprehensive database of predicted structures of all human proteins

Chie Motono, Junichi Nakata, Ryotaro Koike, Kana Shimizu, Matsuyuki Shirota, Takayuki Amemiya, Kentaro Tomii, Nozomi Nagano, Naofumi Sakaya, Kiyotaka Misoo, Miwa Sato, Akinori Kidera, Hidekazu Hiroaki, Tsuyoshi Shirai, Kengo Kinoshita, Tamotsu Noguchi, Motonori Ota

Nucleic Acids Research 39 (suppl_1) D487-D493, January 2011 [Refereed]

Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith-Waterman profile-profile alignment), global-local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.

Citations (Scopus): 10

POODLE-I: Disordered region prediction by integrating POODLE series and structural information predictors based on a workflow approach

Shuichi Hirose, Kana Shimizu, Tamotsu Noguchi

In Silico Biology 10 (3-4) 185-191, 2010 [Refereed]

Under physiological conditions, many proteins that include a region lacking well-defined three-dimensional structures have been identified, especially in eukaryotes. These regions often play an important biological cellular role, although they cannot form a stable structure. Therefore, they are biologically remarkable phenomena. From an industrial perspective, they can provide useful information for determining three-dimensional structures or designing drugs. For these reasons, disordered regions have attracted a great deal of attention in recent years. Their accurate prediction is therefore anticipated to provide annotations that are useful for a wide range of applications. POODLE-I (where "I" stands for integration) is a web-based disordered region prediction system. POODLE-I integrates prediction results obtained from three kinds of disordered region predictors (POODLEs) developed from the viewpoint that the characteristics of disordered regions change according to their length. Furthermore, POODLE-I combines that information with predicted structural information by application of a workflow approach. When compared with server teams that showed best performance in CASP8, POODLE-I ranked among the top and exhibited the highest performance in predicting unfolded proteins. POODLE-I is an efficient tool for detecting disordered regions in proteins solely from the amino acid sequence. The application is freely available at http://mbs.cbrc.jp/poodle/poodle-i.html.

Citations (Scopus): 20

Interaction between Intrinsically Disordered Proteins Frequently Occurs in a Human Protein-Protein Interaction Network

Kana Shimizu, Hiroyuki Toh

Journal of Molecular Biology 392 (5) 1253-1265, October 2009 [Refereed]

Intrinsic protein disorder is a widespread phenomenon characterised by a lack of stable three-dimensional structures and is considered to play an important role in protein-protein interactions (PPIs). This study examined the genome-wide preference of disorder in PPIs by using exhaustive disorder prediction in human PPIs. We categorised the PPIs into three types (interaction between disordered proteins, interaction between structured proteins, and interaction between a disordered protein and a structured protein) with regard to the flexibility of molecular recognition and compared these three interaction types in an existing human PPI network with those in a randomised network. Although the structured regions were expected to become the identifiers for binding recognition, this comparative analysis revealed unexpected results. The occurrence of interactions between disordered proteins was significantly frequent, and that between a disordered protein and a structured protein was significantly infrequent. We found that this propensity was much stronger in interactions between nonhub proteins. We also analysed the interaction types from a functional standpoint by using GO, which revealed that the interaction between disordered proteins frequently occurred in cellular processes, regulation, and metabolic processes. The number of interactions, especially in metabolic processes between disordered proteins, was 1.8 times as large as that in the randomised network. Another analysis conducted by using KEGG pathways provided results where several signaling pathways and disease-related pathways included many interactions between disordered proteins. All of these analyses suggest that human PPIs preferably occur between disordered proteins and that the flexibility of the interacting protein pairs may play an important role in human PPI networks.

Citations (Scopus): 61

POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix

Kana Shimizu, Shuichi Hirose, Tamotsu Noguchi

Bioinformatics 23 (17) 2337-2338, September 2007 [Refereed]

Protein disorder is characterized by a lack of a stable 3D structure, and is considered to be involved in a number of important protein functions such as regulatory and signalling events. We developed a web application, POODLE-S, which predicts disordered regions from amino acid sequences by using physicochemical features and a reduced amino acid set of a position-specific scoring matrix.
Availability: POODLE-S is available from http://mbs.cbrc.jp/poodle/poodle-s.html and can be used by both academic and commercial users.
Contact: poodle@cbrc.jp.

DOI PubMed

Citations (Scopus): 104
POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions

Shuichi Hirose, Kana Shimizu, Satoru Kanai, Yutaka Kuroda, Tamotsu Noguchi

BIOINFORMATICS 23 ( 16 ) 2046 - 2053 2007年08月 [査読有り]

　概要を見る

Motivation: Recent experimental and theoretical studies have revealed several proteins containing sequence segments that are unfolded under physiological conditions. These segments are called disordered regions. They are actively investigated because of their possible involvement in various biological processes, such as cell signaling, transcriptional and translational regulation. Additionally, disordered regions can represent a major obstacle to high-throughput proteome analysis and often need to be removed from experimental targets. The accurate prediction of long disordered regions is thus expected to provide annotations that are useful for a wide range of applications.
Results: We developed Prediction Of Order and Disorder by machine LEarning (POODLE-L; L stands for long), a support vector machine (SVM)-based method for predicting long disordered regions using 10 kinds of simple physico-chemical properties of amino acids. POODLE-L assembles the output of 10 two-level SVM predictors into a final prediction of disordered regions. The performance of POODLE-L for predicting long disordered regions, which exhibited a Matthews correlation coefficient of 0.658, was the highest when compared with eight well-established publicly available disordered region predictors.
Availability: POODLE-L is freely available at http://mbs.cbrc.jp/poodle/poodle-l.html
Contact: hirose-shuichi@aist.go.jp
Supplementary information: Supplementary data are available at Bioinformatics online.

DOI PubMed

Citations (Scopus): 128
Predicting mostly disordered proteins by using structure-unknown protein data

Kana Shimizu, Yoichi Muraoka, Shuichi Hirose, Kentaro Tomii, Tamotsu Noguchi

BMC BIOINFORMATICS 8 ( 1 ) 78 2007年03月 [査読有り]

　概要を見る

Background: Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. We know the structures of far more ordered proteins than disordered proteins. The structural distribution of proteins in nature can therefore be inferred to differ from that of proteins whose structures have been determined experimentally. We know many more protein sequences than we do protein structures, and many of the known sequences can be expected to be those of disordered proteins. Thus it would be efficient to use the information of structure-unknown proteins in order to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using spectral graph transducer and training with a huge amount of structure-unknown sequences as well as structure-known sequences.
Results: When the proposed method was evaluated on data that included 82 disordered proteins and 526 ordered proteins, its sensitivity was 0.723 and its specificity was 0.977. It resulted in a Matthews correlation coefficient 0.202 points higher than that obtained using FoldIndex, 0.221 points higher than that obtained using the method based on plotting hydrophobicity against the number of contacts and 0.07 points higher than that obtained using support vector machines (SVMs). To examine robustness against training data sparseness, we investigated the correlation between two results obtained when the method was trained on different datasets and tested on the same dataset. The correlation coefficient for the proposed method is 0.14 higher than that for the method using SVMs. When the proposed SGT-based method was compared with four per-residue predictors (VL3, GlobPlot, DISOPRED2 and IUPred (long)), its sensitivity was 0.834 for disordered proteins, which is 0.052 - 0.523 higher than that of the per-residue predictors, and its specificity was 0.991 for ordered proteins, which is 0.036 - 0.153 higher than that of the per-residue predictors. The proposed method was also evaluated on data that included 417 partially disordered proteins. It predicted the frequency of disordered proteins to be 1.95% for the proteins with 5% - 10% disordered sequences, 1.46% for the proteins with 10% - 20% disordered sequences and 16.57% for proteins with 20% - 40% disordered sequences.
Conclusion: The proposed method, which utilizes the information of structure-unknown data, predicts disordered proteins more accurately than other methods and is less affected by training data sparseness.

DOI PubMed

Citations (Scopus): 60
Angle: A sequencing errors resistant program for predicting protein coding regions in unfinished cDNA

Kana Shimizu, Jun Adachi, Yoichi Muraoka

Journal of Bioinformatics and Computational Biology 4 ( 3 ) 649 - 664 2006年06月 [査読有り]

　概要を見る

In the process of making full-length cDNA, predicting protein coding regions helps both in the preliminary analysis of genes and in any succeeding process. However, unfinished cDNA contains artifacts including many sequencing errors, which hinder the correct evaluation of coding sequences. Predictions for short sequences are especially difficult because they provide little information for evaluating coding potential. In this paper, we describe ANGLE, a new program for predicting coding sequences in low quality cDNA. To achieve error-tolerant prediction, ANGLE uses a machine-learning approach, which gives a better representation of coding sequences by maximizing the use of the limited information in input sequences. Our method utilizes not only codon usage but also protein structure information, which is difficult to use in stochastic model-based algorithms, and optimizes the use of limited information from a short segment when deciding coding potential, with the result that predictive accuracy does not depend on the length of an input sequence. The performance of ANGLE is compared with ESTSCAN on four datasets, each having a different error rate (one frame-shift error or one substitution error per 200-500 nucleotides), and on one dataset which has no error. ANGLE outperforms ESTSCAN by 9.26% in average Matthews correlation coefficient on the short sequence dataset (<1000 bases). On the long sequence dataset, ANGLE achieves comparable performance. © 2006 Imperial College Press.

DOI PubMed

Citations (Scopus): 118
Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder

K Shimizu, Y Muraoka, S Hirose, T Noguchi

PROCEEDINGS OF THE 2005 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY 262 - 267 2005年 [査読有り]

　概要を見る

The prediction of intrinsic disorder from amino acid sequence has been gaining increasing attention because disordered regions have come to be known as important regions for protein functions. The most common way of predicting disorder is based on binary classification with machine learning. Since amino acid composition has different propensities in the N-term, C-term, and internal regions, the accuracy of prediction increases by dividing training data into these three regions and predicting them separately. However, previous work has lacked discussion about a concrete definition of the N-term and C-term regions, and has only used a heuristic length from the terminal. Other previous work has shown that general physicochemical properties rather than specific amino acids are important factors contributing to disorder, and a reduced amino acid alphabet can maintain excellent precision in predicting disorder. In this paper, we redefine a suitable length and position for the N-term and C-term regions for predicting disorder. Moreover, we show that each region has different physicochemical properties, which are important factors contributing to disorder. We also suggest a region-specific reduced set of amino acids and a modified PSSM based on it for predicting disorder. We implemented our method and (1) compared it with the conventional division method and (2) compared our feature selection with all physicochemical features, on the CASP6 benchmark, a PDB dataset, and DisProt. The results support that the new data separation method is effective, and indicate that each region has different physicochemical properties that are important factors for predicting protein disorder.

DOI
A melody-retrieval system on parallelized computers

Tomonari Sonoda, Toshiya Ikenaga, Kana Shimizu, Yoichi Muraoka

IFIP Advances in Information and Communication Technology 112 265 - 272 2003年 [査読有り]

　概要を見る

This paper describes a WWW-based melody-retrieval system that takes a melody sung by a user and sent over the Internet as a search clue and uses it to retrieve the song's title from a music database of standard MIDI files (SMF). Building a melody-retrieval service with a large database and many user accesses has been difficult because it is hard to build a system that achieves both quick search and high matching accuracy. We propose a scalable melody-retrieval system that achieves 70% matching accuracy against more than 20,000 pieces of music with a search time within a few seconds. © 2003 by Springer Science+Business Media New York.

DOI
The design method of a melody retrieval system on parallelized computers

T Sonoda, T Ikenaga, K Shimizu, Y Muraoka

SECOND INTERNATIONAL CONFERENCE ON WEB DELIVERING OF MUSIC, PROCEEDINGS 66 - 73 2002年 [査読有り]

　概要を見る

This paper describes the design method of a WWW-based melody-retrieval system which takes a sung melody as a search clue and retrieves the music title from a music database of standard MIDI files (SMF) over the Internet. The most important thing in building a melody-retrieval system on the Internet is to achieve both high matching accuracy and quick search. It was, however, quite difficult to fulfill these two conditions simultaneously because the matching process took a long time. We propose the design of a system which consists of parallelized melody-retrieval servers for building a high performance service on the Internet.

DOI

Citations (Scopus): 5


書籍等出版物

「個人ゲノム情報の活用とプライバシー保護」, プレシジョン・メディシン～ビッグデータの構築・分析から臨床応用・課題まで～

清水佳奈( 担当：分担執筆)

エヌ・ティー・エス 2018年 ISBN: 9784860435806
"POODLE: tools predicting intrinsically disordered regions of amino acid sequence", Protein Structure Prediction 3rd Edition

Kana Shimizu( 担当：分担執筆)

Springer 2014年 ISBN: 9781493903665

講演・口頭発表等

Privacy-aware computational genomics

Kana Shimizu [招待有り]

SPIEZ Convergence 2018 (Spiez)

発表年月： 2018年09月
Privacy-preserving genome sequence search

Kana Shimizu [招待有り]

2016 International Workshop on Spatial and Temporal Modeling from Statistical, Machine Learning and Engineering perspectives (STM2016) (Tokyo)

発表年月： 2016年07月
Efficient Privacy-Preserving String Search and an Application in Genomics

Kana Shimizu

High Throughput Sequencing Algorithms & Applications (HitSeq 2015), A SIG of ISMB/ECCB 2015 (Dublin)

発表年月： 2015年07月
Privacy Preserving Similarity Search in Biomedical Data by Homomorphic encryption

Kana Shimizu

Biological Data Science (Cold Spring Harbor)

発表年月： 2014年11月
Next generation sequencing data analyses by using ultra-fast all pairs similarity search

Kana Shimizu [招待有り]

International Symposium on Single Biomolecule Analysis 2013 (Kyoto)

発表年月： 2013年11月
Privacy-preserving search for a chemical compound database

Kana Shimizu [招待有り]

ISMB/ECCB 2013 Oral Poster Presentations Track (Berlin)

発表年月： 2013年07月
Privacy-preserving search for a chemical compound database

Kana Shimizu

ISCB-Asia/SCCG 2012 (Shenzhen)

発表年月： 2012年12月
SlideSort: Fast and exact algorithm for Next Generation Sequencing data analysis

Kana Shimizu

ISMB/ECCB 2011 Highlights Track (Boston)

発表年月： 2011年07月


共同研究・競争的資金等の研究課題

精密で安全な参照ゲノムの研究

日本学術振興会科学研究費助成事業

研究期間:

2023年06月

-

2026年03月

清水佳奈
圧縮秘匿計算による大規模データ処理

日本学術振興会科学研究費助成事業基盤研究(S)

研究期間:

2021年07月

-

2026年03月

定兼邦彦, 坂本比呂志, 清水佳奈, 渋谷哲朗, 申吉浩

　概要を見る

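As the most basic algorithm for computing on data while keeping it concealed, we studied sorting. Existing methods based on radix sort have been proposed, but their online communication grows in proportion to the radix 2^L. The time complexity decreases in inverse proportion to L, but the online communication grows exponentially in L, so only values of L around 2 or 3 can be used. In this research we proposed a secure computation algorithm for sorting with small online communication. The online communication for sorting n numbers of w bits each is O(wn(1+log n/L)), which decreases in inverse proportion to L, so the algorithm also works with large L.
We also developed an index for searching variable-length patterns in genome sequences. A feature of this method is that the time and communication complexity and the number of communication rounds of the online computation do not depend on the database size n. For searching a pattern of length 100 in a sequence of length 10 million taken from part of the human genome, it was more than 10 times faster than existing methods.
Kernel methods are among the principal techniques in machine learning, and using them within a secure computation framework unavoidably requires a step that evaluates a kernel function on data held privately by multiple users without leaking any secrets. In this research we discovered that there is a common principle for efficiently computing a widely applicable class of kernel functions, also called the "integral type".
We developed a high-functionality library for fully homomorphic encryption (TFHE). TFHE previously supported only bit operations; we extended it to handle integer types and further enabled the four arithmetic operations, the remainder operation, and comparison operations.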
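
The trade-off around the radix 2^L in the sorting discussion of this summary can be seen in an ordinary, non-secure radix sort. The sketch below is a plaintext illustration only, with hypothetical names; it contains no secret sharing or encryption and is not the project's protocol. It shows that the number of passes falls as w/L while the number of buckets handled per pass grows as 2^L.

```python
def radix_sort(values, w, L):
    """Plaintext LSD radix sort of w-bit non-negative integers with radix 2**L.

    Returns (sorted_values, number_of_passes). Illustration only: a secure
    protocol would operate on shared or encrypted values instead.
    """
    radix = 1 << L                       # 2**L buckets per pass
    out = list(values)
    passes = 0
    for shift in range(0, w, L):         # ceil(w / L) passes in total
        buckets = [[] for _ in range(radix)]
        for v in out:
            buckets[(v >> shift) & (radix - 1)].append(v)
        # Concatenating buckets in order keeps the sort stable.
        out = [v for bucket in buckets for v in bucket]
        passes += 1
    return out, passes


data = [5, 3, 250, 0, 17, 3]
for L in (2, 3, 8):
    result, passes = radix_sort(data, w=8, L=L)
    print(L, passes, 1 << L, result)
```

Raising L reduces the number of passes but multiplies the per-pass bucket count, which mirrors why a cost proportional to 2^L limits existing methods to small L.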
圧縮秘匿計算による大規模データ処理

日本学術振興会科学研究費助成事業

研究期間:

2021年04月

-

2025年03月

定兼邦彦, 坂本比呂志, 清水佳奈, 渋谷哲朗, 申吉浩
プライバシ保護ゲノム情報解析技術の開発

日本学術振興会科学研究費助成事業

研究期間:

2019年04月

-

2022年03月

清水佳奈

　概要を見る

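Handling personal genome data, which has been growing explosively in recent years, carries a high privacy risk, so there is a strong need for methodologies that aggregate data safely and effectively and discover useful knowledge. Against this background, this research clarifies which parts of genomic information constitute personal privacy and studies methodologies for analysing information while the parts that must be concealed remain encrypted. In particular, the project takes genome sequence search and genome-wide association studies as its two central topics and develops methods that allow large-scale data analysis to be carried out securely.

Because handling personal genome data carries a high privacy risk, useful data frequently end up enclosed within individual organisations and isolated, a phenomenon known as siloing. When genomic information is analysed with statistics or machine learning, results become more accurate as the variety of data and the number of samples increase, so there is a strong need for methodologies that aggregate siloed data safely and effectively and discover useful knowledge. Against this background, the aim of this research is to study methodologies for analysing genomic information while keeping it concealed. In particular, it takes (1) genome sequence search and (2) genome-wide association studies as its two central topics and aims to develop methods that allow large-scale data analysis to be carried out securely. In fiscal year 2019, for (1), we devised a cryptographic protocol for full-text search based on secret sharing and implemented a prototype. Experiments with the prototype confirmed that a search against a genome database of length 10 million takes about 10 seconds even in a real Internet environment. For (2), we devised an information analysis platform that can perform genome-wide association studies (GWAS) using Intel SGX, one of the technologies that realise a Trusted Execution Environment, and implemented a prototype.

In fiscal year 2019, the following progress was made on the two aims of this research, (1) private genome sequence search and (2) private genome-wide association studies. (1) Based on secret sharing, we devised a cryptographic protocol for private full-text search that is useful for analysing genome sequences and medical documents, and implemented a prototype. Owing to the way precomputation is arranged, the time complexity, communication volume, and number of rounds of the online computation, from submission of a query to receipt of the search results, do not depend on the database length and depend only on the query length. In typical information retrieval the query is far shorter than the database, so the method runs very fast even on vast data such as genome databases. Experiments with the prototype confirmed that a search against a genome database of length 10 million takes about 10 seconds even in a real Internet environment. (2) We also developed a system that analyses personal genome data using Intel SGX, one of the technologies that realise a Trusted Execution Environment. The system can perform whole-genome association analysis and data clustering, and by using Oblivious RAM, which hides data access patterns, it can also access huge data at high speed. Data analyses are written by the user in a programming language such as JavaScript, and a virtual machine deployed inside an enclave on the server executes them without leaking information to the server. Experiments with genomic variant data from more than 200 individuals confirmed that analyses run in time comparable to that of software without information protection. As described above, for the elemental technologies that are important for realising large-scale genome data analysis, we have progressed from devising the basic methodology to implementing prototypes, and the project is proceeding as originally planned.

Because the project has so far progressed largely smoothly, research will continue in fiscal year 2020 according to the original plan. For genome sequence search, we aim for an efficient implementation that includes the communication part of secret sharing, and will consider further sophistication and efficiency improvements of the private full-text search algorithm. For genome-wide association studies, we will consider further enhancements, such as protecting the output privacy of the TEE-based information analysis system.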
グラフを用いた精密な癌ゲノム配列解析法の研究

栢森情報科学振興財団栢森情報科学振興財団研究助成

研究期間:

2021年

-

2022年
医療情報解析を促進するプライバシ保護技術の開発

公益財団法人大川情報通信基金公益財団法人大川情報通信基金 2017年度(第31回)研究助成

研究期間:

2018年03月

-

2019年03月
個別化医療を実現するプライバシ保護ゲノム情報解析

科学技術振興機構/日本医療研究開発機構戦略的国際科学技術協力推進事業（SICP）日-フィンランド（Tekes/AF）研究交流

研究期間:

2014年05月

-

2017年03月
類似ゲノムの差異を逃さないDe novoゲノム解析技術の開発

日本学術振興会科研費・挑戦的萌芽研究

研究期間:

2014年04月

-

2017年03月

清水佳奈

　概要を見る

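Recent studies have suggested that genome sequences are highly diverse. However, the currently mainstream analysis approach first maps the fragment sequences output by a sequencer to a reference genome and then derives statistics from the result, so the analysis results are influenced by the characteristics of the reference genome and genomic diversity can be overlooked. In this research we therefore designed and implemented a method that directly compares multiple datasets and discovers genome sequence patterns whose characteristics differ between datasets.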
プライバシー保護バイオインフォマティクス基盤技術の開発と応用

日本学術振興会科学研究費助成事業

研究期間:

2013年04月

-

2016年03月

浜田道昭, 清水佳奈, 花岡悟一郎, 津田宏治, フリスマーティン, 浅井潔

　概要を見る

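Personal genome information and information on chemical compounds that are seeds for drugs must be handled as confidential. From the standpoint of open science, on the other hand, it is important to actively use such information and mine it together with other information. In this research we developed methodologies for performing various kinds of data mining while keeping such important biological information concealed. Specifically, we developed techniques for private search of chemical compound databases, private search of genome information using hidden Markov models, and private sequence alignment.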
基質結合部位予測に向けたタンパク質局所構造の高速比較法の開発

研究期間:

2011年04月

-

2014年03月

　概要を見る

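In this research we developed a new method that enables comparison of a large number of binding sites by coarse-graining protein substrate-binding sites and applying a fast sorting algorithm. We applied the method to known and putative substrate-binding sites at the scale of the entire protein 3D structure database, and built and released PoSSuM, a database of the comparison results. PoSSuM has now grown to contain 49 million pairs of similar binding sites obtained by comparing 5.5 million known and putative substrate-binding sites. Aiming at applications such as drug repositioning and side-effect prediction, we also built and released a new database, PoSSuMds, with added links to ChEMBL assay information.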
ギガシークエンスデータの高速解析技術の開発

日本学術振興会科研費・若手研究（B）

研究期間:

2010年04月

-

2012年03月

清水佳奈

　概要を見る

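Giga-sequencers output large numbers of short fragment sequences (reads), so developing fast analysis techniques is an urgent task. In this research we applied the pigeonhole principle with offsets to develop SlideSort, an algorithm that finds similar sequences among a large number of reads at very high speed. Compared with conventional methods, SlideSort achieved a speed-up of more than 1000 times with comparable memory. As an application of the algorithm, we also developed software that constructs minimum spanning trees. Similar-pair search has a wide range of applications and, in addition to clustering, is expected to be useful for discovering common patterns and making assembly more efficient.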
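
The pigeonhole idea behind this kind of similar-pair search can be sketched as follows. This is a minimal illustration and not SlideSort itself: it assumes equal-length reads, handles substitutions only (Hamming distance, so no offsets are needed), and groups reads with a hash table instead of sorting. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def similar_pairs(reads, d):
    """Return all index pairs (i, j), i < j, with Hamming distance <= d.

    Pigeonhole principle: if two reads differ in at most d positions and
    each read is cut into d + 1 blocks, at least one block is identical in
    both reads. Only reads sharing a block are compared in full.
    """
    if not reads:
        return set()
    n = len(reads[0])
    k = d + 1
    bounds = [n * i // k for i in range(k + 1)]
    found = set()
    for start, end in zip(bounds, bounds[1:]):
        groups = defaultdict(list)
        for idx, read in enumerate(reads):
            groups[read[start:end]].append(idx)
        for members in groups.values():
            for i, j in combinations(members, 2):
                if (i, j) not in found and hamming(reads[i], reads[j]) <= d:
                    found.add((i, j))
    return found


reads = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "ACGAACGA", "TTTTTTTA"]
print(sorted(similar_pairs(reads, d=1)))
```

Because dissimilar reads rarely share a whole block, most pairs are never compared, which is where the speed-up over all-against-all comparison comes from.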


Misc

アルゴリズムの頂が持つ力はいかに?-ゲノコン2021開催記録-

清水佳奈, 坂本一憲, 笠原雅弘

情報処理 63 ( 7 ) 2022年

J-GLOBAL
秘密分散に基づく秘匿全文検索

中川, 佳貴, 大畑, 幸矢, 清水, 佳奈

コンピュータセキュリティシンポジウム2019論文集 2019 289 - 296 2019年10月

　概要を見る

秘密分散法に基づき，クエリと文書を互いに秘匿したまま全文検索する手法を提案する．計算には，クエリ保有者，文書保有者，結託の恐れのない二人の委託計算者が参加する．クエリ保有者と文書保有者のそれぞれが事前にクエリと文書のシェアを作って委託計算者に送り，委託計算者は検索結果のみをクエリ保有者に返す．提案手法では，FM-indexと呼ばれる全文索引により文書を検索しやすい構造に変換する．これにより，委託計算で必要となる計算量，通信量，通信ラウンド数の全てが文書の長さに依存せず，クエリの長さのみに比例する．提案手法を実装し，文書とクエリの長さがそれぞれ1000万と100のゲノム配列検索を行ったところ，委託計算者の計算時間は1CPUの利用でわずか0.30秒，ラウンド数は202，通信量は7.13KBであった．文書保有者の事前計算は文書長とクエリ長の積に線形だが，並列化により効率的に計算できるため，実装上の工夫をせずとも10CPUの利用で15分以内に留まる．提案手法は文字列検索の他に，オートマトンによるパターンマッチや木構造の走査等にも応用でき，多様なデータ検索の高速化に貢献することが期待される．
We propose a secure full-text search protocol based on secret sharing. We consider a case in which a database holder and a querier send shares of the database and the query to two non-colluding computing nodes, which return the result to the querier. By using an efficient full-text index called the FM-index, the time, communication, and round complexities of the protocol all become linear in the length of the query and do not depend on the length of the database. In an experiment using a genomic sequence of length 10 million and a query sequence of length 100, the CPU time, number of communication rounds, and data transfer size for a single query were only 0.3 s, 202, and 7.13 KB. The building block of the protocol can also be applied to other applications such as deterministic finite automaton matching and tree traversal, and is expected to contribute to various real-world problems.
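
The reason the search cost depends only on the query length can be seen in a plaintext FM-index backward search, sketched below. This is an illustration under stated assumptions, not the proposed protocol: there is no secret sharing, the index is built with a naive suffix sort, the text is assumed not to contain the sentinel `$`, and all names are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index (C table and Occ table) for `text`."""
    text += "$"                               # sentinel, smaller than A/C/G/T
    order = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in order)  # Burrows-Wheeler transform
    alphabet = sorted(set(text))
    c, total = {}, 0
    for ch in alphabet:                       # C[ch]: symbols smaller than ch
        c[ch] = total
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, sym in enumerate(bwt):             # Occ[ch][i]: ch count in bwt[:i]
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == sym else 0)
    return c, occ, len(bwt)


def count_occurrences(index, query):
    """Count occurrences of `query` with one table lookup step per character."""
    c, occ, n = index
    lo, hi = 0, n
    for ch in reversed(query):                # exactly len(query) iterations
        if ch not in c:
            return 0
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("ACGTACGT")
print(count_occurrences(index, "ACG"), count_occurrences(index, "TT"))
```

Each query character triggers a constant number of table lookups regardless of the text length, so the online work is proportional to the query length once the index has been precomputed.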

CiNii
5分で分かる ! ? 有名論文ナナメ読み：Ferragina, P. and Manzini, G. : Opportunistic Data Structures with Applications

清水, 佳奈

情報処理 60 ( 6 ) 554 - 556 2019年05月

　概要を見る

An overview of the full-text indexing method of P. Ferragina and G. Manzini, well known for its application to genome sequence search.

CiNii
生体情報セキュリティ

清水佳奈

生体医工学 Annual57 ( Abstract ) S131_2 - S131_2 2019年

　概要を見る

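It is hoped that a wide variety of biological information can be collected and analysed for use in medicine and the health industry. However, much of this information is strongly tied to an individual's health condition and daily-life cycle, so appropriate privacy-protection mechanisms are needed when data are distributed or shared. This talk discusses the functions that systems handling biological information should provide when they handle data related to personal privacy, and how to realise them. In particular, it introduces a technology called privacy-preserving data mining, which analyses information while keeping the data contents concealed, and discusses its application to systems that use biological information.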

DOI CiNii
タンパク質分子の柔軟性を考慮した新規ドッキングゲーム

飯野, 翼, 大上, 雅史, 秋山, 泰, 清水, 佳奈

第80回全国大会講演論文集 2018 ( 1 ) 931 - 932 2018年03月

　概要を見る

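Docking simulation, which predicts the structure of protein complexes, plays an important role in drug discovery. Proteins are known to change part of their 3D structure when forming a complex, but taking such molecular flexibility into account in docking simulation makes the search space of candidate structures enormous. This study therefore proposes a method that makes the search more efficient through gamification. Specifically, we implemented game software in which even users without knowledge of biophysics can intuitively move side chains on the molecular surface to form better docking states, and complex structures are predicted with high accuracy by having many players compete. Compared with conventional methods in which a computer searches automatically, the approach is expected to achieve highly accurate predictions in a short time.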

CiNii
高速に類似文字列ペアを発見するビット並列フィルター

山田, 太樹, 清水, 佳奈

第80回全国大会講演論文集 2018 ( 1 ) 355 - 356 2018年03月

　概要を見る

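Finding pairs of similar strings is an important problem in many fields. This paper proposes a method that, given strings of lengths n and m (where n ≥ m), decides whether their edit distance is within a threshold θ. Among prior work on computing the edit distance itself quickly, Myers's bit-parallel algorithm, which achieves O(nm/w) for word length w, is known. In contrast, we construct an effective filter F based on the pigeonhole principle and compute the edit distance only for pairs that F accepts. Because F can be computed in O(θn/w) using bit parallelism, the method runs very fast on datasets containing many dissimilar pairs. When all similar pairs were enumerated from 10,000 genome sequences, the method was about 17 times faster than Myers's method and about 100 times faster than computing the edit distance without bit parallelism.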
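
The filter-then-verify structure described here can be sketched as below. The filter shown is a generic pigeonhole filter, written without bit parallelism, and is an assumption for illustration only; the abstract does not specify how the paper's filter F is built, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def passes_filter(a, b, theta):
    """Return False only when edit_distance(a, b) > theta is certain.

    Pigeonhole principle: at most theta edits cannot touch all theta + 1
    blocks of `a`, so some block must occur unchanged in `b`, shifted by at
    most theta positions.
    """
    if abs(len(a) - len(b)) > theta:
        return False
    k = theta + 1
    bounds = [len(a) * i // k for i in range(k + 1)]
    for start, end in zip(bounds, bounds[1:]):
        block = a[start:end]
        lo = max(0, start - theta)
        hi = min(len(b) - len(block), start + theta)
        for pos in range(lo, hi + 1):
            if b[pos:pos + len(block)] == block:
                return True
    return False


def within_threshold(a, b, theta):
    """Run the cheap filter first, then verify survivors exactly."""
    return passes_filter(a, b, theta) and edit_distance(a, b) <= theta


print(within_threshold("kitten", "sitting", 3), within_threshold("aaaa", "bbbb", 1))
```

The filter never rejects a pair whose edit distance is within θ, so the exact computation is skipped only for pairs that are certainly dissimilar.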

CiNii
Trend Review〈続〉改正個人情報保護法でゲノム研究はどう変わるか?

山本奈津子, 川嶋実苗, 清水佳奈, 片山俊明, 荻島創一

実験医学 36 ( 13 ) 2018年

J-GLOBAL
ペアリングベースの効率的なレベル2準同型暗号

Nuttapong Attrapadung, 花岡悟一郎, 光成滋生, 坂井祐介, 清水佳奈, 照屋唯紀

暗号と情報セキュリティシンポジウム2018 (SCIS 2018) 1A2 ( 4 ) 1 - 8 2018年01月

研究発表ペーパー・要旨（全国大会，その他学術会議）

　概要を見る

A demonstration is available at https://herumi.github.io/she-wasm/she-demo.html
クライアント補助型秘匿計算および基本ツール

森田啓, 大畑幸矢, Nuttapong Attrapadung, 縫田光司, 山田翔太, 清水佳奈, 花岡悟一郎, 浅井潔

2018年暗号と情報セキュリティシンポジウム(SCIS2018)予稿集 2018年01月

研究発表ペーパー・要旨（全国大会，その他学術会議）
改正個人情報保護法でゲノム研究はどう変わるか?-個人識別符号・要配慮情報としてのゲノムデータ

清水佳奈, 山本奈津子, 川嶋実苗, 片山俊明, 荻島創一

実験医学 35 ( 4 ) 2017年

J-GLOBAL
生命情報科学におけるプライバシ保護検索(<小特集>「エネルギーシミュレーションとデータ解析」について)

清水佳奈

シミュレーション 35 ( 1 ) 26 - 31 2016年03月

CiNii
完全準同型暗号を用いた高速なゲノム秘匿検索

石巻優, 清水佳奈, 縫田光司, 山名早人

2016年暗号と情報セキュリティシンポジウム(SCIS2016)予稿集 2016年01月

研究発表ペーパー・要旨（全国大会，その他学術会議）
Privacy-Preserving Search for Chemical Compound Databases

Kana Shimizu, Koji Nuida, Hiromi Arai, Shigeo Mitsunari, Nuttapong Attrapadung, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Jun Sakuma, Goichiro Hanaoka, Kiyoshi Asai

bioRxiv ( 013995 ) 2015年01月

機関テクニカルレポート，技術報告書，プレプリント等

DOI
ゲノムプライバシ保護を考慮した紛失通信プロトコル

照屋唯紀, 縫田光司, 清水佳奈, 花岡悟一郎

2015年暗号と情報セキュリティシンポジウム(SCIS2015)予稿集 2015年01月

研究発表ペーパー・要旨（全国大会，その他学術会議）
範囲指定型問い合わせに対する効率的なデータベース秘匿検索プロトコル

照屋唯紀, Nuttapong Attrapadung, 稲村勝樹, 松田隆宏, 中川紗菜美, 縫田光司, 花岡悟一郎, 清水佳奈

コンピュータセキュリティシンポジウム 2014 (CSS 2014), デモセッション (ポスターセッション) ( DPS-02 ) 2014年10月

その他

　概要を見る

Awarded 1 / 8 demonstrations. 
 
To make our demonstration, we used the library https://github.com/aistcrypt/Lifted-ElGamal (a newer version of this library is currently available at https://github.com/herumi/mcl)
双方向の情報を秘匿可能な効率的化合物データベース検索プロトコル

縫田光司, 照屋唯紀, 花岡悟一郎, 清水佳奈, 松田隆宏, 矢内直人, 中川紗奈美

コンピュータセキュリティシンポジウム 2013 (CSS 2013), デモンストレーション (ポスター) セッション ( DPS-07 ) 2013年10月

その他

　概要を見る

Awarded 1 / 9 demonstrations. 
 
To make our demonstration, we used the library https://github.com/aistcrypt/Lifted-ElGamal (a newer version of this library is currently available at https://github.com/herumi/mcl)
トモサガ: スマホで安心「共ダチ」探し

照屋唯紀, 縫田光司, 花岡悟一郎, 清水佳奈, 松田隆宏, 矢内直人, 中川紗奈美

コンピュータセキュリティシンポジウム 2013 (CSS 2013), デモンストレーション (ポスター) セッション 2013年10月

その他

　概要を見る

To make our demonstration, we used the library https://github.com/aistcrypt/Lifted-ElGamal (a newer version of this library is currently available at https://github.com/herumi/mcl)
1SBP-05 類似配列の高速な全ペア列挙に基づくNGSデータの解析手法(1SBP 進化する1分子シークエンサー,シンポジウム,日本生物物理学会第51回年会(2013年度))

清水佳奈

生物物理 53 ( supplement1-2 ) S87 2013年

DOI CiNii
加法準同型暗号を用いた化合物データベースの秘匿検索プロトコル

縫田光司, 清水佳奈, 荒井ひろみ, 浜田道昭, 津田宏治, 広川貴次, 花岡悟一郎, 佐久間淳, 浅井潔

コンピュータセキュリティシンポジウム2012(CSS2012)予稿集 2012 ( 3 ) 2012年10月

研究発表ペーパー・要旨（全国大会，その他学術会議）

J-GLOBAL
検索行動におけるプライバシ保護—Privacy preservation in information retrieval—人工知能学会全国大会(第26回)文化,科学技術と未来 ; オーガナイズドセッション「OS-20 プライバシー保護データマイニング」

荒井ひろみ, 清水佳奈, 浜田道昭

人工知能学会全国大会論文集 26 1 - 4 2012年

CiNii
検索行動におけるプライバシ保護

荒井ひろみ, 清水佳奈, 浜田道昭, 津田宏治, 広川貴次, 佐久間淳, 浅井潔

人工知能学会全国大会論文集(CD-ROM) 26th ROMBUNNO.3I2-OS-20-1 - 3I2OS201 2012年

　概要を見る

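For real-world network-structured data in which links are the personal information of each node, we raise the problem of privacy protection in link prediction and propose a solution. We discuss, for example, leakage of personal information through recommendation of trading partners in a commercial transaction network, as the risk that other nodes' link information is inferred from the predicted links each node receives. We also propose a perturbation method that greatly reduces the leakage risk while preserving link-prediction accuracy.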

DOI CiNii J-GLOBAL
PRESAT‐vectorを用いた天然変性タンパク質配列の網羅的検証系の確立

合田名都子, 清水佳奈, 桑原陽太, 天野剛志, 池上貴久, 太田元規, 廣明秀一

日本蛋白質科学会年会プログラム・要旨集 10th 68 2010年05月

J-GLOBAL
1P246 構造解析手法と構造状態におけるorder-disorder領域の比較(生命情報科学(構造ゲノミクス),ポスター発表,第45回日本生物物理学会年会)

横田恭宣, 廣瀬修一, 清水佳奈, 野口保

生物物理 47 ( supplement ) S85 2007年

DOI CiNii
ブースティング法を応用したcDNA 配列のコーディング領域予測

清水, 佳奈, 村岡, 洋一, 足立, 淳

第63回全国大会講演論文集 2001 ( 1 ) 133 - 134 2001年09月

CiNii


現在担当している科目

情報理工学実験Ａ　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
生命情報解析技術

基幹理工学部

2026年秋学期
情報理工学実験Ｂ【前年度成績S評価者用】

基幹理工学部

2026年春学期
卒業論文Ｂ

基幹理工学部

2026年秋学期
情報理工学実験Ｂ

基幹理工学部

2026年春学期
アルゴリズムとデータ構造Ｂ　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
アルゴリズムとデータ構造Ｂ

基幹理工学部

2026年秋学期
生命情報処理とICT

基幹理工学部

2026年春学期
プロジェクト研究Ｂ

基幹理工学部

2026年秋学期
プロジェクト研究Ａ

基幹理工学部

2026年春学期
卒業論文Ｂ（春学期）

基幹理工学部

2026年春学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2026年春学期
卒業論文Ａ　（集中）

基幹理工学部

2026年集中（春・秋学期）
卒業論文Ａ

基幹理工学部

2026年春学期
卒業論文Ａ（秋学期）

基幹理工学部

2026年秋学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年春学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2026年秋学期
卒業論文Ｂ　18前再

基幹理工学部

2026年秋学期
卒業論文Ａ　18前再

基幹理工学部

2026年春学期
情報理工学実験Ａ

基幹理工学部

2026年秋学期
卒業論文Ｂ

基幹理工学部

2026年秋学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2026年秋学期
卒業論文Ａ　18前再

基幹理工学部

2026年春学期
プロジェクト研究Ｂ

基幹理工学部

2026年秋学期
生命情報処理とICT

基幹理工学部

2026年春学期
アルゴリズムとデータ構造Ｂ

基幹理工学部

2026年秋学期
情報通信実験Ｂ【前年度成績S評価者用】

基幹理工学部

2026年春学期
情報通信実験Ｂ

基幹理工学部

2026年春学期
卒業論文Ａ　（集中）

基幹理工学部

2026年集中（春・秋学期）
卒業論文Ａ（秋学期）

基幹理工学部

2026年秋学期
卒業論文Ａ

基幹理工学部

2026年春学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
情報通信実験Ａ　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
情報通信実験Ａ

基幹理工学部

2026年秋学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2026年春学期
Graduation Thesis B (Fall) [S Grade]

基幹理工学部

2026年秋学期
Graduation Thesis B (Fall)

基幹理工学部

2026年秋学期
Computer Science and Communications Engineering Laboratory A

基幹理工学部

2026年秋学期
Graduation Thesis B (Spring)

基幹理工学部

2026年春学期
Computer Science and Communications Engineering Laboratory A [S Grade]

基幹理工学部

2026年秋学期
プロジェクト研究Ａ

基幹理工学部

2026年春学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年春学期
卒業論文Ｂ　18前再

基幹理工学部

2026年秋学期
卒業論文Ｂ（春学期）

基幹理工学部

2026年春学期
Project Research Fall

基幹理工学部

2026年秋学期
Project Research Spring

基幹理工学部

2026年春学期
Introduction to Computers and Networks

基幹理工学部

2026年春学期
Computer Science and Communications Engineering Laboratory B

基幹理工学部

2026年春学期
Graduation Thesis A　(Fall)[S Grade]【For students enrolled before 2022】

基幹理工学部

2026年秋学期
Graduation Thesis A (Spring) [S Grade]

基幹理工学部

2026年春学期
Graduation Thesis A (Fall)

基幹理工学部

2026年秋学期
Graduation Thesis A (Fall) [S Grade]

基幹理工学部

2026年秋学期
Graduation Thesis A (Spring)

基幹理工学部

2026年春学期
Graduation Thesis A　(Spring)[S Grade]【For students enrolled before 2022】

基幹理工学部

2026年春学期
Graduation Thesis A　(Spring)【For students enrolled before 2022】

基幹理工学部

2026年春学期
Graduation Thesis A　(Fall)【For students enrolled before 2022】

基幹理工学部

2026年秋学期
Graduation Thesis B (Spring) [S Grade]

基幹理工学部

2026年春学期
Master's Thesis (Department of Computer Science and Communications Engineering)

大学院基幹理工学研究科

2026年通年
修士論文（情報・通信）

大学院基幹理工学研究科

2026年通年
バイオインフォマティクス演習D

大学院基幹理工学研究科

2026年秋学期
リサーチプロジェクト（秋）

大学院基幹理工学研究科

2026年秋学期
リサーチプロジェクト（春）

大学院基幹理工学研究科

2026年春学期
生命情報解析技術

大学院基幹理工学研究科

2026年秋学期
情報理工・情報通信特別実験B

大学院基幹理工学研究科

2026年秋学期
情報理工・情報通信特別実験A

大学院基幹理工学研究科

2026年春学期
生命情報解析研究

大学院基幹理工学研究科

2026年通年
バイオインフォマティクス研究

大学院基幹理工学研究科

2026年通年
生命情報解析研究

大学院基幹理工学研究科

2026年通年
バイオインフォマティクス研究

大学院基幹理工学研究科

2026年通年
Seminar on Hands-on Course on Computational Biology A

大学院基幹理工学研究科

2026年春学期
Seminar on Bioinformatics D

大学院基幹理工学研究科

2026年秋学期
Seminar on Bioinformatics C

大学院基幹理工学研究科

2026年春学期
Seminar on Bioinformatics B

大学院基幹理工学研究科

2026年秋学期
Seminar on Bioinformatics A

大学院基幹理工学研究科

2026年春学期
Algorithms in Computational Biology

大学院基幹理工学研究科

2026年秋学期
Seminar on Hands-on Course on Computational Biology D

大学院基幹理工学研究科

2026年秋学期
Seminar on Hands-on Course on Computational Biology C

大学院基幹理工学研究科

2026年春学期
Seminar on Hands-on Course on Computational Biology B

大学院基幹理工学研究科

2026年秋学期
Special Laboratory B in Computer Science and Communications Engineering

大学院基幹理工学研究科

2026年秋学期
Special Laboratory A in Computer Science and Communications Engineering

大学院基幹理工学研究科

2026年春学期
Research on Computational Biology

大学院基幹理工学研究科

2026年通年
Research on Bioinformatics

大学院基幹理工学研究科

2026年通年
バイオインフォマティクス演習C

大学院基幹理工学研究科

2026年春学期
生命情報解析演習D

大学院基幹理工学研究科

2026年秋学期
バイオインフォマティクス演習Ｂ

大学院基幹理工学研究科

2026年秋学期
生命情報解析演習C

大学院基幹理工学研究科

2026年春学期
生命情報解析演習B

大学院基幹理工学研究科

2026年秋学期
バイオインフォマティクス演習A

大学院基幹理工学研究科

2026年春学期
生命情報解析演習A

大学院基幹理工学研究科

2026年春学期
情報理工・情報通信特別演習Ｂ

大学院基幹理工学研究科

2026年秋学期
情報理工・情報通信特別演習Ａ

大学院基幹理工学研究科

2026年春学期


他学部・他研究科等兼任情報

理工学術院大学院基幹理工学研究科

学内研究所・附属機関兼任歴

2024年

-

2026年

理工学術院総合研究所兼任研究員

特定課題制度（学内資金）

プライバシ保護ゲノム情報解析技術の開発

2018年

　概要を見る

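We examined in detail the functions and performance required of technologies for protecting genome information processing. We also developed a secure computation protocol for decision graphs, studied applications of homomorphic encryption that permits only a single multiplication, and implemented a genome information search application.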
プライバシ保護ゲノム情報解析技術の開発

2017年

　概要を見る

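We developed privacy-protection techniques needed when searching databases that contain genome information. In this research we developed cryptographic protocols that use homomorphic encryption to perform the desired data analysis without the user or the database disclosing their information to each other. Specifically, we developed a protocol for genome-wide association studies by logistic regression and a protocol for classification with a trained decision tree.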
暗号技術を用いたプライバシ保護ゲノム情報解析技術の開発

2016年

　概要を見る

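In database search, when both the query and the database contain private information, it is difficult to protect both at the same time. To solve this problem, we developed a private search technique that presents only the search results to the user while keeping the data contents hidden. The proposed method performs private substring search, runs fast even when the alphabet is large, and was 10 to more than 100 times faster than conventional methods. The results are expected to help make database search more secure.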
ｃＤＮＡにおける遺伝子領域の特定に関する研究

2003年

　概要を見る

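With the arrival of the post-sequencing era, the need for genome information analysis is growing. Genomic information is redundant, and only a small portion of the information read by sequencers is involved in biological function. To make use of genomic information in drug discovery, gene therapy, breeding, and so on, it is therefore first necessary to identify gene regions in large amounts of data and analyse protein function. Against this background, this research aimed to predict protein-coding regions from cDNA sequences. Previous work on identifying protein-coding regions in cDNA bases its predictions on codon usage frequencies, such as codon chains, and therefore cannot maintain prediction accuracy on sequences with biased codon usage. Genomic information has many exceptions, and there are many sequences with biased codon usage. Robust prediction requires information from many biological findings, but much previous work uses probabilistic models such as hidden Markov models, which makes it difficult to use probabilistically dependent biological findings simultaneously. In contrast, this research proposed a method that can combine many potentially useful biological findings in addition to codon usage frequency. We implemented the method and evaluated it on benchmark data, obtaining better accuracy than previous work. The system can also be run from the web and is scheduled to be made publicly available on the web shortly. The results are applicable not only to cDNA but also to exon prediction in DNA. We are currently modifying the system for DNA prediction and continuing the research so that it can contribute more widely.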