Updated on 2022/10/01

 
YAMANA, Hayato
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor
Profile

Hayato Yamana received a Dr. Eng. degree at Waseda University in 1993. He began his career at the Electrotechnical Laboratory of the former Ministry of International Trade and Industry (MITI), and was seconded to MITI’s Machinery and Information Industries Bureau for a year in 1996. He was subsequently appointed associate professor of Computer Science and Engineering at Waseda University in 2000, and has been a professor since 2005. Since 2010, he has been director of DBSJ (Database Society of Japan). He was director of IPSJ (Information Processing Society of Japan) and vice chair of the Institute of Electronics, Information and Communication Engineers (IEICE)’s Information and Communication Society. At Waseda University, he was Deputy Chief Information Officer from 2015 to 2020. Since Oct. 2020, he has been Vice President for IT Promotion and Chief Information Officer. His research interests include fully homomorphic encryption, big data analysis and computer architecture.

Concurrent Post

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

  • Affiliated organization   Global Education Center

Research Institute

  • 2021
    -
    2022

    Center for Data Science   Concurrent Center Member

  • 2020
    -
    2022

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

  • 2020
    -
    2022

    Research Innovation   Open Innovation Promotion Section   Concurrent Center Member

  • 2020
    -
    2021

    Center for Higher Education Studies   Concurrent Center Member

Education

  • 1989.04
    -
    1993.03

    Waseda University   Graduate School of Science and Engineering  

  • 1987.04
    -
    1989.03

    Waseda University   Graduate School of Science and Engineering  

  • 1983.04
    -
    1987.03

    Waseda University   School of Science and Engineering   Electronics and Communication  

Degree

  • 1993.03   Waseda University   Dr.(Eng.)

  • Waseda University   MS(Eng.)

Research Experience

  • 2005.04
    -
    Now

    Waseda University   Faculty of Science and Engineering   Professor

  • 2020.10
    -
    Now

    Waseda University   Vice President for IT Promotion

  • 2005.04
    -
     

    National Institute of Informatics   Visiting Professor

  • 2004.04
    -
    2005.03

    National Institute of Informatics   Visiting Associate Professor

  • 2000.04
    -
    2005.03

    Waseda University   School of Science and Engineering   Associate Professor

  • 1999.04
    -
    2000.03

    Seikei Univ.   Visiting lecturer

  • 1997.04
    -
    2000.03

    Electrotechnical Laboratory, METI   Chief Researcher

  • 1993.04
    -
    1997.03

    Electrotechnical Laboratory   Researcher

  • 1989.11
    -
    1993.03

    Waseda University   Research Assistant

Professional Memberships

  • ACM

  • IEEE

  • IEICE

  • IPSJ

  • DBSJ

 

Research Areas

  • Web informatics and service informatics

  • Life, health and medical informatics

  • Information network

  • Kansei informatics

  • Database

  • Intelligent informatics

  • Human interface and interaction

  • Computer system

  • Computational science   Homomorphic Encryption

Research Interests

  • Pen-based Computing

  • Internet Information Analysis

  • Author Identification

  • Credibility of Web information

  • High Performance Homomorphic Encryption

  • Data Mining

  • SNS Analysis

  • WWW Crawler

  • Search Engines

  • Parallelizing Compiler

Papers

  • Comparing Augmented Reality-based Display Methods to Present Guiding Information.

    Riko Horikawa, Manaka Ito, Kosuke Komiya, Tatsuo Nakajima, Hayato Yamana

    LifeTech     22 - 25  2022

    DOI

  • Topological Measurement of Deep Neural Networks Using Persistent Homology.

    Satoru Watanabe, Hayato Yamana

    Ann. Math. Artif. Intell.   90 ( 1 ) 75 - 92  2022

    DOI

  • Point of Interest Recommendation Acceleration Using Clustering

    Huida Jiao, Fan Mo, Hayato Yamana

    2021 IEEE 6th International Conference on Big Data Analytics, ICBDA 2021     175 - 180  2021.03

    Point of Interest (POI) recommendation systems exploit information in location-based social networks to predict locations that users may be interested in. POI recommendations have been widely adopted in many applications, which are helpful for daily life. POI recommendation services receive a huge volume of visit history data generated by users' daily lives with mobile devices. However, POI recommendation systems require a long time to build a model from such a huge volume of check-in data and recommend suitable POIs to users. Thus, it is indispensable to shorten the execution time in the big data era. In this study, we propose a clustering-based method to divide the data into multiple subsets to accelerate the POI recommendation's execution while maintaining accuracy. Our proposed method can be adapted to any general POI recommendation algorithm. We divide the whole data, that is, users and POIs, into subsets with a tree structure to balance the size of subsets according to both geographical information and user check-in distribution. Evaluation results show that we successfully accelerate the base algorithms by 17 to 39 times while keeping the accuracy almost the same.

    DOI

  • Construction of Differentially Private Summaries over Fully Homomorphic Encryption.

    Shojiro Ushiyama, T. Takahashi, M. Kudo, Hayato Yamana

    CoRR   abs/2112.08662  2021

  • Segmentation-based Phishing URL Detection.

    Eint Sandi Aung, Hayato Yamana

    WI-IAT     550 - 556  2021

    DOI

  • Improving Text Classification Using Knowledge in Labels

    Cheng Zhang, Hayato Yamana

    2021 IEEE 6th International Conference on Big Data Analytics (ICBDA 2021)     193 - 197  2021

    Various algorithms and models have been proposed to address text classification tasks; however, they rarely consider incorporating the additional knowledge hidden in class labels. We argue that hidden information in class labels leads to better classification accuracy. In this study, instead of encoding the labels into numerical values, we incorporated the knowledge in the labels into the original model without changing the model architecture. We combined the output of an original classification model with the relatedness calculated based on the embeddings of a sequence and a keyword set. A keyword set is a word set to represent knowledge in the labels. Usually, it is generated from the classes, although it can also be customized by the users. The experimental results show that our proposed method achieved statistically significant improvements in text classification tasks. The source code and experimental details of this study can be found on GitHub.

    DOI

  • Topological Measurement of Deep Neural Networks Using Persistent Homology.

    Satoru Watanabe, Hayato Yamana

    CoRR   abs/2106.03016  2021

    The inner representation of deep neural networks (DNNs) is indecipherable, which makes it difficult to tune DNN models, control their training process, and interpret their outputs. In this paper, we propose a novel approach to investigate the inner representation of DNNs through topological data analysis (TDA). Persistent homology (PH), one of the outstanding methods in TDA, was employed for investigating the complexities of trained DNNs. We constructed clique complexes on trained DNNs and calculated the one-dimensional PH of DNNs. The PH reveals the combinational effects of multiple neurons in DNNs at different resolutions, which is difficult to be captured without using PH. Evaluations were conducted using fully connected networks (FCNs) and networks combining FCNs and convolutional neural networks (CNNs) trained on the MNIST and CIFAR-10 data sets. Evaluation results demonstrate that the PH of DNNs reflects both the excess of neurons and problem difficulty, making PH one of the prominent methods for investigating the inner representation of DNNs.

  • User-centric Distributed Route Planning in Smart Cities based on Multi-objective Optimization.

    Francis Tiausas, Jose Paolo Talusan, Yu Ishimaki, Hayato Yamana, Hirozumi Yamaguchi, Shameek Bhattacharjee, Abhishek Dubey, Keiichi Yasumoto, Sajal K. Das

    IEEE International Conference on Smart Computing(SMARTCOMP)     77 - 82  2021

    The realization of edge-based cyber-physical systems (CPS) poses important challenges in terms of performance, robustness, security, etc. This paper examines a novel approach to providing a user-centric adaptive route planning service over a network of Road Side Units (RSUs) in smart cities. The key idea is to adaptively select routing task parameters such as privacy-cloaked area sizes and number of retained intersections to balance processing time, privacy protection level, and route accuracy for privacy-augmented distributed route search while also handling per-query user preferences. This is formulated as an optimization problem with a set of parameters giving the best result for a set of queries given system constraints. Processing Throughput, Privacy Protection, and Travel Time Accuracy were developed as the objective functions to be balanced. A Multi-Objective Genetic Algorithm based technique (NSGA-II) is applied to recover a feasible solution. The performance of this approach was then evaluated using traffic data from Osaka, Japan. Results show good performance of the approach in balancing the aforementioned objectives based on user preferences.

    DOI

  • Real-time Periodic Advertisement Recommendation Optimization under Delivery Constraint using Quantum-inspired Computer.

    Fan Mo, Huida Jiao, Shun Morisawa, Makoto Nakamura, Koichi Kimura, Hisanori Fujisawa, Masafumi Ohtsuka, Hayato Yamana

    Proceedings of the 23rd International Conference on Enterprise Information Systems     431 - 441  2021

    DOI

  • Overfitting Measurement of Deep Neural Networks Using No Data.

    Satoru Watanabe, Hayato Yamana

    8th IEEE International Conference on Data Science and Advanced Analytics(DSAA)     1 - 10  2021

    DOI

  • Construction of Differentially Private Summaries Over Fully Homomorphic Encryption.

    Shojiro Ushiyama, Tsubasa Takahashi, Masashi Kudo, Hayato Yamana

    Database and Expert Systems Applications - 32nd International Conference   12924 LNCS   9 - 21  2021

    Cloud computing has garnered attention as a platform of query processing systems. However, data privacy leakage is a critical problem. Chowdhury et al. proposed Cryptε, which executes differential privacy (DP) over encrypted data on two non-colluding semi-honest servers. Further, the DP index proposed by these authors summarizes a dataset to prevent information leakage while improving the performance. However, two problems persist: 1) the original data are decrypted to apply sorting via a garbled circuit, and 2) the added noise becomes large because the sorted data are partitioned with equal width, regardless of the data distribution. To solve these problems, we propose a new method called DP-summary that summarizes a dataset into differentially private data over a homomorphic encryption without decryption, thereby enhancing data security. Furthermore, our scheme adopts Li et al.’s data-aware and workload-aware (DAWA) algorithm for the encrypted data, thereby minimizing the noise caused by DP and reducing the errors of query responses. An experimental evaluation using torus fully homomorphic encryption (TFHE), a bit-wise fully homomorphic encryption library, confirms the applicability of the proposed method, which summarized eight 16-bit data in 12.5 h. We also confirmed that there was no accuracy degradation even after adopting TFHE along with the DAWA algorithm.

    DOI

  • First-Impression-Based Unreliable Web Pages Detection - Does First Impression Work?

    Kenta Yamada, Hayato Yamana

    Advanced Information Networking and Applications - Proceedings of the 35th International Conference on Advanced Information Networking and Applications (AINA-2021)   227   635 - 641  2021

    Considering the continuous increase in the number of web pages worldwide, detecting unreliable pages, such as those containing fake news, is indispensable. Natural language processing and social-information-based methods have been proposed for web page credibility evaluation. However, the applicability of the former to web pages is limited because a model is required for each language, while the latter is poorly adapted to changes, owing to its dependence on external services that can be discontinued. To solve these problems, herein we propose a first-impression-based web credibility evaluation method. Our experimental evaluation of a fake news corpus gave an accuracy of 0.898, which is superior to those of existing methods.

    DOI

  • Fast and Accurate Function Evaluation with LUT over Integer-Based Fully Homomorphic Encryption.

    Ruixiao Li, Hayato Yamana

    Advanced Information Networking and Applications - Proceedings of the 35th International Conference on Advanced Information Networking and Applications (AINA-2021)   226 LNNS   620 - 633  2021

    Fully homomorphic encryption (FHE), which is used to evaluate arbitrary functions in addition and multiplication operations via modular arithmetic (mod q) over ciphertext, can be applied in various privacy-preserving applications. However, big data is difficult to adopt owing to its high computational cost and the challenges associated with the efficient handling of complex functions such as log(x). To address these problems, we propose a method for handling any multi-input function using a lookup table (LUT) to replace the original calculations with array indexing operations over integer-based FHE. In this study, we extend our LUT-based method to handle any input values, i.e., including non-matched element values in the LUT, to match with a near indexed value and return an approximated output over FHE. In addition, we propose a technique for splitting the table to handle large integers for improved accuracy with only a slight increase in the execution time. For the experiments, we use the Microsoft/SEAL library, and the results show that our proposed method can evaluate a 16-bit to 16-bit function in 2.110 s and a 16-bit to 32-bit function in 2.268 s, thereby outperforming previous methods implemented via bit-wise calculation over FHE.

    DOI

  • Faster Homomorphic Trace-Type Function Evaluation.

    Yu Ishimaki, Hayato Yamana

    IEEE Access   9   53061 - 53077  2021

    Homomorphic encryption enables computations over encrypted data without decryption, and can be used for outsourcing computations to some untrusted source. In homomorphic encryption based on the hardness of ring-learning with errors, offering promising security and functionality, a plaintext is represented by a polynomial. A plaintext is treated as a vector whose homomorphic evaluation enables component-wise addition and multiplication, as well as rotation across the components. We focus on a commonly used and time-consuming subroutine that enables homomorphically summing-up the components of the vector or homomorphically extracting the coefficients of the polynomial, and call it homomorphic trace-type function. We improve the efficiency of the homomorphic trace-type function evaluation. The homomorphic trace-type function evaluation is performed by repeating homomorphic rotation followed by addition (rotations-and-sums). To correctly add up a rotated ciphertext and an unrotated one, a special operation called key-switching should be performed on the rotated one. As key-switching is computationally expensive, the rotations-and-sums is inherently inefficient. We propose a more efficient trace-type function evaluation by using loop-unrolling, which is compatible with other optimization techniques such as hoisting, and can exploit multi-threading. We show that the rotations-and-sums is not the optimal solution in terms of runtime complexity and that a trade-off exists between time and space. Experimental results demonstrate that our proposed method works 1.32-2.12 times faster than the previous method.

    DOI

  • Time Distribution Based Diversified Point of Interest Recommendation

    Fan Mo, Huida Jiao, Hayato Yamana

    2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics, ICCCBDA 2020     37 - 44  2020.04  [Refereed]

    In location-based social networks (LBSNs), personalized point-of-interest (POI) recommendation helps users mine their interests and find new locations conveniently and quickly. It is one of the most important services to improve users' quality of life and travel. Most POI recommendation systems are devoted to improving accuracy; however, in recent years, the diversity of POI recommendations, such as categorical and geographical diversity, has received much attention because a single type of POI easily causes loss of users' interest. Unlike previous diversity-related recommendations, in this paper we focus on the visiting time of POIs, a unique attribute of the interaction between users and POIs. Users usually have different active visiting time patterns and different frequently visited POIs depending on time. If the proper visiting times of recommended POIs concentrate in a small range of time, the user might be unsatisfied because the recommendations cannot cover the whole of the user's active time range, making it inconvenient for the user to visit those POIs. To solve this problem, we propose a new concept, time diversity, and a time-distribution-based recommendation method to improve the time diversity of recommended POIs. Our experimental results with the Gowalla dataset show that our proposed method improves time diversity by 25.9% compared with USG, with only 7.9% accuracy loss.

    DOI

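  • A Proposal of a Method for Presenting Recommendation Reasons in Recommender Systems: Using Machine Learning Interpretation Models (in Japanese)

    森澤竣, 真鍋智紀, 座間味卓臣, 山名早人

    DBSJ Japanese Journal (Web)   18-J  2020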

    J-GLOBAL

  • Accelerating Exact Solution Methods for the Bootstrap Problem and the Relinearize Problem in Fully Homomorphic Encryption (in Japanese)

    佐藤宏樹, 石巻優, 山名早人

    DBSJ Japanese Journal (Web)   18-J  2020

    J-GLOBAL

  • Towards Privacy-preserving Anomaly-based Attack Detection against Data Falsification in Smart Grid.

    Yu Ishimaki, Shameek Bhattacharjee, Hayato Yamana, Sajal K. Das

    2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids(SmartGridComm)     1 - 6  2020

    In this paper, we present a novel framework for privacy-preserving anomaly-based data falsification attack detection in a smart grid advanced metering infrastructure (AMI). Specifically, we propose an anomaly detection framework over homomorphically encrypted data. Unlike existing privacy-preserving anomaly detectors, our framework detects the presence of not only energy theft (i.e., deductive attack), but also more advanced data integrity attacks (i.e., additive and camouflage attacks) over encrypted data without diminishing detection sensitivity. We optimize the anomaly detection procedure such that potentially expensive operations over homomorphically encrypted space are avoided. Moreover, we optimize the encryption method designed for a resource constrained device such as smart meters, and the time to complete encryption gets 40x faster over the naïve adoption of the encryption method. We also validate the proposed framework using a real dataset from smart metering infrastructures, and demonstrate that the data integrity attacks can be detected with high sensitivity, without sacrificing user privacy. Experimental results with a real dataset of 200 houses from an AMI in Texas showed that the detection sensitivity of the plaintext algorithm is not degraded due to the use of homomorphic encryption.

    DOI

  • Real-Time Periodic Advertisement Recommendation Optimization using Ising Machine.

    Fan Mo, Huida Jiao, Shun Morisawa, Makoto Nakamura, Koichi Kimura, Hisanori Fujisawa, Masafumi Ohtsuka, Hayato Yamana

    2020 IEEE International Conference on Big Data (IEEE BigData 2020)     5783 - 5785  2020

    Online advertising is widely used by commercial companies to attract customers. Tuning advertisement delivery to achieve a high conversion rate (CVR) is crucial for improving advertising effectiveness. Because advertisers require demand-side platforms (DSPs) to deliver a certain number of ads within a fixed period, it is challenging to maximize CVR while satisfying ad delivery constraints. Such a combinatorial optimization problem is NP-hard when we have a considerable number of both ads and users. In this paper, we adopt Digital Annealer (DA), a quantum-inspired Ising computer, to solve the combinatorial optimization problem. The experimental evaluation result shows that the proposed method increases accuracy from 0.176 to 0.326 and achieves 20.8 times speed-up compared to baseline.

    DOI

  • Highly Accurate CNN Inference Using Approximate Activation Functions over Homomorphic Encryption.

    Takumi Ishiyama, Takuya Suzuki, Hayato Yamana

    CoRR   abs/2009.03727   3989 - 3995  2020

    In the big data era, cloud-based machine learning as a service (MLaaS) has attracted considerable attention. However, when handling sensitive data, such as financial and medical data, a privacy issue emerges, because the cloud server can access clients' raw data. A common method of handling sensitive data in the cloud uses homomorphic encryption, which allows computation over encrypted data without decryption. Previous research adopted a low-degree polynomial mapping function, such as the square function, for data classification. However, this technique results in low classification accuracy. This study seeks to improve the classification accuracy for inference processing in a convolutional neural network (CNN) while using homomorphic encryption. We apply various orders of the polynomial approximations of Google's Swish and ReLU activation functions. We also adopt batch normalization to normalize the inputs for the approximated activation functions to fit the input range to minimize the error. We implemented CNN inference labeling over homomorphic encryption using the Microsoft's Simple Encrypted Arithmetic Library (SEAL) for the Cheon-Kim-Kim-Song (CKKS) scheme. The experimental evaluations confirmed classification accuracies of 99.29% and 81.06% for MNIST and CIFAR-10, respectively, which entails 0.11% and 4.69% improvements, respectively, over previous methods.

    DOI

  • Deep Neural Network Pruning Using Persistent Homology.

    Satoru Watanabe, Hayato Yamana

    3rd IEEE International Conference on Artificial Intelligence and Knowledge Engineering(AIKE)     153 - 156  2020

    Deep neural networks (DNNs) have improved the performance of artificial intelligence systems in various fields including image analysis, speech recognition, and text classification. However, the consumption of enormous computation resources prevents DNNs from operating on small computers such as edge sensors and handheld devices. Network pruning (NP), which removes parameters from trained DNNs, is one of the prominent methods of reducing the resource consumption of DNNs. In this paper, we propose a novel method of NP, hereafter referred to as PHPM, using persistent homology (PH). PH investigates the inner representation of knowledge in DNNs, and PHPM utilizes the investigation in NP to improve the efficiency of pruning. PHPM prunes DNNs in ascending order of magnitudes of the combinational effects among neurons, which are calculated using the one-dimensional PH, to prevent the deterioration of the accuracy. We compared PHPM with global magnitude pruning method (GMP), which is one of the common baselines to evaluate pruning methods. Evaluation results show that the classification accuracy of DNNs pruned by PHPM outperforms that pruned by GMP.

    DOI

  • DAMCREM: Dynamic Allocation Method of Computation REsource to Macro-Tasks for Fully Homomorphic Encryption Applications.

    Takuya Suzuki, Yu Ishimaki, Hayato Yamana

    IEEE International Conference on Smart Computing(SMARTCOMP)     458 - 463  2020

    Smart computing aims to improve the quality of life by utilizing Internet-of-Things devices and cloud computing. Typically, this computing handles private and/or personal information so concealing such sensitive information is a challenge. Adopting fully homomorphic encryption (FHE) is one approach for handling such sensitive information safely; that is, we can calculate the encrypted data without decryption. However, the time and space complexity of the FHE operation is high. Thus, its computation takes a long time. In this study, we aim to shorten FHE execution time by adopting our new scheduling algorithm, which divides a task into several macro-tasks and then assigns a set of threads. We assume a cloud computing system that is equipped with a many-core CPU. Thus, we propose the dynamic allocation method of computation resource to macro-tasks (DAMCREM), which dynamically allocates a certain number of threads (selected from pre-defined candidates) to each macro-task of every given job. In the evaluation, we compared DAMCREM to naive methods that allocate a pre-defined number of threads to each macro-task. The result shows that the average latency and maximum latency of job execution is less than those of naive methods, even when the average interval of job arrival is short.

    DOI

  • WUY at SemEval-2020 Task 7: Combining BERT and Naive Bayes-SVM for Humor Assessment in Edited News Headlines.

    Cheng Zhang, Hayato Yamana

    Proceedings of the Fourteenth Workshop on Semantic Evaluation(SemEval@COLING)     1071 - 1076  2020

    DOI

  • Topological Measurement of Deep Neural Networks Using Persistent Homology.

    Satoru Watanabe, Hayato Yamana

    International Symposium on Artificial Intelligence and Mathematics(ISAIM)    2020

    The inner representation of deep neural networks (DNNs) is indecipherable, which makes it difficult to tune DNN models, control their training process, and interpret their outputs. In this paper, we propose a novel approach to investigate the inner representation of DNNs through topological data analysis (TDA). Persistent homology (PH), one of the outstanding methods in TDA, was employed for investigating the complexities of trained DNNs. We constructed clique complexes on trained DNNs and calculated the one-dimensional PH of DNNs. The PH reveals the combinational effects of multiple neurons in DNNs at different resolutions, which is difficult to be captured without using PH. Evaluations were conducted using fully connected networks (FCNs) and networks combining FCNs and convolutional neural networks (CNNs) trained on the MNIST and CIFAR-10 data sets. Evaluation results demonstrate that the PH of DNNs reflects both the excess of neurons and problem difficulty, making PH one of the prominent methods for investigating the inner representation of DNNs.

    DOI

  • Imitation-Resistant Passive Authentication Interface for Stroke-Based Touch Screen Devices.

    Masashi Kudo, Hayato Yamana

    HCI International 2020 - Posters - 22nd International Conference   1226 CCIS   558 - 565  2020

    Today’s widespread use of stroke-based touchscreen devices creates numerous associated security concerns and requires efficient security measures in response. We propose an imitation-resistant passive authentication interface for stroke-based touch screen devices employing classifiers for each individual stroke, which is evaluated with respect to 26 features. For experimental validation, we collect stroke-based touchscreen data from 23 participants containing target and imitation stroke patterns using a photo-matching game in the form of an iOS application. The equal error rate (EER), depicting the rate at which false rejection and false acceptance of target and imitator strokes are equal, is assumed as an indicator of the classification accuracy. Leave-one-out cross-validation was employed to evaluate the datasets based on the mean EER. For each cross-validation, one out of the two target datasets, an imitator dataset, and the remaining 20 imitator datasets were selected as genuine data, imitator test data, and imitator training data, respectively. Our results confirm stroke imitation as a serious threat. Among the 26 stroke features evaluated in terms of their imitation tolerance, the stroke velocity was identified as the most difficult to imitate. Dividing classifiers based on the stroke direction was found to further contribute to classification accuracy.

    DOI

  • Smart SE: Smart Systems and Services Innovative Professional Education Program.

    Hironori Washizaki, Kenji Tei, Kazunori Ueda, Hayato Yamana, Yoshiaki Fukazawa, Shinichi Honiden, Shoichi Okazaki, Nobukazu Yoshioka, Naoshi Uchihira

    44th IEEE Annual Computers, Software, and Applications Conference(COMPSAC)     1113 - 1114  2020

    The Smart Systems and Services Innovative Professional Education (Smart SE) program is a certification program developed as part of the education network for the Practical information Technologies (enPiT-Pro) project, which is funded by the Japan Ministry of Education, Culture, Sports, Science and Technology. The Smart SE program provides industry professionals working in fields related to information and communication technology (ICT) with additional training and education in smart systems and services that utilize various technologies such as IoT, Cloud, Big Data, and Artificial Intelligence (AI) for businesses. Here, we illustrate its purpose, curriculum and features to respond to the needs of industrial professional education.

    DOI

  • Privacy Preserving Calculation in Cloud using Fully Homomorphic Encryption with Table Lookup

    Ruixiao Li, Yu Ishimaki, Hayato Yamana

    2020 5th IEEE International Conference on Big Data Analytics (IEEE ICBDA 2020)     315 - 322  2020  [Refereed]

    To protect data in cloud servers, fully homomorphic encryption (FHE) is an effective solution. In addition to encrypting data, FHE allows a third party to evaluate arithmetic circuits (i.e., computations) over encrypted data without decrypting it, guaranteeing protection even during the calculation. However, FHE supports only addition and multiplication. Functions that cannot be directly represented by additions or multiplications cannot be evaluated with FHE. A naive implementation of such arithmetic operations with FHE is a bit-wise operation that encrypts numerical data as a binary string. This incurs huge computation time and storage costs, however. To overcome this limitation, we propose an efficient protocol to evaluate multi-input functions with FHE using a lookup table. We extend our previous work, which evaluates a single-integer input function, such as f(x). Our extended protocol can handle multi-input functions, such as f(x, y). Thus, we propose a new method of constructing lookup tables that can evaluate multi-input functions to handle general functions. We adopt integer encoding rather than bit-wise encoding to speed up the evaluations. By adopting both permutation operations and a private information retrieval scheme, we guarantee that no information from the underlying plaintext is leaked between two parties: a cloud computation server and a decryptor. Our experimental results show that the runtime of our protocol for a two-input function is approximately 13 minutes, when there are 8,192 input elements in the lookup table. By adopting a multi-threading technique, the runtime can be further reduced to approximately three minutes with eight threads. Our work is more practical than a previously proposed bit-wise implementation, which requires 60 minutes to evaluate a single-input function.

    DOI
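
As a rough illustration of why a lookup table helps when only addition and multiplication are available, the following plaintext sketch evaluates a two-input function by multiplying indicator (one-hot) selectors with table entries and summing. It is not the protocol from the paper: the encryption, permutation, and private information retrieval steps are omitted, and the names `one_hot`, `lut_eval`, `domain`, and `table` are hypothetical.

```python
def one_hot(x, domain):
    """Indicator vector of x over the table's input domain (plaintext stand-in)."""
    return [1 if x == v else 0 for v in domain]


def lut_eval(selector_x, selector_y, table):
    """Evaluate f(x, y) from a table using only additions and multiplications."""
    total = 0
    for i, sx in enumerate(selector_x):
        for j, sy in enumerate(selector_y):
            # Only the entry where both selectors are 1 contributes to the sum.
            total += sx * sy * table[i][j]
    return total


# Hypothetical example: f(x, y) = max(x, y), which is not a plain polynomial.
domain = [0, 1, 2, 3]
table = [[max(a, b) for b in domain] for a in domain]
result = lut_eval(one_hot(2, domain), one_hot(3, domain), table)
```

In this toy setting the selectors are computed in the clear; in an encrypted setting, producing them without revealing the inputs is the hard part that the abstract attributes to the proposed protocol.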

  • Geographic Diversification of Recommended POIs in Frequently Visited Areas.

    Jungkyu Han, Hayato Yamana

    ACM Transactions on Information Systems   38 ( 1 ) 1 - 39  2020  [Refereed]

    In personalized Point-Of-Interest (POI) (or venue) recommendation, the diversity of recommended POIs is an important aspect. Diversity is especially important when POIs are recommended in the target users' frequently visited areas, because users are likely to revisit such areas. In addition to (POI) category diversity, which is a popular diversification objective in recommendation domains, diversification of recommended POI locations is an interesting subject in itself. Despite its importance, existing POI recommender studies generally focus on and evaluate prediction accuracy. In this article, geographical diversification (geo-diversification), a novel diversification concept that aims to increase recommendation coverage for a target user's geographic areas of interest, is introduced, from which a method that improves geo-diversity as an addition to existing state-of-the-art POI recommenders is proposed. In experiments with datasets from two real Location-Based Social Networks (LBSNs), we first analyze the performance of four state-of-the-art POI recommenders from various evaluation perspectives, including category diversity and geo-diversity, that have not been examined previously. The proposed method consistently improves geo-diversity (CPR(geo)@20) by 5 to 12% when combined with four state-of-the-art POI recommenders with negligible prediction accuracy (Recall@20) loss and provides 6 to 18% geo-diversity improvement with tolerable prediction accuracy loss (up to 2.4%).

    DOI

  • Appearance Frequency-Based Ranking Method for Improving Recommendation Diversity

    Seiki Miyamoto, Takumi Zamami, Hayato Yamana

    2019 4TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (ICBDA 2019)     420 - 425  2019  [Refereed]

    Recommender systems are used to analyze users' preferences through their past activities and to personalize recommendations for each user based on what they might be interested in. The performance of a recommender system is most commonly measured using only recommendation accuracy. However, recommending accurate items does not mean that the generated recommendation is the best for the user because it can be biased towards some items that have a higher chance of being liked by users, such as popular items. Recommendations become repetitive and obvious with biased item selection and are less likely to be personalized. To mitigate bias and repetitiveness, recommendation diversity has been studied. However, diversity has a trade-off relationship with accuracy. Modifying the recommendation algorithm to consider diversity while learning about user preferences would not only cause a loss in accuracy but also lead to a less precise reading of user preferences. In contrast, using a ranking method to re-rank the predicted items preserves the precision of the recommendation algorithm. In this study, we propose a ranking method that uses the appearance frequency of items to restrict items from being frequently recommended. The experimental results show that the proposed method consistently improves diversity across multiple diversity metrics.

    DOI
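
To make the re-ranking idea in the summary above concrete, here is a minimal sketch that assumes one simple rule: an item may appear in at most `max_appearances` recommendation lists across all users. The abstract does not give the actual scoring rule, so the cap, the function name `rerank_by_frequency`, and the toy data are all assumptions for illustration only.

```python
from collections import Counter


def rerank_by_frequency(candidates, k, max_appearances):
    """Pick up to k items per user, skipping items already recommended too often.

    candidates maps each user to a list of items sorted by predicted score,
    best first (the output of some underlying recommender).
    """
    appearances = Counter()
    recommendations = {}
    for user, ranked_items in candidates.items():
        picked = []
        for item in ranked_items:
            if len(picked) == k:
                break
            # Keep the recommender's order, but cap how often an item appears.
            if appearances[item] < max_appearances:
                picked.append(item)
                appearances[item] += 1
        recommendations[user] = picked
    return recommendations


# Hypothetical toy input: item "a" is popular and ranks first for every user.
candidates = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b", "d"],
    "u3": ["a", "c", "d"],
}
recs = rerank_by_frequency(candidates, k=2, max_appearances=2)
```

Because the underlying predictions are left untouched and only the final ordering changes, this kind of post-processing matches the abstract's point that re-ranking keeps the learned preferences intact. Note that the outcome of this particular sketch depends on the order in which users are processed.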

  • Privacy-preserving Recommendation for Location-based Services

    Qiuyi Lyu, Yu Ishimaki, Hayato Yamana

    2019 4TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (ICBDA 2019)     98 - 105  2019  [Refereed]

    Location-based recommendation services, such as Foursquare, make consumers' lives more convenient. However, users are usually reluctant to disclose their personal information. Unavoidable security concerns arise because malicious third parties could misuse confidential information, such as the users' preferences. The mainstream approach to this problem employs a privacy-preserving k-NN search algorithm. However, two major bottlenecks exist. One is that it only provides the nearest points of interest (POIs) to the users without any recommendations based on the users' behavior history. This limited service eventually results in a situation in which no user would prefer to continue using it. The other is that only a single user holds the private key; thus, the service providers cannot obtain any user information to analyze for profit. To solve the first problem, our proposed protocol provides recommendation services by adopting collaborative filtering techniques with an encrypted database based on fully homomorphic encryption, in addition to encrypting both the user's location and preferences. For the second problem, a privacy service provider (PSP) is designed to generate and hold the private key. Thus, service providers can homomorphically compute aggregate information concerning user behavior patterns and send the encrypted results to the PSP for decryption while maintaining the privacy of individual users. Compared with previous studies, the novelty of the proposed protocol is the design of a commercially valuable privacy-preserving recommendation mechanism that could benefit both consumers and service providers on location-based services (LBS).

    DOI

  • Fully Homomorphic Encryption with Table Lookup for Privacy-Preserving Smart Grid.

    Ruixiao Li, Yu Ishimaki, Hayato Yamana

    IEEE International Conference on Smart Computing(SMARTCOMP)     19 - 24  2019  [Refereed]

    Smart grids are indispensable applications in smart connected communities (SCC). To construct privacy-preserving anomaly detection systems on a smart grid, we adopt fully homomorphic encryption (FHE) to protect users' sensitive data. Although FHE allows a third party to perform calculations on encrypted data without decryption, FHE only supports addition and multiplication on encrypted data. In anomaly detection, we must calculate both harmonic and arithmetic means consisting of logarithms. A naive implementation of such arithmetic operations with FHE is a bitwise operation; thus, it requires huge computation time. To speed up such calculations, we propose an efficient protocol to evaluate any functions with FHE using a lookup table (LUT). Our protocol allows integer encoding, i.e., a set of integers is encrypted as a single ciphertext, rather than using bitwise encoding. Our experimental results in a multi-threaded environment show that the runtime of our protocol is approximately 51 s when the size of the LUT is 448,000. Our protocol is more practical than the previously proposed bitwise implementation.

    DOI

  • URL-based Phishing Detection using the Entropy of Non-Alphanumeric Characters.

    Eint Sandi Aung, Hayato Yamana

    Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services(iiWAS)     385 - 392  2019  [Refereed]

    Phishing is a type of personal information theft in which phishers lure users to steal sensitive information. Phishing detection mechanisms using various techniques have been developed. Our hypothesis is that phishers create fake websites with as little information as possible in a webpage, which makes content- and visual similarity-based detection, which analyzes the webpage content, difficult. To overcome this, we focus on the use of Uniform Resource Locators (URLs) to detect phishing. Since previous work extracts specific special-character features, we assume that non-alphanumeric (NAN) character distributions highly impact the performance of URL-based detection. We hence propose a new feature called the entropy of NAN characters for URL-based phishing detection. Experimental evaluation with balanced and imbalanced datasets shows 96% ROC AUC on the balanced dataset and 89% ROC AUC on the imbalanced dataset, an increase in ROC AUC of 5 to 6% compared with not adopting our proposed feature.

    DOI
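
One plausible reading of the "entropy of NAN characters" feature is the Shannon entropy of the distribution of non-alphanumeric characters in a URL. The sketch below follows that reading; the exact definition used in the paper may differ, and the function name `nan_entropy` is hypothetical.

```python
import math
from collections import Counter


def nan_entropy(url):
    """Shannon entropy (in bits) of the non-alphanumeric characters in a URL.

    Assumption: "non-alphanumeric" means any character for which
    str.isalnum() is False; URLs with no such characters score 0.0.
    """
    chars = [c for c in url if not c.isalnum()]
    if not chars:
        return 0.0
    counts = Counter(chars)
    total = len(chars)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this definition, a URL that uses a single kind of separator scores 0, while a URL that mixes many different symbols scores higher, which is the kind of signal the abstract suggests a classifier could use.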

  • A Privacy-Preserving Query System using Fully Homomorphic Encryption with Real-World Implementation for Medicine-Side Effect Search.

    Yusheng Jiang, Tamotsu Noguchi, Nobuyuki Kanno, Yoshiko Yasumura, Takuya Suzuki, Yu Ishimaki, Hayato Yamana

    Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services(iiWAS)     63 - 72  2019  [Refereed]

    © 2019 Association for Computing Machinery. The preservation of privacy during a search has become a serious problem in recent years. There is an increasing requirement to ensure that user queries are not abused by a third party, including the search provider. Fully homomorphic encryption (FHE) can conduct addition and multiplication directly over ciphertext. Using FHE, privacy, concerning both the user queries and the database of the search provider, can be protected. In this paper, we propose a privacy-preserving query system model. We implemented the proposed model in a real-world medicine side-effect query system. We applied a filtering technique, prior to the query deployment, to reduce the size of the database and used multi-threading to accelerate the search. The system was tested 10,000 times with a random query, using a database comprising 40,000 records of simulation data, and completed 99.84% of the queries within 60 seconds (s), proving the real-world applicability of our system.

    DOI

  • Outsourced Private Set Union on Multi-Attribute Datasets for Search Protocol using Fully Homomorphic Encryption.

    Rumi Shakya, Yoshiko Yasumura, Takuya Suzuki, Yu Ishimaki, Hayato Yamana

    ACM International Conference Proceeding Series     55 - 62  2019  [Refereed]

    © 2019 Association for Computing Machinery. In the era of big data and cloud computing, outsourcing data storage to the cloud poses the risk of its abuse or leakage. Thus, we address the problem of delegating computation on outsourced private datasets while maintaining privacy. In this study, we consider a scenario involving two data owners outsourcing their datasets to a cloud service. The cloud performs a set union computation, after which the querier sends a query to obtain information from both datasets. We propose a protocol that uses fully homomorphic encryption (FHE) and Cartesian-join of Bloom filters (CBF) as proposed by Wang et al. The protocol obtains information on the existence of a particular set of elements without learning about the residing source. To the best of our knowledge, our protocol, by using the FHE and CBF matrix, is a novel approach to ensuring the security of outsourced set union operations.

    DOI

  • Secure Naïve Bayes Classification Protocol over Encrypted Data Using Fully Homomorphic Encryption.

    Yoshiko Yasumura, Yu Ishimaki, Hayato Yamana

    Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services(iiWAS)     45 - 54  2019  [Refereed]

    © 2019 Association for Computing Machinery. Machine learning classification has a wide range of applications. In the big data era, a client may want to outsource classification tasks to reduce the computational burden at the client. Meanwhile, an entity may want to provide a classification model and classification services to such clients. However, applications such as medical diagnosis require sensitive data that both parties may not want to reveal. Fully homomorphic encryption (FHE) enables secure computation over encrypted data without decryption. By applying FHE, classification can be outsourced to a cloud without revealing any data. However, existing studies on classification over FHE do not achieve the scenario of outsourcing classification to a cloud while preserving the privacy of the classification model, client's data and result. In this work, we apply FHE to a naïve Bayes classifier and, to the best of our knowledge, propose the first concrete secure classification protocol that satisfies the above scenario.

    DOI

  • Point of Interest Recommendation by Exploiting Geographical Weighted Center and Categorical Preference.

    Fan Mo, Hayato Yamana

    2019 International Conference on Data Mining Workshops   2019-November   73 - 76  2019  [Refereed]

    Point of interest (POI) recommendation is one of the indispensable services in location-based social networks (LBSNs). POI recommendation helps users find new locations and better understand the city. In LBSNs, aspects such as geographical information and categorical information improve the accuracy of POI recommendation. In this paper, we propose two new techniques to improve the recommendation accuracy: 1) a weighted center for each of a target user's active areas and 2) a category-dependent threshold for categorical preference. The weighted center represents the density-based center of a target user's active area. The geographical aspect usually adopts the active areas that the target user has frequently visited. Although previous studies define the active area by its active center and its radius, they choose the location of the most frequently visited POI as the active center even if there exist several POIs that have a similar number of check-ins, which results in misdefinition of the active center. Our weighted center is able to handle the target user's check-in probability, which follows a power-law distribution. In addition, previous studies predict users' preferences for categories; however, they neglect the fact that different categories have different user preference distributions. For example, one category has a wide range of subcategories preferred by users, whereas another category has only a few preferred subcategories, even if it contains many subcategories. Thus, we set different thresholds to select candidate subcategories in each category. Experimental results with the Weeplaces dataset show that our method outperforms other baselines by at least 16.93% in F1-score@5.

    DOI
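
The contrast between taking the most-visited POI as the center and taking a weighted center can be sketched as follows. This assumes the weighted center is simply the check-in-count-weighted mean of POI coordinates, which may not match the paper's density-based definition; the function name `weighted_center` and the coordinates are hypothetical, and averaging latitude and longitude directly is only reasonable for a small area.

```python
def weighted_center(checkins):
    """Check-in-weighted mean position of the POIs in one active area.

    checkins is a list of (latitude, longitude, visit_count) tuples.
    """
    total = sum(count for _, _, count in checkins)
    if total == 0:
        raise ValueError("at least one check-in is required")
    lat = sum(la * count for la, _, count in checkins) / total
    lon = sum(lo * count for _, lo, count in checkins) / total
    return lat, lon


# Hypothetical area: two POIs with a similar number of check-ins.
area = [(35.0, 139.0, 3), (36.0, 140.0, 1)]
center = weighted_center(area)
```

With this weighting, a second POI with nearly as many check-ins pulls the center toward itself instead of being ignored, which is the failure case of the most-visited-POI rule that the abstract describes.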

  • Two-Factor Authentication Using Leap Motion and Numeric Keypad.

    Tomoki Manabe, Hayato Yamana

    HCI for Cybersecurity, Privacy and Trust - First International Conference   11594 LNCS   38 - 51  2019  [Refereed]

    Biometric authentication has become popular in modern society. It takes less time and effort for users when compared to conventional password authentication. Furthermore, biometric authentication was considered more secure than password authentication because it was more difficult to steal biometric information when compared to passwords. However, given the development of high-spec cameras and image recognition technology, the risk of the theft of biometric information, such as fingerprints, is increasing. Additionally, biometric authentication exhibits lower and less stable accuracy than that of password authentication. To solve the aforementioned issues, we propose two-factor authentication combining password input and biometric authentication of the hand. We adopt Leap Motion to measure physical and behavioral features related to hands. Subsequently, a random forest classifier determines whether the hand features belong to a genuine user. Our authentication system architecture completes the biometric authentication by using a limited amount of data obtained within a few seconds when a user enters a password. The advantage of the proposed method is that it prevents intrusion by biometric authentication even if a password is stolen. Our experimental results for 21 testers exhibit 94.98% authentication accuracy in a limited duration, 2.52 s on average while inputting a password.

    DOI

  • Effectiveness of Usability & Performance Features for Web Credibility Evaluation.

    Kenta Yamada, Hayato Yamana

    2019 IEEE International Conference on Big Data (IEEE BigData)     6257 - 6259  2019  [Refereed]

    Unreliable web pages, such as fake news, have become an unavoidable problem. To tackle this problem, recent studies have adopted both content and social features to predict the credibility of web pages; however, the accuracy has almost saturated. In this paper, we propose the adoption of Google Lighthouse features to predict web page credibility. Our experimental results show that the proposed method achieves an increased accuracy of 7.9% in comparison with state-of-the-art methods.

    DOI

  • Message from the BITS 2018 General Chairs and TPC Chairs

    Sajal K. Das, Hayato Yamana (General Co-Chairs), Mauro Conti, Atsuko Miyaji, Jun Sakuma

    Proceedings - 2018 IEEE International Conference on Smart Computing, SMARTCOMP 2018     xxiii  2018.07  [Refereed]

    DOI

  • Attribute-based proxy re-encryption method for revocation in cloud storage: Reduction of communication cost at re-encryption

    Yoshiko Yasumura, Hiroki Imabayashi, Hayato Yamana

    2018 IEEE 3rd International Conference on Big Data Analysis, ICBDA 2018     312 - 318  2018.05  [Refereed]

    © 2018 IEEE. In recent years, many users have uploaded data to the cloud for easy storage and sharing with other users. At the same time, security and privacy concerns for the data are growing. Attribute-based encryption (ABE) enables both data security and access control by defining users with attributes so that only those users who have matching attributes can decrypt them. For real-world applications of ABE, revocation of users or their attributes is necessary so that revoked users can no longer decrypt the data. In actual implementations, ABE is used in hybrid with a symmetric encryption scheme such as the advanced encryption standard (AES) where data is encrypted with AES and the AES key is encrypted with ABE. The hybrid encryption scheme requires re-encryption of the data upon revocation to ensure that the revoked users can no longer decrypt that data. To re-encrypt the data, the data owner (DO) must download the data from the cloud, then decrypt, encrypt, and upload the data back to the cloud, resulting in both huge communication costs and computational burden on the DO depending on the size of the data to be re-encrypted. In this paper, we propose an attribute-based proxy re-encryption method in which data can be re-encrypted in the cloud without downloading any data by adopting both ABE and Syalim's encryption scheme. Our proposed scheme reduces the communication cost between the DO and cloud storage. Experimental results show that the proposed method reduces the communication cost by as much as one quarter compared to that of the trivial solution.

    DOI

  • Loop Circuit Optimization with Bootstrapping over Fully Homomorphic Encryption and its Application to Nearest Neighbor

    佐藤宏樹, 馬屋原昂, 石巻優, 山名早人

    DBSJ Japanese Journal (Web)   16-J   Article No. 12 (web only)  2018.03

    J-GLOBAL

  • Realization of Active Authentication for Smart Phone by Using Online Learning

    石山雄大, 山名早人

    DBSJ Japanese Journal (Web)   16-J   Article No. 18 (web only)  2018.03

    J-GLOBAL

  • ShuttleBoard: A Kana Character Input Method with Fewer Taps for Smartwatches

    下岡純也, 山名早人

    DBSJ Japanese Journal (Web)   16-J   Article No. 5 (web only)  2018.03

    J-GLOBAL

  • History-enhanced Focused Website Segment Crawler.

    Tanaphol Suebchua, Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana

    2018 International Conference on Information Networking(ICOIN)   2018-January   80 - 85  2018  [Refereed]

    The primary challenge in focused crawling research is how to efficiently utilize computing resources, e.g., bandwidth, disk space, and time, to find as many web pages related to a specific topic as possible. To meet this challenge, we previously introduced a machine-learning-based focused crawler that aims to crawl a group of relevant web pages located in the same directory path, called a website segment, and has achieved high efficiency so far. One of the limitations of our previous approach is that it may repeatedly visit a website that does not serve any relevant website segments, in the scenario where the website segments share the same linkage characteristics as the relevant ones in the training dataset. In this paper, we propose a "history-enhanced focused website segment crawler" to solve the problem. The idea behind it is that the priority score of an unvisited website segment should be reduced if the crawler has consecutively downloaded many irrelevant web pages from the website. To implement this idea, we propose a new prediction feature, called the "history feature", that is extracted from the recent crawling results, i.e., relevant and irrelevant web pages gathered from the target website. Our experiment shows that our newly proposed feature could improve the crawling efficiency of our focused crawler by a maximum of approximately 5%.

    DOI

  • Efficient Topical Focused Crawling Through Neighborhood Feature.

    Tanaphol Suebchua, Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana

    New Generation Computing   36 ( 2 ) 95 - 118  2018  [Refereed]

    A focused web crawler is an essential tool for gathering domain-specific data used by national web corpora, vertical search engines, and so on, since it is more efficient than general breadth-first or depth-first crawlers. The problem in focused crawling research is the prioritization of unvisited web pages in the crawling frontier, followed by crawling these web pages in the order of their priority. The most common feature adopted in many focused crawling studies to prioritize an unvisited web page is the relevancy of the set of its source web pages, i.e., its in-linked web pages. However, this feature is limited, because we cannot estimate the relevancy of the unvisited web page correctly if we have few source web pages. To solve this problem and enhance the efficiency of focused web crawlers, we propose a new feature, called the "neighborhood feature". This enables the adoption of additional already-downloaded web pages to estimate the priority of a target web page. The additionally adopted web pages consist of both web pages located in the same directory as the target web page and web pages whose directory paths are similar to that of the target web page. Our experimental results show that our enhanced focused crawlers outperform the crawlers not utilizing the neighborhood feature as well as state-of-the-art focused crawlers, including the HMM crawler.

    DOI

  • Editor's Message to Special Issue of Young Researchers' Papers.

    Hayato Yamana

    Journal of Information Processing   26   224 - 224  2018  [Refereed]

    DOI

  • Outsourced Private Set Intersection Cardinality with Fully Homomorphic Encryption.

    Arisa Tajima, Hiroki Sato, Hayato Yamana

    6th International Conference on Multimedia Computing and Systems(ICMCS)   2018-May   1 - 8  2018  [Refereed]

    Cloud database services have attracted considerable interest with the increase in the amount of data to be analyzed. Delegating data management to cloud services, however, causes security and privacy issues because cloud services are not always trustable. In this study, we address the problem of answering join queries across outsourced private datasets while maintaining data confidentiality. We particularly consider a scenario in which two data owners each own a set of elements and a querier asks the cloud to perform join operations to obtain the size of the common elements in the two datasets. To process the join operations without revealing the contents of data to the cloud, we propose two protocols, a basic protocol and a querier-friendly protocol, which adopt a functionality of outsourced private set intersection cardinality (OPSI-CA) with fully homomorphic encryption (FHE) and bloom filters. The querier-friendly protocol achieves a reduction in communication and computation costs for the querier. Our experimental results show that it takes 436 s for the basic protocol and 298 s for the querier-friendly protocol to execute the join query on the two datasets with 100 elements each. The novelty of this study is that our protocols are the first approaches for outsourced join operations adopting FHE.

    DOI

  • Active Authentication on Smartphone using Touch Pressure.

    Masashi Kudo, Hayato Yamana

    The 31st Annual ACM Symposium on User Interface Software and Technology Adjunct Proceedings     96 - 98  2018  [Refereed]

    Smartphone user authentication is still an open challenge because a balance between security and usability is indispensable. Active authentication is one way to achieve this balance. In this paper, we aim to improve the accuracy of active authentication by adopting online learning with touch pressure. In recent years, smartphones equipped with pressure sensors have become readily available, so we confirm the effectiveness of adopting touch pressure as one of the features for authentication. Our experiments adopting the online AROW algorithm with touch pressure show that the equal error rate (EER), where the miss rate and false rate are equal, is reduced up to one-fifth by adding the touch pressure feature. Moreover, we have confirmed that training with data from both sitting and prone postures achieves the best performance when testing a variety of postures including sitting, standing, and prone, achieving an EER of up to 0.14%.

    DOI

  • Non-Interactive and Fully Output Expressive Private Comparison.

    Yu Ishimaki, Hayato Yamana

    Progress in Cryptology - INDOCRYPT 2018 - 19th International Conference on Cryptology in India(INDOCRYPT)   11356 LNCS   355 - 374  2018  [Refereed]

    Private comparison protocols are fundamental to the field of secure computation. Recently, Lu et al. (ASIACCS 2018) proposed a new protocol, XCMP, which is based on a ring-based fully homomorphic encryption (FHE) scheme. In that scheme, two μ-bit integers a and b are compared in encrypted form without revealing the plaintext to an evaluator. The protocol outputs a bit in encrypted form, which indicates whether a > b. XCMP has the following three advantages: the output can be reused for further processing, the evaluation is performed without any interactions with a decryptor having a secret key, and the required multiplicative depth is only 1. However, XCMP has two potential disadvantages. First, the protocol result preserves both additive and multiplicative homomorphisms over ℤ_t only, whereas the underlying FHE scheme can support a much larger plaintext space of (Formula Presented) for a prime t and a power-of-two N; this restricts the functionality of applications using the comparison result. Second, the bit length μ of the integers to be compared is no more than log N (typically 16 bits, at most). Thus, it is difficult for XCMP to handle larger integers. In this paper, we propose a non-interactive private comparison protocol that solves the aforementioned problems and outputs an additively and multiplicatively reusable comparison result over the ring without adding an extremely large computational overhead over XCMP. Moreover, by regarding a μ-bit (μ > 16) integer as a sequence of chunks, we show that the multiplicative depth required for our comparison protocol is logarithmic in the number of chunks. This value is much smaller than that of the naïve solution with a multiplicative depth of log μ. Experimental results demonstrate that our protocol introduces a slight overhead over XCMP. Remarkably, we experimentally demonstrate that our protocol for a larger domain is comparable to the construction given by one of the state-of-the-art bitwise FHE schemes.

    DOI

  • External Content-dependent Features for Web Credibility Evaluation.

    Kazuyoshi Ootani, Hayato Yamana

    IEEE International Conference on Big Data (IEEE BigData 2018)     5414 - 5416  2018  [Refereed]

    Unreliable web pages, such as fake news, have become a global problem in the big data era. The motivation to publish fake news is often profit, for example, earning advertising income by placing ads on web pages. In this paper, we focus on the different usage of HTML source tags between reliable and unreliable web pages and propose new features for predicting their credibility. The experimental results show that our proposed features increase accuracy when used together with the previously proposed Contents features.

    DOI

  • Improving Recommendation Diversity Across Users by Reducing Frequently Recommended Items.

    Seiki Miyamoto, Takumi Zamami, Hayato Yamana

    IEEE International Conference on Big Data (IEEE BigData 2018)     5392 - 5394  2018  [Refereed]

    Recommender systems have been used to analyze users' preferences through their past activities and recommend items in which they might be interested. Numerous studies are being conducted on improving recommendation accuracy so that the recommender system reads user preferences more accurately. However, it is important to consider recommendation diversity, because a lack of diversity leads to repetitive and obvious recommendations. In this paper, we propose a method to re-rank the recommendation list by the appearance frequency of items to recommend a wider range of items. The experimental results show that our method consistently performs better than a related work in improving recommendation diversity.

    DOI

  • A Survey on Recommendation Methods Beyond Accuracy.

    Jungkyu Han, Hayato Yamana

    IEICE Transactions on Information & Systems   100-D ( 12 ) 2931 - 2944  2017.12  [Refereed]

     View Summary

    In recommending to another individual an item that one loves, accuracy is important; however, in most cases, focusing only on accuracy generates less satisfactory recommendations. Studies have repeatedly pointed out that aspects that go beyond accuracy, such as the diversity and novelty of the recommended items, are as important as accuracy in making a satisfactory recommendation. Despite their importance, there is no global consensus about definitions and evaluations regarding beyond-accuracy aspects, as such aspects closely relate to the subjective sensibility of user satisfaction. In addition, devising algorithms for this purpose is difficult, because algorithms concurrently pursue aspects that are in a trade-off relation (e.g., accuracy vs. novelty). Given this situation, it is important for researchers initiating a study in this domain to obtain a systematically integrated view of it. This paper reports the results of a survey of about 70 studies published over the last 15 years, each of which addresses recommendations that consider beyond-accuracy aspects. From this survey, we identify diversity, novelty, and coverage as important aspects in achieving serendipity and popularity unbiasedness, factors that are important to user satisfaction and business profits, respectively. The five major groups of algorithms that tackle the beyond-accuracy aspects are multi-objective, modified collaborative filtering (CF), clustering, graph, and hybrid; we then classify and describe algorithms as per this typology. The off-line evaluation metrics and user studies carried out by the studies are also described. Based on the survey results, we assert that there is a lot of room for research in the domain. In particular, personalization and generalization are considered important issues that should be addressed in future research (e.g., automatic per-user trade-off among the aspects, and properly establishing beyond-accuracy aspects for various types of applications or algorithms).

    DOI

  • Bits Message from General Co-Chairs

    Sajal K. Das, Hayato Yamana

    2017 IEEE International Conference on Smart Computing, SMARTCOMP 2017     xxiii  2017.06  [Refereed]

    DOI

  • Streamline Computation of Secure Frequent Pattern Mining by Fully Homomorphic Encryption

    Hiroki Imabayashi, Yu Ishimaki, Akira Umayabara, Hiroki Sato, Hayato Yamana

    IPSJ Transactions on Databases (Web)   10 ( 1 ) 1 - 12  2017.03

    CiNii J-GLOBAL

  • Private Substring Search on Homomorphically Encrypted Data

    Yu Ishimaki, Hiroki Imabayashi, Hayato Yamana

    2017 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP)     457 - 462  2017  [Refereed]

     View Summary

    With the rapid development of cloud storage services and IoT environments, how to search securely and efficiently without compromising privacy has become an important problem. To address it, many methods have been proposed for searching over encrypted data. Motivated by the need to store sensitive data such as genomic and medical data, substring search over encrypted data has been studied. Previous work either leaks the query access pattern by relying on a vulnerable cryptographic model, or performs the search over plaintext data with an encrypted query. Such methods are therefore incompatible with an outsourcing scenario in which the searched data is stored in encrypted form and is searched with an encrypted substring query without leaking the query access pattern, i.e., private substring search. Fully Homomorphic Encryption (FHE) can be adopted to perform private substring search, although it incurs a huge computational overhead. Because of this overhead, performing private substring search efficiently over FHE is a challenging task. In this work, we propose a private substring search protocol over encrypted data based on FHE and examine its feasibility. In particular, we make use of a batching technique that accelerates homomorphic computation in a SIMD manner. In addition, we propose a data structure suited to the search function under batched computation. Our experimental results show that the proposed method is feasible.

    DOI

  • Geographical Diversification in POI Recommendation: Toward Improved Coverage on Interested Areas

    Jungkyu Han, Hayato Yamana

    PROCEEDINGS OF THE ELEVENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS'17)     224 - 228  2017  [Refereed]

     View Summary

    In recommending POIs (points of interest), factors such as the diversity of the recommended POIs are as important as accuracy for providing a satisfactory recommendation. Although existing diversification methods can help POI recommender systems suggest more diverse POIs, they lack "geographical diversification," which results in the concentration of the supposedly "diverse" recommended POIs on "a small portion" in areas where the target user is most active. This is caused by the neglect of POI locations in the diversification, i.e., existing diversification methods try to diversify the categories of recommended items. However, geographical diversification is essential for users whose activity interests comprise many sub-areas and who require a variety of recommended POIs encompassing all their activity interests. In this paper, we propose a novel proportional geographical diversification method that recommends a variety of POIs located in the activity district of a user such that the variety of sub-areas in the district is proportional to the frequency of his/her activity in each sub-area. We compare the performance of the proposed method with existing diversification methods using real datasets. The evaluation results show that no method except the proposed one can significantly increase geographical diversity at the expense of a tolerable accuracy loss.

    DOI

  • Virtual co-eating: Making solitary eating experience more enjoyable

    Takahashi, M., Tanaka, H., Yamana, H., Nakajima, T.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   10507 LNCS   460 - 464  2017  [Refereed]

     View Summary

    Recently, research on the eating habits of Japanese college students revealed that they have a strong desire to communicate with others through co-eating. Even though a better eating experience through co-eating is important, they often end up eating alone for reasons such as small households, living alone, and having no time to find others to eat with. Therefore, we believe that incorporating a fictional character into the real space as an eating partner may improve the eating experience. To validate the idea, we have developed a virtual co-eating system that addresses issues caused by solitary eating, and we present some insights from its user study.

    DOI

  • A Variable-Length Motifs Discovery Method in Time Series using Hybrid Approach

    Chaw Thet Zan, Hayato Yamana

    19TH INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS2017)     49 - 57  2017  [Refereed]

     View Summary

    Discovery of repeated patterns, known as motifs, from long time series is essential for providing hidden knowledge to real-world applications such as medical, financial and weather analysis. Motifs can be discovered either directly on raw time series or on their transformed abstract representation. Most time series motif discovery methods require a predefined motif length, which results in long execution time because the length has to be varied to discover motifs of different lengths. To solve the problem, we propose an efficient method for discovering variable-length motifs that combines an approximate method with exact verification. First, a symbolic representation is adopted to discover motifs roughly; the found motifs are then examined exactly against the original real-valued data to achieve fast and exact discovery. The experiments show that our proposed method discovers significant motifs efficiently in comparison with state-of-the-art methods: MK and SBF.

    DOI

  • Securing Big Data and IoT Networks in Smart Cyber-Physical Environments

    Sajal K. Das, Hayato Yamana

    2017 INTERNATIONAL CONFERENCE ON SMART DIGITAL ENVIRONMENT (ICSDE'17)   Part F130526   189 - 194  2017  [Refereed]

     View Summary

    This position paper highlights security and privacy issues in smart environments based on cyber-physical systems. It also summarizes some of our recent research activities and projects in this area.

    DOI

  • Attribute-based Proxy Re-encryption Method for Revocation in Cloud Data Storage.

    Yoshiko Yasumura, Hiroki Imabayashi, Hayato Yamana

    2017 IEEE International Conference on Big Data (IEEE BigData 2017)   2018-January   4858 - 4860  2017  [Refereed]

     View Summary

    In the big data era, many users upload data to the cloud while security concerns are growing. By using attribute-based encryption (ABE), users can securely store data in the cloud while exerting access control over it. Revocation is necessary for real-world applications of ABE so that revoked users can no longer decrypt data. In actual implementations, however, revocation requires re-encryption of the data on the client side through download, decryption, encryption, and upload, which results in a huge communication cost between the client and the cloud depending on the data size. In this paper, we propose a new method in which the data can be re-encrypted in the cloud without downloading any data. The experimental results show that our method reduces the communication cost by one quarter in comparison with the trivial solution, in which re-encryption is performed on the client side.

    DOI

  • MCMalloc: A Scalable Memory Allocator for Multithreaded Applications on a Many-core Shared-memory Machine.

    Akira Umayabara, Hayato Yamana

    2017 IEEE International Conference on Big Data (IEEE BigData 2017)   2018-January   4846 - 4848  2017  [Refereed]

     View Summary

    In the big data era, multithreaded processing on a many-core machine, whose number of cores is still increasing, has become essential for parallelizing the execution of big data applications, alongside distributed computing. On such a machine, malloc-intensive applications cannot scale due to lock contention among threads, which becomes worse as the number of threads increases. To solve the problem, we propose a new method that reduces lock contention through batch malloc, pseudo free, and fine-grained data locking. Experimental results show a 4.72-times speed-up in comparison with JEmalloc, which is the fastest among previous memory allocators.

    DOI

  • Familiarity-aware POI Recommendation in Urban Neighborhoods.

    Jungkyu Han, Hayato Yamana

    Journal of Information Processing   25   386 - 396  2017  [Refereed]

     View Summary

    © 2017 Information Processing Society of Japan. Users’ visiting patterns to POIs (Points-Of-Interest) vary with the users’ familiarity with their visited areas. For instance, users visit tourist sites in unfamiliar cities rather than in their familiar home city. Previous studies have shown that familiarity can improve POI recommendation performance. However, such studies have focused on the differences between home and other cities, and not among small urban neighborhoods in the same city where user activities frequently occur. Applying these studies directly to such neighborhoods is difficult because simple distance-based familiarity measures, or visit-pattern differences represented by topics (groups of POIs that share common functions, such as arts or French restaurants), are too coarse to capture the differences observed among different areas. Among urban neighborhoods in the same city, user visit-pattern differences originate at a more precise POI level. To extend the previously proposed familiarity-aware POI recommendation so that it can be applied to different areas in the same city, we propose a method that employs visit-frequency-based familiarity and visit-pattern differentiation at a precise POI level. In experiments on real LBSN data consisting of over 800,000 check-ins for three cities (NYC, LA, and Tokyo), our proposed method outperforms state-of-the-art methods by 0.05 to 0.06 in the Recall@20 metric.

    DOI

  • Dynamic SAX Parameter Estimation for Time Series.

    Chaw Thet Zan, Hayato Yamana

    International Journal of Web Information Systems   13 ( 4 ) 387 - 404  2017  [Refereed]

     View Summary

    Purpose - The paper aims to estimate the segment size and alphabet size of Symbolic Aggregate approXimation (SAX). In SAX, time series data are divided into a set of equal-sized segments. Each segment is represented by its mean value and mapped to a symbol of an alphabet, where the number of adopted symbols is called the alphabet size. Both parameters control the data compression ratio and the accuracy of time series mining tasks. Moreover, the optimal parameter selection depends heavily on the application and data set. In practice, these parameters are selected iteratively by analyzing entire data sets, which limits the handling of huge amounts of time series and reduces the applicability of SAX. Design/methodology/approach - The segment size is estimated based on the Shannon sampling theorem (autoSAXSD_S) and adaptive hierarchical segmentation (autoSAXSD_M). The alphabet size is estimated by focusing on how the mean values of all the segments are distributed. A small alphabet size is set for a large distribution so that the differences among segments are easy to distinguish. Findings - Experimental evaluation using University of California Riverside (UCR) data sets shows that the proposed schemes are able to select the parameters well, with high classification accuracy, and show comparable efficiency in comparison with state-of-the-art methods, SAX and auto_iSAX. Originality/value - The originality of this paper lies in the way the optimal parameters of SAX are found using the proposed estimation schemes. The first parameter, segment size, is automatically estimated by two approaches, and the second parameter, alphabet size, is estimated from the most frequent average (mean) value among segments.

    DOI
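The SAX representation that the summary above describes (equal-sized segments, each reduced to its mean and mapped to a symbol) can be illustrated with a minimal sketch. This is plain textbook SAX with both parameters supplied by hand, not the paper's parameter-estimation schemes; the function name `sax` and its arguments are hypothetical, and the series length is assumed to be a multiple of `n_segments` (any remainder is dropped).

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word (illustrative sketch)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-sized segment
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints that cut the standard normal into equiprobable regions
    cuts = [NormalDist().inv_cdf(k / alphabet_size)
            for k in range(1, alphabet_size)]
    # Map each segment mean to the letter of the region it falls into
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in paa)
```

A larger `n_segments` keeps more temporal detail and a larger `alphabet_size` keeps more amplitude detail, which is the compression/accuracy trade-off the summary refers to.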

  • An improved symbolic aggregate approximation distance measure based on its statistical features

    Chaw Thet Zan, Hayato Yamana

    ACM International Conference Proceeding Series     72 - 80  2016.11  [Refereed]

     View Summary

    © 2016 ACM. The challenges in efficient data representation and similarity measures on massive amounts of time series have an enormous impact on many applications. This paper addresses an improvement on Symbolic Aggregate approXimation (SAX), which is one of the efficient representations for time series mining. Because SAX represents its symbols by the average (mean) value of a segment under the assumption of a Gaussian distribution, it is insufficient to convey the entire deterministic information and sometimes causes incorrect results in time series classification. In this work, the SAX representation and distance measure are improved by adding another moment of the prior distribution, the standard deviation; the result, SAX SD, is proposed. We provide a comprehensive analysis of the proposed SAX SD and confirm both the highest classification accuracy and the highest dimensionality reduction ratio on University of California, Riverside (UCR) datasets in comparison to state-of-the-art methods such as SAX, Extended SAX (ESAX) and SAX Trend Distance (SAX TD).

    DOI

  • ICT Utilization at Waseda University: Past, Present, and Toward the Future (A New Stage of ICT Utilization)

    Hayato Yamana

    IDE: Contemporary Higher Education   ( 585 ) 11 - 16  2016.11

    CiNii

  • Message from the MAW 2016 Symposium Organizers

    Takahiro Hara, Kin Fun Li, Shengrui Wang, Hayato Yamana

    Proceedings - IEEE 30th International Conference on Advanced Information Networking and Applications Workshops, WAINA 2016     lviii  2016.05

    DOI

  • Nowcasting Economic Indicators by Analyzing the Diet Proceedings

    Ryosuke Takasugi, Hayato Yamana

    DBSJ Japanese Journal (Web)   14-J  2016

    J-GLOBAL

  • What is your Mother Tongue?: Improving Chinese Native Language Identification by Cleaning Noisy Data and Adopting BM25

    Lan Wang, Masahiro Tanaka, Hayato Yamana

    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA)     42 - 47  2016  [Refereed]

     View Summary

    Native language identification (NLI) is a process by which an author's native language can be identified from essays written in the second language of the author. In this work, a supervised model is built to accomplish this based on a Chinese learner corpus. In the NLI field, this is the first work to (1) eliminate noisy data automatically before the training phase and (2) employ a BM25 term weighting technique to score each feature. We also adopt a hierarchical structure of linear support vector machine classifiers and achieve a state-of-the-art accuracy of 77.1%, which is higher than that of other Chinese NLI methods by over 10%.

    DOI
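As an illustration of the BM25 term weighting mentioned in the summary above, the sketch below scores every term of every tokenized document with the common Okapi BM25 formula. The summary does not state which BM25 variant or parameter values the paper uses, so the non-negative IDF form and the defaults `k1=1.2`, `b=0.75` are assumptions, and `bm25_weights` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document (sketch)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are damped
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weighted.append({
            t: math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
               * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weighted
```

The resulting per-document weights could serve as feature values for a linear classifier; terms that occur in few documents receive larger weights than terms that occur everywhere.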

  • Identifying protein short linear motifs by position-specific scoring matrix

    Fang, C., Noguchi, T., Yamana, H., Sun, F.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9713 LNCS   206 - 214  2016  [Refereed]

     View Summary

    Short linear motifs (SLiMs) play a central role in several biological functions, such as cell regulation, scaffolding, cell signaling, post-translational modification, and cleavage. Identifying SLiMs is an important step for understanding their functions and mechanisms. Due to their short length and particular properties, discovery of SLiMs in proteins is a challenge both experimentally and computationally. So far, many existing computational methods have adopted predicted sequence or structure features as input for prediction, but there is no report of using position-specific scoring matrix (PSSM) profiles of proteins directly for SLiM prediction. In this study, we describe a simple method, named PSSMpred, which uses only the evolutionary information generated in the form of PSSM profiles of protein sequences for SLiM prediction. Compared with other methods tested on the same datasets, PSSMpred achieves the best performance: (1) 0.03-0.1 higher AUC than other methods when tested on HumanTest151; (2) 0.03-0.05 and 0.03-0.06 higher AUC than other methods when tested on ANCHOR-short and ANCHOR-long, respectively.

    DOI

  • Adaptive Focused Website Segment Crawler

    Tanaphol Suebchua, Arnon Rungsawang, Hayato Yamana

    PROCEEDINGS OF 2016 19TH INTERNATIONAL CONFERENCE ON NETWORK-BASED INFORMATION SYSTEMS (NBIS)     181 - 187  2016  [Refereed]

     View Summary

    Focused web crawlers have become indispensable for vertical search engines that provide a search service for specialized datasets. These vertical search engines have to collect specific web pages in the web space, whereas search engines such as Google and Bing gather web pages from all over the world. The problem in focused crawling research is how to collect specific web pages with minimal computing resources. We previously addressed this problem by proposing a focused crawling strategy, which utilizes an ensemble machine learning classifier to find groups of relevant web pages, referred to as relevant website segments. In this paper, we enhance the proposed crawler as follows: 1) We increase the accuracy of predicting website segments by preparing two predictors: one learned from features extracted from relevant source website segments and another learned from features of irrelevant ones. The idea is that these two types of source website segments may have different characteristics. 2) We also propose a noisy data elimination method for updating the predictor incrementally during the crawling process. A preliminary experiment shows that our enhanced crawler outperforms a crawler equipped with neither of these approaches by around 12%, at most.

    DOI

  • Secure frequent pattern mining by fully homomorphic encryption with ciphertext packing

    Imabayashi, H., Ishimaki, Y., Umayabara, A., Sato, H., Yamana, H.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9963 LNCS   181 - 195  2016  [Refereed]

     View Summary

    We propose an efficient and secure frequent pattern mining protocol with fully homomorphic encryption (FHE). Nowadays, secure outsourcing of mining tasks to the cloud with FHE is gaining attention. However, FHE execution leads to significant time and space complexities. P3CC, the first proposed secure protocol with FHE for frequent pattern mining, has these particular problems. It generates a ciphertext for each component of the item-transaction data matrix and executes numerous operations over the encrypted components. To address this issue, we propose efficient frequent pattern mining with ciphertext packing. By adopting the packing method, our scheme requires fewer ciphertexts and associated operations than P3CC, thus reducing both encryption and calculation times. We have also optimized its implementation by reusing previously produced results so as not to repeat calculations. Our experimental evaluation shows that the proposed scheme runs 430 times faster than P3CC and uses 94.7% less memory on a dataset of 10,000 transactions.

    DOI
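To make the role of packing in the summary above more concrete, the sketch below computes the support of an itemset over a 0/1 item-transaction matrix using only slot-wise multiplication and a final sum. It runs on plaintext lists and involves no encryption or FHE library; it is one plausible reading of why packing a whole column into a single ciphertext reduces the number of ciphertexts and operations, not the paper's protocol. The name `support` and the data layout are hypothetical.

```python
def support(columns, itemset):
    """columns: {item: [0/1 per transaction]}; returns the support count."""
    n = len(next(iter(columns.values())))
    packed = [1] * n
    for item in itemset:
        # One slot-wise multiplication per item; with ciphertext packing
        # this would be a single SIMD operation over all transactions
        packed = [p * c for p, c in zip(packed, columns[item])]
    # Slots equal to 1 mark transactions containing every item
    return sum(packed)


columns = {
    "bread": [1, 1, 0, 1],
    "milk":  [1, 0, 1, 1],
    "eggs":  [0, 1, 1, 1],
}
print(support(columns, ["bread", "milk"]))
```

Without packing, each matrix cell would be its own ciphertext, so the same computation would need one multiplication per transaction per item instead of one per item.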

  • Privacy-Preserving String Search for Genome Sequences with FHE bootstrapping optimization

    Yu Ishimaki, Hiroki Imabayashi, Kana Shimizu, Hayato Yamana

    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)     3989 - 3991  2016  [Refereed]

     View Summary

    Privacy-preserving string search is a crucial task for analyzing genomics-driven big data. In this work, we propose a cryptographic protocol that uses Fully Homomorphic Encryption (FHE) to enable a client to search a genome sequence database without leaking his/her query to the server. Though FHE supports both addition and multiplication over encrypted data, random noise inside ciphertexts grows with every arithmetic operation, especially multiplication, which results in incorrect decryption when the amount of noise exceeds a threshold called the level. There are two approaches to avoiding incorrect decryption: one is to set a level sufficient to assure correct decryption within a limited number of operations, and the other is to reset the noise with a method called bootstrapping. It is important to find an optimal balance between the overhead caused by the level and the overhead caused by bootstrapping, since using a higher level degrades the performance of all arithmetic operations, while a larger number of bootstrappings incurs more overhead. In this study, we propose an efficient approach that minimizes the number of bootstrappings while reducing the level as much as possible. Our experimental results show that it runs up to 10 times faster than a naive approach.

    DOI

  • Fast and Space-Efficient Secure Frequent Pattern Mining by FHE

    Hiroki Imabayashi, Yu Ishimaki, Akira Umayabara, Hayato Yamana

    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)     3983 - 3985  2016  [Refereed]

     View Summary

    In the big data era, security and privacy concerns are growing. One of the big challenges is secure Frequent Pattern Mining (FPM) over Fully Homomorphic Encryption (FHE). Some research efforts have aimed at speeding it up; however, there is still much room to decrease the time and space complexity. Apriori over FHE, in particular, generates a large number of ciphertexts during the support calculation, which results in both large time and space complexity. To solve this, we previously proposed a speed-up technique that is around 430 times faster and uses 18.9 times less memory than the state-of-the-art method by adopting both packing and caching mechanisms. In this paper, we further propose to decrease the memory space used for caching. Our goal is to discard redundant cached ciphertexts without increasing the execution time. Our experimental results show that our method decreases the memory usage by up to 6.09% in comparison with our previous method without increasing the execution time.

    DOI

  • A study on individual mobility patterns based on individuals’ familiarity to visited areas

    Han, J., Yamana, H.

    International Journal of Pervasive Computing and Communications   12 ( 1 ) 23 - 48  2016  [Refereed]

     View Summary

    Purpose - The purpose of this paper is to clarify the correlations between the amount of an individual's knowledge of a specific area and his/her visit pattern to points of interest (POIs, i.e., places of interest) located in the area. Design/methodology/approach - This paper proposes a visit-frequency-based familiarity estimation method that estimates individuals' knowledge of areas in a quantitative manner. Based on the familiarity degree, individuals' visit logs to POIs are divided into a set of groups, and the differences among the groups are then analyzed from various points of view, such as user preference, POI categories/popularity, visit time/date and subsequent visits. Findings - Statistically significant correlations between individuals' familiarity with areas and their visit patterns are observed in our analysis of 1.4 million POI visit logs collected from a popular location-based social network (LBSN), Foursquare. The skewness of the visit time and of the visited POI distribution/popularity differs with regard to familiarity. For instance, users go to unfamiliar areas on weekends and visit POIs for cultural experiences, such as museums. A notable point is that the correlations can be detected even among areas in the home city, which was not previously known. Originality/value - This is the first in-depth work that studies both the estimation of individuals' familiarity and the correlations between familiarity and individuals' mobility patterns by analyzing massive LBSN data. The methodologies used and the findings of this work are applicable not only to human mobility analysis for sociology, but also to POI recommendation system design.

    DOI

  • Why people go to unfamiliar areas?: Analysis of mobility pattern based on users' familiarity

    Jungkyu Han, Hayato Yamana

    17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings    2015.12  [Refereed]

     View Summary

    Human mobility analysis with Location-Based Social Network (LBSN) data is the basis of personalized point-of-interest (POI) recommendations or location-aware advertisements. In addition to personal preference and spatiotemporal factors such as time and distance, personal context has a strong influence on mobility. An individual's familiarity with an area is an interesting context because it can bias the influence of certain factors. For example, the mobility patterns of two persons who have similar preferences are different when their familiarity with the area is different, even in the same area. In this paper, we analyze familiarity's effect on mobility patterns by using over 1.4 million check-ins gathered from Foursquare. The analysis indicates that there is a skewness of the visit time and visited venue distribution in unfamiliar areas. For instance, people go to unfamiliar areas on weekends, and venues for cultural experiences, such as museums, strongly contribute to the motivation to visit.

    DOI

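  • The Super-Information Society Brought by Big Data: Information Processing Technology That Sees Everything, from Foundations to Applications: 2. Big Data Related Programs: Trends in the United States and the EU

    YAMANA HAYATO

    IPSJ Magazine (Joho Shori)   56 ( 10 ) 962 - 967  2015.09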

    J-GLOBAL

  • Cross-lingual Investigation of User Evaluations for Global Restaurants

    LE Jiawen, YAMANA Hayato

    IMT   10 ( 2 ) 317 - 322  2015

     View Summary

    Twitter, one of the popular social network services, is now widely used to query public opinions. In this paper, tweets, along with reviews collected from review websites, are used to carry out sentiment analysis in order to identify the language-based and location-based effects on user evaluations of six global restaurants. The language coverage is expanded so that 34 languages are taken into account. Using a range of new and standard features, a series of classifiers are trained and applied in the later steps of sentiment analysis. Our experimental results show that location and language effects on user evaluations of restaurants do exist.

    DOI CiNii

  • Detecting Learner's To-Be-Forgotten Items using Online Handwritten Data.

    Hiroki Asai, Hayato Yamana

    Proceedings of the 15th New Zealand Conference on Human-Computer Interaction, CHINZ 2015, Hamilton, New Zealand, September 3-4, 2015     17 - 20  2015  [Refereed]

     View Summary

    An effective learning system is indispensable for human beings with a limited life span. Traditional learning systems schedule repetition based on both the results of a recall test and learning theories such as the spacing effect. However, there is room for improvement from the perspective of remembrance-level estimation. In this paper, we focus on online handwritten data obtained from handwriting on a computer. We collected handwritten data from remembrance tests both to analyze the problems of traditional estimation methods and to build a new estimation model using handwritten data as the input. The evaluation found that our proposed model can output a continuous remembrance-level value from 0 to 1, whereas traditional methods output only a binary decision. In addition, the experiment showed that our proposed model achieves the best performance with an F-value of 0.69.

    DOI

  • Predicting Various Types of User Attributes in Twitter by Using Personalized PageRank

    Kazuya Uesato, Hiroki Asai, Hayato Yamana

    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA     2825 - 2827  2015  [Refereed]

     View Summary

    Predicting various types of user attributes in social networks has become indispensable for personalizing applications, since many attributes in social networks are not disclosed. However, the attributes extracted in existing work are limited to pre-defined types, so unexpected types of attributes cannot be extracted. In this paper, we therefore propose a novel method that extracts various, i.e., unlimited, types of attributes by applying personalized PageRank to a large social network. The experimental results using over 7.9 million Japanese Twitter users show that our proposed method successfully extracts four types of attributes per user on average with a MAP@20 of 0.841.

    DOI
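The personalized PageRank that the summary above builds on can be sketched as a power iteration in which the random walk restarts at a chosen seed set instead of at a uniformly random node. The summary does not describe how the paper constructs its graph or picks seeds, so the adjacency-dict input, the `alpha` and `iterations` defaults, and the name `personalized_pagerank` are all assumptions made for illustration.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iterations=50):
    """graph: {node: [out-neighbors]}; seeds: set of restart nodes."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    # Restart distribution: uniform over the seed set, zero elsewhere
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                # Spread this node's mass evenly over its out-links
                share = alpha * rank[n] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: return its mass to the seed set
                for s in seeds:
                    nxt[s] += alpha * rank[n] / len(seeds)
        rank = nxt
    return rank


graph = {"u": ["a", "b"], "a": ["u"], "b": ["u"], "x": ["u"]}
rank = personalized_pagerank(graph, {"u"})
```

Nodes that are easy to reach from the seed user accumulate rank, while nodes that cannot be reached from it (such as `x` here) stay at zero, which is what makes the ranking specific to that user.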

  • Condensing position-specific scoring matrixs by the Kidera factors for ligand-binding site prediction

    Fang, C., Noguchi, T., Yamana, H.

    International Journal of Data Mining and Bioinformatics   12 ( 1 ) 70 - 84  2015  [Refereed]

     View Summary

    protein functional sites. However, it is 20-dimensional and contains many redundant features. The Kidera factors were reported to contain information relating to almost all physical properties of amino acids, but they require appropriate weighting coefficients to express their properties. We developed a novel method, named KSPSSMpred, which integrates PSSM and the Kidera factors into a 10-dimensional matrix (KSPSSM) for ligand-binding site prediction. Flavin adenine dinucleotide (FAD) was chosen as a representative ligand for this study. When compared with five other feature-based methods on a benchmark dataset, KSPSSMpred performed the best. This study demonstrates that KSPSSM is an effective feature extraction method that can enrich PSSM with information relating to 188 physical properties of residues and reduce the feature dimensions by 50% without losing information included in the PSSM.

    DOI PubMed

  • A Recognition Model of Selected Regions Indicated by Handwritten Annotations on Electronic Documents

    ASAI HIROKI, YAMANA HAYATO

    情報処理学会論文誌トランザクション データベース(Web)   7 ( 4 ) 1 - 12  2014.12

     View Summary

    Handwritten annotation of paper-based documents is widely used both to append information and to emphasize part of a document. When annotating electronic documents on a computer, the annotation information could improve usability, for example in searching and sharing, but doing so requires estimating which part of the document is annotated. Traditional methods show insufficient recognition accuracy because they rely on heuristics that ignore human annotation habits. In this paper, we therefore propose a recognition model for handwritten targeting annotations, which is important for solving these problems. Our recognition model can detect common targeting annotations made by users, such as underline, enclosure, and vertical annotations. Our user study found that the proposed model can estimate the selected region in 85% of cases on average for character selection and in 91% for text-line selection.

    CiNii J-GLOBAL

  • Intelligent ink annotation framework that uses user's intention in electronic document annotation

    Hiroki Asai, Hayato Yamana

    ITS 2014 - Proceedings of the 2014 ACM International Conference on Interactive Tabletops and Surfaces     333 - 338  2014.11  [Refereed]

     View Summary

    Annotating documents is one of the indispensable interactions between humans and documents. An annotation system for electronic documents can implement effective functions, such as information retrieval and annotation-based navigation, by using the annotation information; however, traditional systems require users to perform gestures in addition to the common gestures used for paper-based documents. This can reduce the "learnability" of the system. We propose an intelligent ink annotation framework that increases the learnability of annotation systems by detecting recognizable intentions from natural annotation behavior on paper-based documents. Our framework recognizes "Targeting Content" and "Commenting," which are related to the extraction of annotation information. We have developed a prototype annotation system using our proposed framework and conducted a user study to identify future directions.

    DOI

  • Twitter User Profile Estimation from Mention Information

    OKUTANI TAKASHI, YAMANA HAYATO

    日本データベース学会和文論文誌   13 ( 1 ) 1 - 6  2014.10

    CiNii J-GLOBAL

  • An Image Blur Alert System for Mobile Device Cameras

    TEZUKA SHOTA, ASAI HIROKI, YAMANA HAYATO

    日本データベース学会和文論文誌   13 ( 1 ) 58 - 63  2014.10

    CiNii J-GLOBAL

  • マルチコアCPU環境における低レイテンシデータストリーム処理

    上田 高徳, 秋岡 明香, 山名 早人

    情報・システムソサイエティ誌   19 ( 3 ) 14 - 14  2014

    DOI CiNii

  • Analyzing conservation patterns and its influence on identifying protein functional

    Chun Fang, Tamotsu Noguchi, Hayato Yamana

    Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014     73 - 79  2014.01

     View Summary

    Evolutionary conservation information included in position-specific scoring matrix (PSSM) has been adopted by almost all sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are found to be conserved to some extent. However, different functional sites have different conservation patterns: some are linear contextual, some are mingled with highly variable residues, and others seem to be conserved independently. All existing studies used the direct output of PSSM for functional site prediction, without considering the relationship between the conservation patterns of residues and the distribution of conservation scores in PSSMs. In order to demonstrate the importance of analyzing conservation patterns, three PSSM-based methods for identifying three kinds of functional sites have been compared. Experimental results show that, although all the methods are based on the same feature, the PSSM of the protein sequence, they are competent in identifying different patterns of functional sites: the PSSM-based method is competent for identifying functional sites that are independently conserved; the smoothed-PSSM-based method is competent for identifying functional sites that are continuously conserved; and the masked-smoothed-PSSM-based method is competent for identifying functional sites that are intensively mingled with highly flexible and highly conserved residues. Copyright © (2014) by the International Society for Computers and Their Applications.

  • Image Annotation Fusing Content-based and Tag-based Technique Using Support Vector Machine and Vector Space Model

    Shan-Bin Chan, Hayato Yamana, Duy-Dinh Le, Shin'ichi Satoh

    10TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY AND INTERNET-BASED SYSTEMS SITIS 2014     272 - 276  2014  [Refereed]

     View Summary

    In this paper, we propose a new image annotation method by combining content-based image annotation and tag-based image annotation techniques. Content-based image annotation technique is adopted to extract "loosely defined concepts" by analyzing pre-given images' features such as color moment (CM), edge orientation histogram (EOH), and local binary pattern (LBP); followed by constructing a set of SVMs for 100 loosely defined concepts. A base-vector for each concept, similar to tag-based image annotation technique, is then constructed by using SVMs' predicted probabilistic results for sample-images whose main concepts are known. Finally cosine similarity between a query-image vector and the base vector is calculated for each concept. Experimental results show that our proposed method outperforms content-based image annotation technique by about 23% in accuracy.

    DOI

  • EA snippets: Generating summarized view of handwritten documents based on emphasis annotations

    Asai, H., Yamana, H.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8522 LNCS ( PART 2 ) 20 - 31  2014  [Refereed]

     View Summary

    Owing to the recent development of handwriting input devices such as tablets and digital pens, digital notebooks have become an alternative to traditional paper-based notebooks. Digital notebooks are available for various device types. To display a list of text documents on a device screen, we often use scaled thumbnails or text snippets summarized through natural language processing or structural analyses. However, these are ineffective in conveying summaries of handwritten documents, because informal and unstructured handwritten data are difficult to summarize using traditional methods. We therefore propose the use of emphasis-based snippets, i.e., summarized handwritten documents based on natural emphasis annotations, such as underlines and enclosures. Our proposed method places emphasized words into thumbnails or text snippets. User studies showed that the proposed method is effective for keyword-based navigation.

    DOI

  • A Challenge of Authorship Identification for Ten-thousand-scale Microblog Users

    Syunya Okuno, Hiroki Asai, Hayato Yamana

    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)     52 - 54  2014  [Refereed]

     View Summary

    Internet security issues require authorship identification for all kinds of internet content; however, authorship identification for microblog users is much harder than for other documents because microblog texts are too short. Moreover, when the number of candidates becomes large, i.e., big data, identification takes a long time. Our proposed method solves these problems. The experimental results show that our method successfully identifies the author among 10,000 microblog users with 53.2% precision in almost half the execution time of the previous method.

    DOI

  • Analysis of evolutionary conservation patterns and their influence on identifying protein functional sites

    Fang, C., Noguchi, T., Yamana, H.

    Journal of Bioinformatics and Computational Biology   12 ( 5 ) 1440003  2014  [Refereed]

     View Summary

    Evolutionary conservation information included in position-specific scoring matrix (PSSM) has been widely adopted by sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are found to be conserved to some extent. However, different functional sites have different conservation patterns: some are linear contextual, some are mingled with highly variable residues, and others seem to be conserved independently. Every value in a PSSM is calculated independently of the others, without carrying the contextual information of residues in the sequence. Therefore, adopting the direct output of PSSM for prediction fails to consider the relationship between the conservation patterns of residues and the distribution of conservation scores in PSSMs. In order to demonstrate the importance of combining PSSMs with the specific conservation patterns of functional sites for prediction, three different PSSM-based methods for identifying three kinds of functional sites have been analyzed. Results suggest that different PSSM-based methods differ in their capability to identify different patterns of functional sites, and that better combining PSSMs with the specific conservation patterns of residues would largely facilitate the prediction.

    DOI PubMed

  • Simplified sequence-based method for ATP-binding prediction using contextual local evolutionary conservation

    Fang, C., Noguchi, T., Yamana, H.

    Algorithms for Molecular Biology   9 ( 1 ) 7  2014  [Refereed]

     View Summary

    Background: Identifying ligand-binding sites is a key step in annotating protein functions and in finding applications in drug design. Many sequence-based methods now adopt various predicted results from other classifiers, such as predicted secondary structure, predicted solvent accessibility and predicted disorder probabilities, and combine them with position-specific scoring matrix (PSSM) as input for binding site prediction. These predicted features not only easily result in a high-dimensional feature space, but also greatly increase the complexity of the algorithms. Moreover, the performance of these predictors is largely influenced by the other classifiers. Results: In order to verify that conservation is the most powerful attribute in identifying ligand-binding sites, and to show the importance of revising PSSM to match the detailed conservation pattern of functional sites in prediction, we have analyzed the Adenosine-5'-triphosphate (ATP) ligand as an example, and proposed a simple method for ATP-binding site prediction, named CLCLpred (Contextual Local evolutionary Conservation-based method for Ligand-binding prediction). Our method employed no predicted results from other classifiers as input; all features used were extracted from PSSM only. We tested our method on two separate data sets. Experimental results showed that, compared with nine other existing methods on the same data sets, our method achieved the best performance. Conclusions: This study demonstrates that: 1) exploiting the signal from the detailed conservation pattern of residues will largely facilitate the prediction of protein functional sites; and 2) the local evolutionary conservation enables accurate prediction of ATP-binding sites directly from protein sequence.

    DOI PubMed

  • 編集にあたって

    Hayato Yamana, Miyuki Nakano, Yohei Seki

    情報処理学会論文誌. データベース   6 ( 5 ) i - iii  2013.12

  • マルチコアCPU環境における低レイテンシデータストリーム処理

    上田高徳, 秋岡明香, 山名早人

    電子情報通信学会論文誌 D   J96-D ( 5 ) 1094 - 1104  2013.05

    J-GLOBAL

  • A Parallel Distributed Web Crawler Consisting of Producer-Consumer Modules

    Takanori Ueda, Koh Satoh, Daichi Suzuki, Kenji Uchida, Kousuke Morimoto, Sayaka Akioka, Hayato Yamana

    情報処理学会論文誌データベース(TOD)   6 ( 2 ) 85 - 97  2013.03

     View Summary

    Web crawlers must collect Web data while performing tasks such as detecting crawled URLs and preventing consecutive accesses to a particular Web server. Parallel-distributed crawling is carried out at a high speed for the enormous number of URLs existing on the Web. However, in order to crawl efficiently, a crawler must realize load balancing between computers in addition to reducing time and space complexities in the crawling process. The Web crawler proposed in this paper crawls the Web using producer-consumer modules, which compose the crawler, and it realizes load balancing per module and not per crawled Web site. In other words, it realizes load balancing that is appropriate to certain computer resources necessary for the modules that compose the Web crawler; in this way, it improves biases in computation loads and memory utilization between computers. Moreover, the crawler is able to crawl the Web on a large scale while conserving resources, because the modules that manage host names or URLs are implemented by data structures that are temporally and spatially efficient.

    CiNii J-GLOBAL

  • SCPSSMpred: A General Sequence-based Method for Ligand-binding Site Prediction

    Fang Chun, Noguchi Tamotsu, Yamana Hayato

    IMT   8 ( 3 ) 890 - 897  2013

     View Summary

    In this paper, we propose a novel method, named SCPSSMpred (Smoothed and Condensed PSSM based prediction), which uses a simplified position-specific scoring matrix (PSSM) for predicting ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines abundant features, such as amino acid arrangement, information of neighboring residues, physicochemical properties, and evolutionary information. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the sequences only. Three ligands (FAD, NAD and ATP) were used to verify the versatility of our method, and three alternative traditional methods were also analyzed for comparison. All the methods were tested at both the residue level and the protein sequence level. Experimental results showed that the SCPSSMpred method achieved the best performance besides reducing 50% of redundant features in PSSM. In addition, it showed a remarkable adaptability in dealing with unbalanced data compared to other methods when tested on the protein sequence level. This study not only demonstrates the importance of reducing redundant features in PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites, such that both the arrangements and physicochemical properties of neighboring residues significantly impact ligand-binding behavior.

    DOI CiNii

  • Detecting student frustration based on handwriting behavior

    Hiroki Asai, Hayato Yamana

    UIST 2013 Adjunct - Adjunct Publication of the 26th Annual ACM Symposium on User Interface Software and Technology     77 - 78  2013  [Refereed]

     View Summary

    Detecting states of frustration among students engaged in learning activities is critical to the success of teaching assistance tools. We examine the relationship between a student's pen activity and his/her state of frustration while solving handwritten problems. Based on a user study involving mathematics problems, we found that our detection method was able to detect student frustration with a precision of 87% and a recall of 90%. We also identified several particularly discriminative features, including writing stroke number, erased stroke number, pen activity time, and air stroke speed. © 2013 Authors.

    DOI

  • A Negative Sample Image Selection Method Referring to Semantic Hierarchical Structure for Image Annotation

    Shan-Bin Chan, Hayato Yamana, Shin'ichi Satoh

    2013 INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS (SITIS)     162 - 167  2013  [Refereed]

     View Summary

    When SVM is adopted for image annotation, high-quality sample images improve image recognition accuracy. Images with the same visual/semantic features are adopted as positive sample images, and images with different visual/semantic features are adopted as negative sample images. However, selecting high-quality sample images is labor intensive, especially when collecting them by visual features. Most researchers randomly choose positive and negative sample images for classifier training. In many applications, adopting different negative sample image datasets changes annotation accuracy. In this research, we compare the accuracy obtained with different negative sample image datasets collected by semantic features. We adopted ImageNet as the image dataset in this study, and we adopted WordNet for building a semantic hierarchical tree. The semantic hierarchical tree is used to calculate the distance between nodes, and this distance relationship is then used to prepare positive and negative sample images. We prepare one baseline method and suggest six different negative sample image selection methods for the experiment. Binary SVM classifier training and prediction are implemented to compare the accuracy and Mean Reciprocal Rank (MRR) of the baseline and each proposed method. Our results show that selecting a uniform number of negative sample images at each distance in the semantic hierarchical tree achieves the highest accuracy.

    DOI

  • WSD Team's Approaches for Textual Entailment Recognition at the NTCIR10 (RITE2).

    Daiki Ito, Masahiro Tanaka, Hayato Yamana

    Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-10, National Center of Sciences, Tokyo, Japan, June 18-21, 2013    2013  [Refereed]

  • IC-BIDE: Intensity Constraint-based Closed Sequential Pattern Mining for Coding Pattern Extraction

    Hiromasa Takei, Hayato Yamana

    2013 IEEE 27TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA)     976 - 983  2013  [Refereed]

     View Summary

    We propose an intensity constraint-based closed sequential pattern mining algorithm, called IC-BIDE, for coding pattern extraction. Source code often contains frequent patterns of function calls or control flows, i.e., "coding patterns." Previous studies used sequential pattern mining to extract coding patterns; however, these algorithms have not been optimized for coding pattern extraction, which results in useless patterns as well as long execution times. We propose a new constraint, called the "intensity constraint," in order to enhance closed sequential pattern mining and efficiently extract coding patterns. Our proposed algorithm is based on BI-Directional Execution (BIDE), an algorithm proposed expressly for closed sequential pattern mining. The BIDE algorithm is not able to adapt to constraint-based closed sequential pattern mining. We extend the BIDE algorithm and prove that our extended algorithm is able to adapt to intensity constraint-based closed sequential pattern mining. Our contributions are as follows: 1) we propose a new constraint, which we call "intensity"; 2) we propose an intensity constraint-based closed sequential pattern mining algorithm, which we call the "IC-BIDE" algorithm. Experimental results with open source software (Bullet Physics, MySQL, and OpenCV) show that the IC-BIDE algorithm successfully excludes useless patterns. Moreover, our proposed method accelerates the extraction by a factor of 8.9 in comparison with the BIDE algorithm.

    DOI

  • MFSPSSMpred: Identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation

    Fang, C., Noguchi, T., Tominaga, D., Yamana, H.

    BMC Bioinformatics   14 ( 1 ) 300  2013  [Refereed]

     View Summary

    Background: Molecular recognition features (MoRFs) are short binding regions located in longer intrinsically disordered protein regions. Although these short regions lack a stable structure in the natural state, they readily undergo disorder-to-order transitions upon binding to their partner molecules. MoRFs play critical roles in the molecular interaction network of a cell, and are associated with many human genetic diseases. Therefore, identification of MoRFs is an important step in understanding functional aspects of these proteins and in finding applications in drug design. Results: Here, we propose a novel method for identifying MoRFs, named MFSPSSMpred (Masked, Filtered and Smoothed Position-Specific Scoring Matrix-based Predictor). First, a masking method is used to calculate the average local conservation scores of residues within a masking-window length in the position-specific scoring matrix (PSSM). Then, the scores below the average are filtered out. Finally, a smoothing method is used to incorporate the features of flanking regions for each residue to prepare the feature sets for prediction. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the PSSM of the sequence only. Experimental results show that, compared with other methods tested on the same datasets, our method achieves the best performance: an AUC 0.004 to 0.079 higher than other methods when tested on TEST419, and 0.045 to 0.212 higher when tested on TEST2012. In addition, when tested on an independent membrane proteins-related dataset, MFSPSSMpred significantly outperformed the existing predictor MoRFpred. Conclusions: This study suggests that: 1) amino acid composition and physicochemical properties in the flanking regions of MoRFs are very different from those in the general non-MoRF regions; 2) MoRFs contain both highly conserved residues and highly variable residues and, on the whole, are highly locally conserved; and 3) combining contextual information with local conservation information of residues facilitates the prediction of MoRFs.

    DOI PubMed

  • 編集にあたって

    山名早人, 酒井哲也, 石川佳治

    情報処理学会論文誌. データベース   5 ( 2 ) i - iii  2012.06

    CiNii

  • 学生論文特集の発行にあたって(<特集>学生論文)

    山名 早人

    電子情報通信学会論文誌. D, 情報・システム   95 ( 3 ) 31 - 36  2012.03

    CiNii

  • Authorship Attribution Method using N-gram Part-of-Speech Tagger : Evaluation of Robustness in Topic-Independence

      10 ( 3 ) 7 - 12  2012.02

    CiNii J-GLOBAL

  • データストリーム処理におけるレイテンシ削減と高可用性のためのオペレータ実行方法

    上田高徳, 打田研二, 秋岡明香, 山名早人

    日本データベース学会論文誌   10 ( 3 )  2012

    J-GLOBAL

  • Hit count reliability: How much can we trust hit counts?

    Koh Satoh, Hayato Yamana

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7235 LNCS   751 - 758  2012  [Refereed]

     View Summary

    Recently, there have been numerous studies that rely on the number of search results, i.e., hit count. However, hit counts returned by search engines can vary unnaturally when observed on different days, and may contain large errors that affect research that depends on those results. Such errors can result in low precision of machine translation, incorrect extraction of synonyms and other problems. Thus, it is indispensable to evaluate and to improve the reliability of hit counts. Several studies have shown this phenomenon; however, none of them has made clear how much we can trust hit counts. In this paper, we propose hit count reliability metrics to quantitatively evaluate the reliability of hit counts and to improve hit count selection. The evaluation results with Google show that our metrics successfully adopt reliable hit counts with 99.8% precision, and skip unreliable hit counts with 74.3% precision. © 2012 Springer-Verlag Berlin Heidelberg.

    DOI

  • A survey of aggregated search for web search engines(<Special feature>Aggregated search)

    YAMANA Hayato

    The journal of Information Science and Technology Association   61 ( 9 ) 343 - 348  2011.09

     View Summary

    Nowadays, web search engines have features to blend other vertical search results into web search results. It is called aggregated search. Here, vertical search includes news, blog, image, video, real time information search such as twitter, and so on. Web search engines do not always blend these vertical search results. Instead, they decide when they should blend vertical search results, and which vertical search results should be blended depending on input queries. In this paper, we survey the techniques and evaluation schemes to blend vertical search results into web search results. Afte...

    DOI CiNii

  • Time-weighted web authoritative ranking

    Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana

    INFORMATION RETRIEVAL   14 ( 2 ) 133 - 157  2011.04

     View Summary

    We investigate temporal factors in assessing the authoritativeness of web pages. We present three different metrics related to time: age, event, and trend. These metrics measure recentness, special event occurrence, and trend in revisions, respectively. An experimental dataset is created by crawling selected web pages for a period of several months. This data is used to compare page rankings by human users with rankings computed by the standard PageRank algorithm (which does not include temporal factors) and three algorithms that incorporate temporal factors, including the Time-Weighted PageRank (TWPR) algorithm introduced here. Analysis of the rankings shows that all three temporal-aware algorithms produce rankings more like those of human users than does the PageRank algorithm. Of these, the TWPR algorithm produces rankings most similar to human users', indicating that all three temporal factors are relevant in page ranking. In addition, analysis of parameter values used to weight the three temporal factors reveals that age factor has the most impact on page rankings, while trend and event factors have the second and the least impact. Proper weighting of the three factors in TWPR algorithm provides the best ranking results.

    DOI

  • An Eager Database Replication Middleware Managing Locks

    HORII Hiroshi, ONODERA Tamiya, YAMANA Hayato

    The IEICE transactions on information and systems (Japanese edition)   94 ( 3 ) 515 - 524  2011.03

     View Summary

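    Among approaches that use middleware to replicate existing databases, eager (synchronous) replication, which replicates every update within a transaction, can improve the performance of query-intensive applications without compromising database consistency. However, conventional methods cannot detect deadlocks that span replicas and cannot provide repeatable-read isolation. In this paper, we propose Yama, an eager replication middleware that performs concurrency control inside the middleware, thereby guaranteeing repeatable-read isolation and enabling deadlock detection. Yama's concurrency control identifies the lock targets needed to process an SQL statement by analyzing the SQL text and querying the replicas directly, and then acquires the locks from the concurrency control mechanism inside Yama. In addition, by requesting that each replica process statements at the read-uncommitted isolation level, Yama prevents processing from stalling in the replicas' internal concurrency control. When we applied this method to TPC-W, it showed higher throughput than conventional methods.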

    CiNii

  • Legible thumbnail: Summarizing on-line handwritten documents based on emphasized expressions

    Hiroki Asai, Takanori Ueda, Hayato Yamana

    Mobile HCI 2011 - 13th International Conference on Human-Computer Interaction with Mobile Devices and Services     551 - 556  2011  [Refereed]

     View Summary

    In recent years, digital notebooks have been replacing traditional paper-based notebooks with the development of handwriting input devices. Currently, we can access digital notebooks in various devices, including mobile devices. When we use such mobile devices, however, their limited screen size results in difficulty in understanding the summary of hand-written documents, without the use of a zoom feature. In this paper, we therefore propose the "Legible Thumbnail" that helps us to understand the summary without zooming. Our method detects the important words based on emphasis, such as an underline, and the method outputs the emphasized words to the thumbnail. Experiments show our thumbnail reduces search time by 21%. © 2011 Authors.

    DOI

  • Retweet Reputation: A Bias-Free Evaluation Method for Tweeted Contents.

    Shino Fujiki, Hiroya Yano, Takashi Fukuda, Hayato Yamana

    Social Innovation and Social Media, Papers from the 2011 ICWSM Workshop, Barcelona, Catalonia, Spain, July 21, 2011   WS-11-01   10 - 13  2011  [Refereed]

     View Summary

    The spread of word of mouth through retweets on Twitter has enabled us to estimate trends in the real world. Previous research methods estimate the value of tweeted content by calculating the number of subscribers who receive the tweet. However, we should consider the numbers of followers of both the tweeter and the retweeter(s), as a greater number of followers may result in more retweets, which we call "bias." In this paper, we propose a bias-free evaluation method for tweeted content. Experiments show that our method is successful at evaluating tweets without biases. Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved.

  • ロック制御型同期複製ミドルウェアの提案

    堀井洋, 小野寺民也, 山名早

    信学論(D)   Vol.J94-D ( No.3 ) 515 - 524  2011

  • A survey of aggregated search for web search engines

    山名 早人

    The Journal of Information Science and Technology Association   61 ( 9 ) 343 - 348  2011

    DOI CiNii J-GLOBAL

  • Hit count dance: scientific basis to adopt search engines' hit counts

    舟橋 卓也, 山名 早人

    DBSJ journal   9 ( 1 ) 18 - 22  2010.06

    CiNii J-GLOBAL

  • A Method for Automatic Group Commit of OLTP Systems

    HORII Hiroshi, ONODERA Tamiya, YAMANA Hayato

    The IEICE transactions on information and systems (Japanese edition)   93 ( 3 ) 222 - 231  2010.03

     View Summary

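    With the widespread adoption of databases, applications that make heavy use of update transactions are increasing. Because update transactions must be processed on expensive high-availability servers, scale-up is required. Group commit, which processes multiple transactions as a single transaction, is effective for this purpose. However, using group commit has required modifying the application, so it could not be applied to existing applications. In this paper, we propose a method that performs group commit inside middleware without requiring prior knowledge of transaction contents or any modification of the application. When the application is not modified, it is necessary to identify the set of transactions to be group-committed and to schedule batch updates. Our method predicts the SQL statements in a transaction in advance from the transaction execution history, and uses the prediction to determine the group commit targets and to schedule batch updates. We implemented the method in Java and evaluated it in an environment with five clients; the results showed that it can achieve about twice the throughput of the conventional approach while keeping database CPU utilization low.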

    CiNii J-GLOBAL

  • Nb-GCLOCK: A Non-blocking Buffer Management Based on the Generalized CLOCK

    Makoto Yui, Jun Miyazaki, Shunsuke Uemura, Hayato Yamana

    26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010     745 - 756  2010  [Refereed]

     View Summary

    In this paper, we propose a non-blocking buffer management scheme based on a lock-free variant of the GCLOCK page replacement algorithm. Concurrent access to the buffer management module is a major factor that prevents database scalability to processors. Therefore, we propose a non-blocking scheme for bufferfix operations that fix buffer frames for requested pages without locks by combining Nb-GCLOCK and a non-blocking hash table. Our experimental results revealed that our scheme can obtain nearly linear scalability to processors up to 64 processors, although the existing locking-based schemes do not scale beyond 16 processors.

    DOI

  • Hit Count Dance, 検索エンジンのヒット数に対する信頼性検証

    舟橋卓也, 山名早人

    日本データベース学会論文誌   Vol.9 ( No.1 ) 18 - 22  2010

  • Reliability Verification of Search Engines' Hit Counts: How to Select a Reliable Hit Count for a Query

    Takuya Funahashi, Hayato Yamana

    CURRENT TRENDS IN WEB ENGINEERING   6385s   114 - 125  2010

     View Summary

    In this paper, we investigate the trustworthiness of search engines' hit counts, the numbers returned as search result counts. Since many studies adopt search engines' hit counts to estimate the popularity of input queries, the reliability of hit counts is indispensable for achieving trustworthy studies. However, hit counts are unreliable because they change when a user clicks the "Search" button more than once or clicks the "Next" button on the search results page, or when a user queries the same term on separate days. In this paper, we analyze the characteristics of hit count transition by gathering various types of hit counts over two months by using 10,000 queries. The results of our study show that the hit counts with the largest search offset just before search engines adjust their hit counts are the most reliable. Moreover, hit counts are the most reliable when they are consistent over approximately a week.

    DOI

  • A Lock-free GCLOCK Page Replacement Algorithm

    Makoto Yui, Jun Miyazaki, Shunsuke Uemura, Hirokazu Kato, Hayato Yamana

    情報処理学会論文誌. データベース   2 ( 4 ) 32 - 48  2009.12

     View Summary

    In this paper, we propose a lock-free variant of the GCLOCK page replacement algorithm, named Nb-GCLOCK. Concurrent access to the buffer management module is a major factor that limits database scalability across processors. Therefore, we propose a non-blocking scheme for bufferfix operations that fixes buffer frames for requested pages without locks by combining Nb-GCLOCK and a wait-free hash table. Our experimental results revealed that our scheme scales nearly linearly up to 64 processors, whereas the existing locking-based schemes do not scale beyond 16 processors.

    CiNii
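
    For orientation, the following is a minimal single-threaded sketch of the classic GCLOCK policy that Nb-GCLOCK builds on. It is not the lock-free variant described in the summary above and omits pin counts and the concurrent hash table; the class name `GClockBuffer`, the `fix` method, and the `initial_weight` parameter are hypothetical names chosen for illustration.

```python
class GClockBuffer:
    """Sequential GCLOCK: each frame carries a reference counter; the clock
    hand decrements counters as it sweeps and evicts the first frame at zero."""

    def __init__(self, capacity, initial_weight=1):
        self.capacity = capacity
        self.initial_weight = initial_weight  # counter given to a newly loaded page (assumed value)
        self.frames = [None] * capacity       # page id held by each frame
        self.counts = [0] * capacity          # reference counter per frame
        self.index = {}                       # page id -> frame number
        self.hand = 0                         # clock hand position

    def fix(self, page_id):
        """Return the frame holding page_id, loading it on a miss."""
        frame = self.index.get(page_id)
        if frame is not None:                 # hit: bump the reference counter
            self.counts[frame] += 1
            return frame
        frame = self._find_victim()           # miss: pick a frame to reuse
        old = self.frames[frame]
        if old is not None:
            del self.index[old]               # evict the previous page
        self.frames[frame] = page_id
        self.counts[frame] = self.initial_weight
        self.index[page_id] = frame
        return frame

    def _find_victim(self):
        """Sweep the clock hand until an empty or zero-count frame is found."""
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.frames[frame] is None or self.counts[frame] == 0:
                return frame
            self.counts[frame] -= 1           # give the page another chance
```

    In a concurrent setting every update to `counts`, `hand`, and `index` above would be a point of contention, which is the kind of bottleneck the summary says the non-blocking scheme is meant to avoid.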

  • Improvement in Speed and Accuracy of Multiple Sequence Alignment Program PRIME

    Yamada Shinsuke, Gotoh Osamu, Yamana Hayato

    Information and Media Technologies   4 ( 2 ) 317 - 327  2009

     View Summary

    Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. We have developed an MSA program, PRIME, whose crucial feature is the use of a group-to-group sequence alignment algorithm with a piecewise linear gap cost. We have shown that PRIME is one of the most accurate MSA programs currently available. However, PRIME is slower than other leading MSA programs. To improve computational performance, we newly incorporate anchoring and grouping heuristics into PRIME. The anchoring method locates well-conserved regions in a given MSA as anchor points to reduce the region of the DP matrix to be examined, while the grouping method detects conserved subfamily alignments specified by a phylogenetic tree in a given MSA to reduce the number of iterative refinement steps. The results of BAliBASE 3.0 and PREFAB 4 benchmark tests indicated that these heuristics reduced the computational time of PRIME by more than 60% while the average alignment accuracy measures decreased by at most 2%. Additionally, we evaluated the effectiveness of an iterative refinement algorithm based on maximal expected accuracy (MEA). Our experiments revealed that when many sequences are aligned, the MEA-based algorithm significantly improves alignment accuracy compared with the standard version of PRIME, at the expense of a considerable increase in computation time.

    DOI CiNii

  • 8. Challenges to Gathering and Analyzing over 10 Billion of Web Pages(<Special Feature>Development of Advanced Development of Fundamental Software through Tight Collaboration of Academia and Industry)

    MURAOKA Yoichi, YAMANA Hayato, MATSUI Kunio, HASIMOTO Minako, AKABANE Kyoko, HAGIWARA Junichi

    Journal of Information Processing Society of Japan   49 ( 11 ) 1277 - 1283  2008.11

     View Summary

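    The number of Web pages was estimated at 53.7 billion as of November 2006. Between January 2004 and July 2006, we crawled text-only content from 55.48 million Web servers worldwide and collected about 14.45 billion unique Web pages. We also analyzed the collected pages to assess the current state of the Web, covering top-level domain distribution, language distribution, the geographic location of Web servers, backlink analysis, and PageRank computation. Furthermore, to show that Web page analysis can be used in business, we built a prototype "e-company survey" that visualizes companies' activities on their Web sites, and extracted company characteristics, strategies, and reputations.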

    CiNii J-GLOBAL

  • A Proposal for a Time-Series News Article Classification Method Using Single-Article Filtering

    中村智浩, 平野孝佳, 平手勇宇, 山名早人

    日本データベース学会論文誌   7 ( 2 )  2008

    J-GLOBAL

  • Web structure in 2005

    Yu Hirate, Shin Kato, Hayato Yamana

    ALGORITHMS AND MODELS FOR THE WEB-GRAPH   4936   36 - 46  2008  [Refereed]

     View Summary

    The estimated number of static web pages in Oct 2005 was over 20.3 billion, which was determined by multiplying the average number of pages per web server based on the results of three previous studies, 200 pages, by the estimated number of web servers on the Internet, 101.4 million. However, based on the analysis of 8.5 billion web pages that we crawled by Oct. 2005, we estimate the total number of web pages to be 53.7 billion. This is because the number of dynamic web pages has increased rapidly in recent years. We also analyzed the web structure using 3 billion of the 8.5 billion web pages that we have crawled. Our results indicate that the size of the "CORE," the central component of the bow tie structure, has increased in recent years, especially in the Chinese and Japanese web.

    DOI
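
    As a quick check of the first estimate quoted in the "Web structure in 2005" summary above, the product of the two stated figures (200 pages per server and 101.4 million servers) works out as follows:

```latex
\[
200 \ \text{pages/server} \times 101.4 \times 10^{6} \ \text{servers}
  = 20.28 \times 10^{9} \ \text{pages}
  \approx 20.3 \ \text{billion pages}
\]
```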

  • Detecting article errors in English using search engines

    平野 孝佳, 平手 勇宇, 山名 早人

    DBSJ letters   6 ( 3 ) 1 - 4  2007.12

     View Summary

    The purpose of this study is to show the effectiveness of shell script execution on multi-core and/or SMT (Simultaneous Multi-Threading) processors. Recently, multi-core processors and SMT techniques have become popular even at home and in business. However, using programs or compilers that do not take parallelism into account does not deliver the benefits of multi-core and multi-threaded execution. Programmers have to write parallel programs to obtain these benefits. Therefore, automatic parallelization techniques have been studied actively. This paper proposes automatic parallelizing scheme for shell script prog...

    CiNii J-GLOBAL

  • EPCI: Extracting potentially copyright infringement texts from the web

    Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu Hirate, Hayato Yamana

    16th International World Wide Web Conference, WWW2007     1151 - 1152  2007  [Refereed]

     View Summary

    In this paper, we propose a new system, called EPCI, that extracts texts potentially infringing copyright from the Web. EPCI extracts them in the following way: (1) generating a set of queries based on a given copyright-reserved seed-text, (2) submitting every query to a search engine API, (3) gathering the search result Web pages from the top rank downward until the similarity between the given seed-text and the search result pages becomes less than a given threshold value, and (4) merging all the gathered pages, then re-ranking them in order of their similarity. Our experimental results using 40 seed-texts show that EPCI is able to extract 132 potentially copyright-infringing Web pages per given copyright-reserved seed-text with 94% precision on average.

    DOI
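
    The four steps listed in the EPCI summary above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the word-window query generator, the word-set Jaccard similarity, the default threshold, and the `search` callable (standing in for a search engine API that returns ranked `(url, text)` pairs) are all hypothetical choices.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (assumed stand-in for EPCI's similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def make_queries(seed_text: str, window: int = 4) -> List[str]:
    """Step 1 (assumed form): cut the seed text into non-overlapping word windows."""
    words = seed_text.split()
    return [" ".join(words[i:i + window]) for i in range(0, len(words), window)]


def extract_candidates(
    seed_text: str,
    search: Callable[[str], Iterable[Tuple[str, str]]],
    threshold: float = 0.3,
    window: int = 4,
) -> List[Tuple[str, float]]:
    """Run steps 1-4 and return (url, similarity) pairs, most similar first."""
    best: Dict[str, float] = {}
    for query in make_queries(seed_text, window):      # step 1: generate queries
        for url, page_text in search(query):           # step 2: ranked results per query
            score = jaccard(seed_text, page_text)
            if score < threshold:                      # step 3: stop once similarity drops
                break
            best[url] = max(score, best.get(url, 0.0))
    # step 4: merge results from all queries and re-rank by similarity
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

    Passing `search` in as a callable keeps the sketch independent of any particular search engine API and makes it easy to exercise with canned results.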

  • Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

    Shinsuke Yamada, Osamu Gotoh, Hayato Yamana

    BMC BIOINFORMATICS   7   524  2006.12  [Refereed]

     View Summary

    Background: Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels. Results: We propose a novel group-to-group sequence alignment algorithm that uses a piecewise linear gap cost. We developed a program called PRIME, which employs our proposed algorithm to optimize the well-defined sum-of-pairs score. PRIME stands for Profile-based Randomized Iteration MEthod. We evaluated PRIME and some recent MSA programs using BAliBASE version 3.0 and PREFAB version 4.0 benchmarks. The results of benchmark tests showed that PRIME can construct accurate alignments comparable to the most accurate programs currently available, including L-INS-i of MAFFT, ProbCons, and T-Coffee. Conclusion: PRIME enables users to construct accurate alignments without having to employ pairwise alignment information. PRIME is available at http://prime.cbrc.jp/.

    DOI PubMed

  • 1. Introduction to Search Engines(<Special Features>Search Engines 2005-Guides to the Web-)

    YAMANA Hayato, MURATA Tsuyoshi

    Journal of Information Processing Society of Japan   46 ( 9 ) 981 - 987  2005.09

    CiNii

  • TF^2P-growth : Frequent Itemset Mining Algorithm without Any Thresholds

    HIRATE YU, IWAHASHI EIGO, YAMANA HAYATO

    情報処理学会論文誌. データベース   46 ( 8 ) 60 - 71  2005.06

     View Summary

    Conventional frequent itemset mining algorithms require some user-specified minimum support, and then mine frequent itemsets with support values that are higher than the minimum support. As it is difficult to predict how many frequent itemsets will be mined with a specified minimum support, the Top-κ mining concept has been proposed. The Top-κ Mining concept is based on an algorithm for mining frequent itemsets without a minimum support, but with the number of most κ frequent itemsets ordered according to their support values. However, the Top-κ mining concept still requires a threshold κ. ...

    CiNii J-GLOBAL

  • The Branch Predictor Referring a BTB Entry Existence(Processor Architecture)

    SAITO FUMIKO, YAMANA HAYATO

    情報処理学会論文誌. コンピューティングシステム   45 ( 11 ) 71 - 79  2004.10

     View Summary

    Branch prediction is implemented in recent processors to avoid pipeline stalls. Branch prediction is a kind of speculative execution for control dependences. In recent years, as pipelines have become deeper, the branch misprediction penalty has grown. Thus, the branch misprediction rate must be lowered to raise processor performance. Branch prediction predicts a branch direction and a branch target address. The BTB (Branch Target Buffer) registers Taken branches. We found that most branches that do not have a BTB entry are NotTaken branches. We propose the branch predictor referring a BT...

    CiNii J-GLOBAL

  • The Architecture of Search Engines (<Special feature> Internet Search Engines)

    YAMANA Hayato

    The journal of Information Science and Technology Association   54 ( 2 ) 84 - 89  2004.02

     View Summary

    Search engines are indispensable for using the Internet today. However, their architecture is not widely known. In this paper, the architecture of search engines is described using Google as an example, focusing on Web crawlers, indexing, and the searching scheme. Moreover, the problems of handling many queries and the cost of running search engines are taken up.

    DOI CiNii J-GLOBAL

  • Exploitation of Informational Applications : Toward the Global Web Information Archive(<Special Edition>Exploitation of Internet New Applications)

    Yamana Hayato

    The journal of the Institute of Image Information and Television Engineers   57 ( 12 ) 1632 - 1637  2003.12

    DOI CiNii J-GLOBAL

  • (IT in the Home)The Internet Leading IT Society : The Present Internet Access at Home and the Future(Special Issue on IT in Daily Life: IT Application Systems at Our Fingertips)

    YAMANA Hayato

    The Journal of the Institute of Electronics, Information, and Communication Engineers   86 ( 5 ) 304 - 310  2003.05

     View Summary

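    According to a survey by the Ministry of Internal Affairs and Communications, Internet penetration in Japanese households exceeded 80% of all households at the end of 2002. Penetration among all households exceeded 10% at the end of 1998, so it took five years to reach 10% penetration. Compared with 13 years for personal computers and 15 years for automobiles and mobile phones, this shows how rapidly the Internet has spread. This article focuses on the relationship between the Internet and the home and, based on concrete data, presents the past, present, and future of Internet access from the home.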

    CiNii J-GLOBAL

  • Design of Automatic Parallelizing Intermediate Code Interpreter

    KOIKE HANPEI, YAMANA HAYATO, YAMAGUCHI YOSHINORI

    情報処理学会論文誌. プログラミング   40 ( 10 ) 64 - 74  1999.12

     View Summary

    In this paper, the design of an intermediate code interpreter that executes a sequential program in parallel using a speculative method is discussed. Software techniques that enable efficient parallel speculative execution without hardware support are proposed, such as a checkpoint execution mechanism that establishes an appropriate parallel execution granularity, and an efficient implementation of speculative memory operations that minimizes the overhead of searching, recording, and mutual exclusion. Experimental results to see the basic performance of these techni...

    CiNii J-GLOBAL

  • Evaluation of Communication Mechanisms for Distributed Memory Parallel Computers in Wavefront Computation (Special Issue on Parallel Processings)

    SAKANE HIROFUMI, KODAMA YUETSU, TATEBE OSAMU, KOIKE HANPEI, YAMANA HAYATO, YAMAGUCHI YOSHINORI, YUBA TOSHITSUGU

    Transactions of Information Processing Society of Japan   40 ( 5 ) 2281 - 2292  1999.05

     View Summary

    In this paper, we discuss efficient parallel execution of a dense-matrix problem considering trade-offs between fine-grain and coarse-grain communication in distributed memory machines. The solution of the triangular system of equations involves data dependencies between consecutive iterations in the outer-loop. The dependencies can be naturally solved and processed in parallel by wavefront computation. Two ways of parallelizing are presented; the element-wise fine-grain approach and the coarse-grain approach. We implemented these algorithms on both EM-X and AP 1000+. Fine-grain support mec...

    CiNii J-GLOBAL

  • New Developments in Information Retrieval: From Test Collections to Search Engines

    細野 公男, 小川 泰嗣, 神門 典子, 木谷 勉, 住田 一男, 福島 俊一, 山名 早人

    情報処理   40 ( 2 ) 34 - 35  1999.02

    CiNii

  • Experiments of Collecting WWW Information Using Distributed WWW Robots

    Hayato Yamana, Kent Tamura, Hiroyuki Kawano, Satoshi Kamei, Masanori Harada, Hideki Nishimura, Isao Asai, Hiroyuki Kusumoto, Yoichi Shinoda

    SIGIR Forum (ACM Special Interest Group on Information Retrieval)     379 - 380  1998  [Refereed]

     View Summary

    This paper presents the experiments of collecting the documents on the WWW using distributed WWW robots. We propose distributed WWW robots to collect the documents quickly. Our final goal is to collect all of the documents on the WWW in Japan within one day. Currently, eight distributed WWW robots are running in Japan. The experimental results show that we are able to gain 5.8 to 9.7 times speedup when four distributed WWW robots are placed at different places in comparison with when only one WWW robot is used.

    DOI

  • Prototype of a Next-Generation Parallel Processing Computer Developed: Communication Overhead Greatly Reduced

    山名 早人

    電子情報通信学会誌   79 ( 8 ) 855 - 855  1996.08

    CiNii

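  • Successful Development of a Highly Stable Vacuum Microdevice with a MOS Transistor Structure: A Major Step Toward Practical Use in Displays and Other Applications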

    山名 早人

    電子情報通信学会誌   79 ( 6 ) 630 - 630  1996.06

    CiNii

  • Identifying the capability of overlapping computation with communication

    A Sohn, J Ku, Y Kodama, M Sato, H Sakane, H Yamana, S Sakai, Y Yamaguchi

    PROCEEDINGS OF THE 1996 CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT '96)     133 - 138  1996  [Refereed]

     View Summary

    Overlapping computation with communication is central to obtaining high performance on distributed-memory multiprocessors. Identifying the capability of overlapping, machine architects and programmers will be able to provide tools which can effectively utilize the underlying architecture. This report explicates the overlapping capability of two distributed-memory multiprocessors: the laboratory prototype EM-X multithreaded multiprocessor and a commercially available IBM SP2 with wide nodes. The well-known bitonic sorting algorithm is selected for experiments. Various message sizes are used to determine when, where, how much and why overlapping takes place. Experimental results indicate that both multiprocessors would yield up to 30% to 40% overlap of communication time when the message size is approximately 1K integers. EM-X is found message-size insensitive yielding high overlap for various message sizes while SP2 was effective for a window of message size of 512 to 2K integers, depen...

  • A Distributed Control Scheme of Macrotask-level Speculative Execution on the EM-4 Multiprocessor

    Yamana,Hayato, Sato,Mitsuhisa, Kodama,Yuetsu, Sakane,Hirohumi, Sakai,Shuichi, Yamaguchi,Yoshinori

    IPSJ Journal   36 ( 7 ) 1578 - 1588  1995.07  [Refereed]

     View Summary

    The purpose of this paper is to propose a new fast control scheme of macrotasks with speculation. A macrotask is a coarse-grain task that is a unit of speculation. Previous works have reported that the speedup ratio is 12 to 630 times in comparison with conventional execution schemes without speculation when both the speculation depth and the computational resources are infinite, which is called the oracle model. The control overhead, however, prevents the speedup from attaining the theoretical speedup ratio. Thus, a control scheme with small overhead is desired. The distributed control scheme achieves small control overhead by adopting two mechanisms: 1) each macrotask creates its successive macrotasks, and 2) each macrotask snoops the broadcast control signals and determines its next state by itself. Thus, the control of macrotasks can be parallelized, and the overhead to control macrotasks does not depend on the number of macrotasks. The scheme has been implemented on the EM-4 multiprocessor. Preliminary evaluations using Boolean Recurrence loops show that the control overhead of the proposed scheme is smaller than that of conventional, centralized control schemes.

    J-GLOBAL

  • A Computation Scheme of the Execution Time of Single Doacross Loops on Distributed Shared Memory Machines

    Yamana Hayato, Yasue Toshiaki, Muraoka Yoichi, Yamaguchi Yoshinori

    The transactions of the Institute of Electronics, Information and Communication Engineers   78 ( 2 ) 170 - 178  1995.02

     View Summary

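    Recent distributed shared memory parallel computers have mechanisms that transfer data asynchronously, independently of processor execution, which keeps processors from being overwhelmed by data transfers. In addition, although faster processors and the move to 64-bit architectures have improved computational performance, cost-performance considerations often lead to network transfer performance being set lower than processor computational performance. This paper proposes a method for calculating the execution time of single Doacross loops on such distributed shared memory parallel computers. To calculate the execution time, we newly introduce the data transfer pitch, the time interval of data input/output between a processor and the network, as a measure of network transfer performance, define the data transfer margin time and the data transfer issue delay time, and derive the loop execution time. As an example use of the derived loop execution time, we show that when the data transfer pitch is taken into account, the execution time can be reduced by changing the order of data transfers. Simulations targeting the T3D confirmed that the method calculates execution times closer to measured values than conventional methods.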

    CiNii J-GLOBAL

  • Dynamic characteristics of multithreaded execution in the EM-X multiprocessor

    H Sakane, R Sato, Y Kodama, H Yamana, S Sakai, Y Yamaguchi

    1995 INTERNATIONAL WORKSHOP ON COMPUTER PERFORMANCE MEASUREMENT AND ANALYSIS (PERMEAN '95), PROCEEDINGS     14 - 22  1995  [Refereed]

     View Summary

    Multithreading is known to be effective for tolerating communication latency in distributed-memory multiprocessors. Two types of support for multithreading have been used to date: software and hardware. This paper presents the impact of multithreading on performance through empirical studies. In particular, we explicate the performance difference between software support and hardware support for the 80-processor EM-X distributed-memory multiprocessor, which we have designed and implemented. The EM-X provides three types of hardware support for fine-grain multithreading: direct remote memory access, fast thread invocation, and dedicated instructions for generating fixed-sized communication packets. To demonstrate the effect of multithreading, we have performed various experiments using micro benchmark programs and MP3D, one of the SPLASH benchmarks. Three types of performance parameters have been measured: processor efficiency, remote memory latency, and network load. Experimental results indicate that the EM-X architecture is highly effective for supporting the multithreading principles of execution through dedicated hardware and software. Keywords: multithreading, latency hiding, fine grain communication, direct remote memory access, shared memory benchmark, synthetic workload.

  • A Speculative Execution Scheme of Macrotasks for Parallel Processing System

    Yamana Hayato, Yasue Toshiaki, Ishii Yoshihiko, Muraoka Yoichi

    The transactions of the Institute of Electronics, Information and Communication Engineers   77 ( 5 ) 343 - 353  1994.05

     View Summary

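    This paper proposes a program parallelization and execution scheme that uses speculative execution across multiple levels of conditional branches in order to execute FORTRAN programs at high speed on parallel processing systems. Several methods for parallelizing programs that contain conditional branches have been proposed. As a method without speculative execution, there is (1) derivation of the earliest execution conditions of tasks; as methods with speculative execution, there are (2) speculative execution limited to a single level of conditional branch, targeting superscalar processors and VLIW computers, and (3) multi-level speculative execution targeting specific loops. However, these have the following problems: (1) deriving the earliest execution conditions alone does not yield sufficient parallelism, (2) the speedup obtained by speculative execution over a single level of conditional branch is at most a factor of two, and (3) the applicable targets are limited to specific loops. To address these problems, this paper divides a program into macrotasks and defines a multi-level speculative execution scheme among macrotasks on general parallel processing systems. For each macrotask, we then propose an execution control method that uses execution start conditions, control determination conditions, and execution stop conditions.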

    CiNii J-GLOBAL

  • A FORTRAN compiling method for dataflow machines and its prototype compiler for the parallel processing system -Harray-

    T. Yasue, H. Yamana, Y. Muraoka

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   757 LNCS   482 - 496  1993

     View Summary

    In this paper, we propose an efficient technique, called CD translation, to compile a FORTRAN program to optimized dataflow code. The CD translation generates the dataflow control information from a control flow graph by using data flow analysis with the branch node operation, and makes it possible to correctly analyze a sequential program with any type of control structure (e.g., goto statements and irreducible loops), while the previous method cannot compile a FORTRAN program to a dataflow program perfectly. This analysis technique is valuable not only for constructing compilers for dataflow machines but also as an analysis technique for parallelizing compilers, because the dataflow program represents all program dependencies uniformly as data dependencies and allows all of them to be analyzed in the same way. Moreover, the FORTRAN compiler implementing the CD translation is introduced.

    DOI

  • A Flow-Executing Scheme of Doacross Loops on Dynamic Dataflow Machines

    ISHII Yoshihiko, YASUE Toshiaki, YAMANA Hayato, MURAOKA Yoichi

    The Transactions of the Institute of Electronics,Information and Communication Engineers.   75 ( 7 ) 440 - 449  1992.07

    CiNii J-GLOBAL

  • An environment for dataflow program development of parallel processing system-Harray-.

    山名早人, 神舘淳, 安江俊明, 村岡洋一

    電子情報通信学会論文誌 D-1   73 ( 6 )  1990

    J-GLOBAL

  • System architecture of parallel processing system - Harray

    Hayato Yamana, Toshikazu Marushima, Takashi Hagiwara, Yoichi Muraoka

    Proceedings of the International Conference on Supercomputing   Part F130184   76 - 89  1988.06  [Refereed]

     View Summary

    This paper proposes a parallel processing system - Harray - for scientific computations. Data flow computers are expected to obtain high performance because they can extract parallelism fully from a program. However, they have many problems, such as the difficulty of controlling the sequence of execution. The - Harray - system is an array processor that adopts two levels of control mechanism: data flow execution in each processor and control flow between processors, in order to take full advantage of both mechanisms. A task that is assigned to a processor is called a "macro-block". Three types of macro-blocking and three types of activation schemes for the macro-block, which initiate its execution, are introduced in order to attain high performance. Moreover, a hardware synchronization mechanism is used to reduce synchronization overhead and to gain linear speedup of the - Harray - system. In this paper, the system architecture of the - Harray - system and its performance evaluation by software simulation are presented.

    DOI


Books and Other Publications


Misc

  • A Survey of Explainable Recommender System

    松島ひろむ, 森澤竣, 石山琢己, 山名早人

    情報科学技術フォーラム講演論文集   20th  2021

    J-GLOBAL

  • Predicting Answers with Hints from Online Answer Data-Targeting Geometric Problems-

    三浦将人, 村上統馬, 中山祐貴, 山名早人

    電子情報通信学会技術研究報告   119 ( 393(ET2019 69-75)(Web) )  2020

    J-GLOBAL

  • A Survey of the Hardware Implementations for Improving Performance of Fully Homomorphic Encryption

    井上紘太朗, 鈴木拓也, 山名早人

    情報科学技術フォーラム講演論文集   19th  2020

    J-GLOBAL

  • The Use of survey materials for the repair landscape project in the "Higashiyama-Higashi" Important Preservation District for Groups of Traditional Buildings, Kanazawa Part 3

    UCHIDA Shin, HIRATE Yu, YAMANA Hayato

    National Institute of Technology,Ishikawa College Bulletin   51   19 - 24  2019

     View Summary

    The aim of this study is to utilize survey materials in the repair landscape project, taking the "Higashiyama-higashi" Important Preservation District for Groups of Historic Buildings, located in Ishikawa Prefecture, as a case study. In this paper, we investigated the cleaning method of timber lattices and the sectional structure of wooden fittings, and analyzed the relationship between the change in appearance of timber lattices and the sectional structure of wooden fittings.

    DOI CiNii

  • A Study of Task Scheduling Methods for Accelerating Homomorphic Encryption Operations on Many-Core CPUs

    鈴木拓也, 石巻優, 山名早人

    情報処理学会研究報告(Web)   2019 ( HPC-170 )  2019

    J-GLOBAL

  • Construction of a Fuzzy Search System for Side Effects in a Secure Environment

    菅野敦之, 野口保, 山名早人

    日本薬剤師会学術大会(Web)   52nd  2019

    J-GLOBAL

  • FCMalloc: A Memory Allocator for Accelerating Fully Homomorphic Encryption

    馬屋原昂, 佐藤宏樹, 石巻優, 今林広樹, 山名早人

    情報処理学会研究報告(Web)   2017 ( OS-141 )  2017

    J-GLOBAL

  • A Proposal of a Word Importance Calculation Method for Specific Domains and Its Application to Estimating Author Expertise in Short Texts

    滝川真弘, 山名早人

    情報処理学会研究報告(Web)   2017 ( NL-233 )  2017

    J-GLOBAL

  • Estimating Online Advertisement Placement Positions on Web Pages to Improve CTR

    大谷一善, 滝川真弘, 堀田弘明, 山名早人

    情報科学技術フォーラム講演論文集   16th  2017

    J-GLOBAL

  • A Strategy Classification Method for Handwritten Mathematics Answers Using a Digital Pen: A Case Study on Polynomial Expansion Problems

    浅井洋樹, 山名早人

    情報処理学会研究報告(Web)   2016 ( CE-133 )  2016

    J-GLOBAL

  • A Method for Classifying Geometry Answer Patterns from Handwritten Answer Data Captured with a Digital Pen

    森山優姫菜, 下岡純也, 浅井洋樹, 山名早人

    情報科学技術フォーラム講演論文集   15th  2016

    J-GLOBAL

  • Research Trends in the Use of Fully Homomorphic Encryption for Data Mining

    佐藤宏樹, 馬屋原昂, 石巻優, 今林広樹, 山名早人

    情報科学技術フォーラム講演論文集   15th  2016

    J-GLOBAL

  • A Proposal of a Word Importance Calculation Method for Specific Domains and Its Application to Expertise Estimation on Twitter

    滝川真弘, 山名早人

    情報科学技術フォーラム講演論文集   15th  2016

    J-GLOBAL

  • A study of effective visit history utilization for Location recommendation-User’s Familiarity with area and Visit pattern change-

    HAN Jungkyu, 山名早人

    電子情報通信学会技術研究報告   115 ( 110(DE2015 1-11) )  2015

    J-GLOBAL

  • An Eyes-Free Japanese Text Input Method for Smartwatches

    下岡純也, 浅井洋樹, 山名早人

    日本ソフトウェア科学会研究会資料シリーズ(Web)   ( 76 )  2015

    J-GLOBAL

  • Improved Native Language Identification with Upper Phrase Information and Training Data Selection

    TANAKA MASAHIRO, WANG LAN, YAMANA HAYATO

    IEICE technical report. Natural language understanding and models of communication   114 ( 366 ) 127 - 132  2014.12

     View Summary

    Native Language Identification (NLI), the task of identifying the native language (L1) of a writer based solely on a sample of his/her writing in a non-native language (L2), is one of the authorship attribution problems. In this paper, we propose i) "upper phrase information" as a new feature and ii) discarding essay data that seem to be outliers from the training dataset. NLI is applicable to many other NLP tasks, such as Second Language Acquisition. Since 2005, many researchers have approached this task in different ways. Combining all the proposed techniques and existing methods, our system achieves 85.6% accuracy on the NLI Shared Task 2014 data. To the best of our knowledge, this is state-of-the-art accuracy in the NLI tasks.

    CiNii J-GLOBAL

  • A System for Estimating Unconsolidated Memory Using Online Handwriting Information

    浅井洋樹, 山名早人

    研究報告コンピュータと教育(CE)   2014 ( 1 ) 1 - 6  2014.11

     View Summary

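    The goal of rote learning, such as memorizing kanji or English words, is to consolidate memories so that they can be recalled without being forgotten, and a learning system that consolidates memories more efficiently is useful for learners. Consolidating a memory is said to require repeated relearning of the item to be memorized, and efficient memorization requires picking out unconsolidated memories and giving them priority in repeated study. However, a learner's correct/incorrect test results alone cannot detect weakly consolidated items that were answered correctly but will soon be forgotten, so they cannot cover all unconsolidated memories. They also yield only a binary consolidated/unconsolidated judgment and cannot determine the priority of repeated study. In this study, we therefore propose a method for estimating a learner's degree of memory consolidation using online handwriting information, which includes time-series information and pen pressure obtainable from tablet devices and the like. By giving study priority to items with a low "memory degree," a continuous value produced by the proposed system, a learning support system that enables efficient memorization can be built.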

    CiNii J-GLOBAL

  • A Proposal of an Authorship Attribution Method for Microblogs: Authorship Attribution at the Scale of 10,000 Users (Data Engineering)

    奥野 峻弥, 浅井 洋樹, 山名 早人

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   114 ( 173 ) 65 - 70  2014.08

     View Summary

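    Authorship attribution research has conventionally focused on novels and has dealt with small sets of candidate authors for restricted targets. In contrast, we have proposed authorship attribution for microblogs over a large, unspecified set of candidates. In doing so, we proposed a normalization method for the exclamatory phrases peculiar to microblogs to improve accuracy, and a method for reducing the number of messages needed for attribution to reduce computation. In this paper, we examine problems that arise when performing authorship attribution for a larger number of microblog users, in particular the effect on accuracy of the difference between the collection periods of the training data and the test data, and propose a method that reduces the effect of the training data collection period on accuracy. In experiments, we performed authorship attribution for 10,000 Twitter users and achieved 0.535 in Precision@1 and 0.602 in MRR.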

    CiNii J-GLOBAL

  • A Study of Word Importance Calculation Methods for Twitter User Profile Estimation Using Mention Information (Data Engineering)

    上里 和也, 田中 正浩, 浅井 洋樹, 山名 早人

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   114 ( 173 ) 125 - 130  2014.08

     View Summary

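    In large-scale social services such as Twitter, knowing users' profiles, such as their interests and affiliations, is important for effective marketing. Against this background, research on profile estimation on Twitter has been conducted. Conventional profile estimation methods extract communities from the social graph built from follow information and estimate a user's profile by estimating the attributes of the community to which the target user belongs. However, because these methods cannot take into account the purpose of each follow or whether there is active interaction, it is difficult to extract groups of users who actually have close relationships as communities. In response, 奥谷 et al. proposed a method that solves these problems by building the social graph from mention information instead of follows. However, that method has the problem that words appearing widely in common across the profiles of users surrounding the target user are unlikely to be output as the profile. In this paper, we therefore propose a method that solves this problem by changing the word importance calculation in the profile estimation method of 奥谷 et al. and filtering out general words using data from 100,000 users randomly sampled from all Twitter users. In an experiment with six subjects, Precision@10 improved from 0.37 to 0.78 and MRR from 1.44 to 2.61 compared with the method of 奥谷 et al.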

    CiNii J-GLOBAL

  • Topics and Influential User Identification in Twitter using Twitter Lists

    Zhou Guanying, Asai Hiroki, Yamana Hayato

    IEICE technical report. Data engineering   114 ( 173 ) 71 - 76  2014.08

     View Summary

    Twitter, as one of the most popular social network services, draws the attention of more and more researchers worldwide. With a large amount of information tweeted every day, it becomes essential to identify the influential users we are interested in. In previous research, researchers mainly identify topics from tweets and rank users by utilizing the follow relationship; however, the follow relationship is strongly related to users' reputation in the real world and cannot describe their influence and activity level in Twitter exactly. Instead, in this paper, to identify topics and influential users, we use "Twitter List," whose name represents the topic of listed members. By analyzing Twitter List, we are able to detect topics and identify influential users in the corresponding topic more efficiently. Based on our experimental evaluation using the two selected topics, the influential users identified by our proposed method received average topic-related influence scores from interviewees of 3.7 and 3.33, outperforming ranking by follower count, which received average scores of 3.22 and 3.27, respectively.

    CiNii

  • Cross-Domain Investigations of User Evaluations under the Multi-cultural Backgrounds

    LE JIAWEN, YAMANA HAYATO

    IEICE technical report. Data engineering   114 ( 173 ) 137 - 142  2014.08

     View Summary

    Twitter, as one of the most popular social network services, is widely used to query public opinions. In this research, a large corpus of Twitter data, along with online reviews, is used for sentiment and culture-based analysis, so as to figure out the cultural effects on user evaluations. Posts written in more than 30 languages from more than 30 countries are collected. In order to implement the cross-domain investigations, global restaurants and world attractions are taken as the research subjects, and a series of high-performance classifiers are trained and applied in the experiment steps. Then various analysis methods are applied to obtain informative results and conclusions about the user evaluations for the targets. As its contributions, this research validates the capability and field transferability of the proposed methods for cross-lingual sentiment analysis, and concludes that cultural effects on user evaluations in both the restaurant domain and the travel domain actually exist and are obvious for some countries and cultural backgrounds.

    CiNii

  • 単語の意味概念行列を用いたキーワード生成による関連論文検索システム (データ工学)

    林 佑磨, 奥野 峻弥, 山名 早人

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   114 ( 173 ) 53 - 58  2014.08

     View Summary

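    Researchers survey papers related to their own field to understand the significance of their research and existing methods. Paper search systems widely used for such surveys generally rely on keyword search, in which the user supplies keywords as the query. In technical fields with many specialized terms, it is difficult, especially for researchers not yet familiar with the field, to supply appropriate keywords and obtain satisfactory results. To solve this problem, we previously proposed a related-paper search system that takes a paper's abstract as input. The system represents the meanings of the words in the input abstract as a semantic concept matrix and uses it to automatically generate search queries. In this paper, we extend that system. Specifically, we (1) add support for Japanese paper search and (2) achieve higher-quality keyword generation using paper clustering with RSM. Compared with an existing paper search system that supports Japanese, we improved p@10 by 0.17 on average.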

    CiNii J-GLOBAL

  • Topics and Influential User Identification in Twitter using Twitter Lists

    Guanying Zhou, Hiroki Asai, Hayato Yamana

    IPSJ SIG Notes   2014 ( 13 ) 1 - 6  2014.07

     View Summary

    Twitter, one of the most popular social network services, is drawing the attention of more and more researchers worldwide. With a large amount of information tweeted every day, it has become essential to identify the influential users we are interested in. Previous research mainly identifies topics from tweets and ranks users by the follow relationship; however, the follow relationship is strongly tied to a user's real-world reputation and cannot accurately describe their influence and activity level on Twitter. Instead, in this paper, we identify topics and influential users with "Twitter List," whose name represents the topic of its listed members. By analyzing Twitter Lists, we are able to detect topics and identify influential users in the corresponding topic more efficiently. In our experimental evaluation on two selected topics, the influential users identified by our proposed method received average topic-related influence scores from interviewees of 3.7 and 3.33, outperforming ranking by follower count, which scored 3.22 and 3.27 on average, respectively.

    CiNii

  • マイクロブログを対象とした著者推定手法の提案-10,000人レベルでの著者推定-

    奥野峻弥, 浅井洋樹, 山名早人

    研究報告情報基礎とアクセス技術(IFAT)   2014 ( 12 ) 1 - 6  2014.07

     View Summary

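    Authorship attribution research has traditionally focused on novels and has dealt with small, restricted sets of candidate authors. In contrast, we have proposed authorship attribution for microblogs over a large, unspecified set of candidates. In that work, we proposed a normalization method for the exclamatory phrases characteristic of microblogs to improve accuracy, and a method for reducing the number of messages needed for attribution to cut computational cost. In this paper, we examine problems that arise when targeting a larger number of microblog users, in particular how the difference between the collection periods of training and test data affects accuracy, and propose a method that reduces the effect of the training-data collection period on accuracy. In experiments, we performed authorship attribution on 10,000 Twitter users and achieved a Precision@1 of 0.535 and an MRR of 0.602.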

    CiNii

  • メンション情報を利用したTwitterユーザプロフィール推定における単語重要度算出手法の考察

    上里和也, 田中正浩, 浅井洋樹, 山名早人

    研究報告情報基礎とアクセス技術(IFAT)   2014 ( 22 ) 1 - 6  2014.07

     View Summary

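    In large-scale social services such as Twitter, knowing users' profiles, such as their interests and affiliations, is important for effective marketing. Against this background, profile estimation on Twitter has been studied. Conventional methods extract communities from a social graph built from follow information and estimate a user's profile by estimating the attributes of the community to which the user belongs. However, because they cannot take into account the purpose of each follow or whether active interaction takes place, it is difficult to extract groups of users who actually have close relationships as communities. To address this, 奥谷 et al. proposed a method that builds the social graph from mention information instead of follows. That method, however, has the problem that words appearing widely in common across the profiles of the users surrounding the target user are unlikely to be output as the profile. In this paper, we therefore propose a method that solves this problem by changing how word importance is calculated in 奥谷 et al.'s method and by filtering out general words using data from 100,000 users randomly sampled from all Twitter users. In an experiment with six subjects, Precision@10 improved from 0.37 to 0.78 and MRR from 1.44 to 2.61 compared with 奥谷 et al.'s method.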

    CiNii

  • メンション情報を利用したTwitterユーザプロフィール推定における単語重要度算出手法の考察

    上里和也, 田中正浩, 浅井洋樹, 山名早人

    研究報告データベースシステム(DBS)   2014 ( 22 ) 1 - 6  2014.07

     View Summary

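    In large-scale social services such as Twitter, knowing users' profiles, such as their interests and affiliations, is important for effective marketing. Against this background, profile estimation on Twitter has been studied. Conventional methods extract communities from a social graph built from follow information and estimate a user's profile by estimating the attributes of the community to which the user belongs. However, because they cannot take into account the purpose of each follow or whether active interaction takes place, it is difficult to extract groups of users who actually have close relationships as communities. To address this, 奥谷 et al. proposed a method that builds the social graph from mention information instead of follows. That method, however, has the problem that words appearing widely in common across the profiles of the users surrounding the target user are unlikely to be output as the profile. In this paper, we therefore propose a method that solves this problem by changing how word importance is calculated in 奥谷 et al.'s method and by filtering out general words using data from 100,000 users randomly sampled from all Twitter users. In an experiment with six subjects, Precision@10 improved from 0.37 to 0.78 and MRR from 1.44 to 2.61 compared with 奥谷 et al.'s method.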

    CiNii

  • 包括的遺伝子ネットワーク構造からの活性化部位推定手法の開発

    下岡純也, 油谷幸代, 山名早人

    日本分子生物学会年会プログラム・要旨集(Web)   37th  2014

    J-GLOBAL

  • 編集にあたって

    山名 早人, 中野 美由紀, 関 洋平

    情報処理学会論文誌データベース(TOD)   6 ( 5 ) i - iii  2013.12

    CiNii

  • 教育環境における書き込み可能な電子ペーパー端末の利活用

    浅井 洋樹, 山名 早人

    MNC Communications   15 ( 15 )  2013.12

    CiNii

  • 文体及びツイート付随情報を用いた乗っ取りツイート検出

    上里和也, 奥谷貴志, 浅井洋樹, 奥野峻弥, 田中正浩, 山名早人

    情報処理学会研究報告. データベース・システム研究会報告   2013 ( 21 ) 1 - 8  2013.11

     View Summary

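    While the number of Twitter users continues to grow, cases in which IDs and passwords are illicitly obtained and tweets are posted by someone else are increasing. In response, we previously proposed a method for detecting spam tweets, which are one kind of message posted through account hijacking, and obtained an accuracy of about 80%. That method targets spam tweets containing specific words and demonstrated its effectiveness for them. In this study, we broaden the detection target: we define all tweets posted by anyone other than the account owner as "hijacked tweets" and propose a method for detecting them. We also retune the parameters of our earlier method and exploit the fact that the kinds of hashtags used frequently and the recipients of replies are characteristic of each account, in order to improve the F-measure. In an evaluation on 100 accounts, we improved the F-measure by 0.1984 over our previous method, reaching 0.8570.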

    CiNii J-GLOBAL

  • 編集にあたって

    山名 早人, 酒井 哲也, 石川 佳治

    情報処理学会論文誌データベース(TOD)   6 ( 4 ) i - iii  2013.09

    CiNii

  • 刊行500号までの軌跡とこれからの論文誌のあり方

    山名 早人

    電子情報通信学会論文誌. D, 情報・システム = The IEICE transactions on information and systems (Japanese edition)   96 ( 8 ) 1661 - 1662  2013.08

    CiNii

  • 医薬品副作用情報を用いた副作用検索システムの提案 (データ工学)

    三上 拓也, 駒田 康孝, 野口 保, 菅野 敦之, 山名 早人

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   113 ( 150 ) 59 - 64  2013.07

     View Summary

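    Early detection of and response to adverse drug reactions is an important issue in clinical practice. It requires physicians and pharmacists to know every adverse reaction of every drug. However, a single drug has many known adverse reactions, and keeping track of all of them is difficult. In addition, many adverse reactions have similar, synonymous notations, so the same reaction may be mistaken for a different one because of a notational difference, which can delay its detection. Unknown adverse reactions whose association with a drug has not been established must also be expected. In this paper, we therefore propose an adverse-reaction search system that is robust to notational variation and supports searching for unknown adverse reactions of drugs. The proposed method estimates unknown adverse reactions of a drug from the adverse reactions listed in drug package inserts and from case reports of suspected adverse reactions. In the experiment, we input 150 cases for which suspected adverse reactions had actually been reported, and evaluated usefulness by comparing the search results with the adverse reactions suspected in those case reports to be associated with the drug. The detection rate was 72.7%, of which 42.2% were detected as unknown adverse reactions. Furthermore, by resolving notational variation, the system detected as identical the 29.3% of known adverse reactions that previously could not be matched because of notational variation, confirming that the proposed method is useful.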

    CiNii J-GLOBAL

  • マイクロブログを対象とした1,000人レベルでの著者推定手法構築に向けて

    奥野峻弥, 浅井洋樹, 山名早人

    研究報告データベースシステム(DBS)   2013 ( 7 ) 1 - 6  2013.07

     View Summary

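    Authorship attribution research has traditionally focused on novels and has dealt with a limited number of candidate authors. An authorship attribution method at the scale of 10,000 people has also been proposed for text posted on the Internet, achieving an accuracy of about 80%. However, messages posted to microblogs, which have a large number of users, are numerous but short, and contain many unknown words, typos, and omissions, so the accuracy of existing methods drops. In this study, we therefore propose a method that builds a dictionary from the messages and uses a morphological analyzer with that dictionary to perform authorship attribution over a large number of people from a small number of messages. An evaluation experiment estimating the author from 900 candidates confirmed that accuracy improves over existing authorship attribution methods.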

    CiNii

  • マイクロブログを対象とした1,000人レベルでの著者推定手法構築に向けて

    奥野峻弥, 浅井洋樹, 山名早人

    情報処理学会研究報告. 情報学基礎研究会報告   2013 ( 7 ) 1 - 6  2013.07

     View Summary

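    Authorship attribution research has traditionally focused on novels and has dealt with a limited number of candidate authors. An authorship attribution method at the scale of 10,000 people has also been proposed for text posted on the Internet, achieving an accuracy of about 80%. However, messages posted to microblogs, which have a large number of users, are numerous but short, and contain many unknown words, typos, and omissions, so the accuracy of existing methods drops. In this study, we therefore propose a method that builds a dictionary from the messages and uses a morphological analyzer with that dictionary to perform authorship attribution over a large number of people from a small number of messages. An evaluation experiment estimating the author from 900 candidates confirmed that accuracy improves over existing authorship attribution methods.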

    CiNii

  • 医薬品副作用情報を用いた副作用検索システムの提案

    三上拓也, 駒田康孝, 野口保, 菅野敦之, 山名早人

    研究報告データベースシステム(DBS)   2013 ( 11 ) 1 - 6  2013.07

     View Summary

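    Early detection of and response to adverse drug reactions is an important issue in clinical practice. It requires physicians and pharmacists to know every adverse reaction of every drug. However, a single drug has many known adverse reactions, and keeping track of all of them is difficult. In addition, many adverse reactions have similar, synonymous notations, so the same reaction may be mistaken for a different one because of a notational difference, which can delay its detection. Unknown adverse reactions whose association with a drug has not been established must also be expected. In this paper, we therefore propose an adverse-reaction search system that is robust to notational variation and supports searching for unknown adverse reactions of drugs. The proposed method estimates unknown adverse reactions of a drug from the adverse reactions listed in drug package inserts and from case reports of suspected adverse reactions. In the experiment, we input 150 cases for which suspected adverse reactions had actually been reported, and evaluated usefulness by comparing the search results with the adverse reactions suspected in those case reports to be associated with the drug. The detection rate was 72.7%, of which 42.2% were detected as unknown adverse reactions. Furthermore, by resolving notational variation, the system detected as identical the 29.3% of known adverse reactions that previously could not be matched because of notational variation, confirming that the proposed method is useful.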

    CiNii

  • マイクロブログを対象とした1,000人レベルでの著者推定手法構築に向けて(ブログ・ソーシャルネットワーク,ビッグデータを対象とした管理・情報検索・知識獲得及び一般)

    奥野 峻弥, 浅井 洋樹, 山名 早人

    電子情報通信学会技術研究報告. DE, データ工学   113 ( 150 ) 37 - 42  2013.07

     View Summary

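    Authorship attribution research has traditionally focused on novels and has dealt with a limited number of candidate authors. An authorship attribution method at the scale of 10,000 people has also been proposed for text posted on the Internet, achieving an accuracy of about 80%. However, messages posted to microblogs, which have a large number of users, are numerous but short, and contain many unknown words, typos, and omissions, so the accuracy of existing methods drops. In this study, we therefore propose a method that builds a dictionary from the messages and uses a morphological analyzer with that dictionary to perform authorship attribution over a large number of people from a small number of messages. An evaluation experiment estimating the author from 900 candidates confirmed that accuracy improves over existing authorship attribution methods.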

    CiNii J-GLOBAL

  • 編集にあたって

    山名早人, 酒井哲也, 石川佳治

    情報処理学会論文誌データベース(TOD)   6 ( 3 ) i - iii  2013.06

    CiNii

  • Interaction prediction method of G-protein-coupled receptor and chemical compound with SVM

    OHNO Yorihito, TOH Hiroyuki, YAMANA Hayato

    IEICE technical report. Neurocomputing   113 ( 111 ) 55 - 61  2013.06

     View Summary

    G-protein-coupled receptors (GPCRs) are involved in transducing signals carried by endogenous ligands into cytosolic regions, and are regarded as important targets for developing new drugs. Accurate prediction of interactions between GPCRs and chemical compounds is keenly needed for drug development, because the number of GPCR-compound combinations is too large to examine experimentally. Such computational approaches have therefore been extensively investigated. A preceding study by Okuno et al. achieved high prediction performance by using the entire amino acid sequence of a GPCR and the chemical features of a compound. However, the amino acid residues involved in ligand binding are quite limited, and we hypothesize that these residues strongly affect binding. We therefore identified the amino acid residues constituting the ligand-binding region from the 3D structure of the GPCR and examined whether using these residues, instead of the entire amino acid sequence, improves prediction. A support vector machine (SVM) was used for the prediction. Experimental results showed that accuracy improved by 3.6%, F-value by 0.038%, and AUC by 0.002% compared to the approach by Okuno et al.

    CiNii J-GLOBAL

  • Identifying Topics and Influential Users based on Information Propagation in Twitter

    ZHOU Guanying, ZHANG Xuan, YAMANA Hayato

    IEICE technical report. Data engineering   113 ( 105 ) 29 - 33  2013.06

     View Summary

    Recently, Twitter has become an efficient tool for product promotion. Thus, both measuring the influence of individuals and identifying influential Twitter users have great research value. Most previous research on identifying influential Twitter users has concentrated on following and/or friend relationships in the user network, without taking actual information propagation into account. We believe influential users are those who spread information during propagation, and that they are therefore key figures in advertising for commercial companies. In this paper, we propose a new method to identify influential Twitter users in popular Twitter topics based on the retweet relationship. We use LDA to detect topics and then rank Twitter users within each topic.

    CiNii

  • SCPSSMpred: A general sequence-based method for ligand-binding site prediction

    Chun Fang, Tamotsu Noguchi, Hayato Yamana

    IPSJ Transactions on Bioinformatics   6   35 - 42  2013.06

     View Summary

    In this paper, we propose a novel method, named SCPSSMpred (Smoothed and Condensed PSSM based prediction), which uses a simplified position-specific scoring matrix (PSSM) for predicting ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines abundant features, such as amino acid arrangement, information of neighboring residues, physicochemical properties, and evolutionary information. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the sequences only. Three ligands (FAD, NAD and ATP) were used to verify the versatility of our method, and three alternative traditional methods were also analyzed for comparison. All the methods were tested at both the residue level and the protein sequence level. Experimental results showed that the SCPSSMpred method achieved the best performance besides reducing 50% of redundant features in PSSM. In addition, it showed a remarkable adaptability in dealing with unbalanced data compared to other methods when tested on the protein sequence level. This study not only demonstrates the importance of reducing redundant features in PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites, such that both the arrangements and physicochemical properties of neighboring residues significantly impact ligand-binding behavior. ©2013 Information Processing Society of Japan.

    DOI

  • Low Latency Data Stream Processing on Multi-Core CPU Environments

    UEDA Takanori, AKIOKA Sayaka, YAMANA Hayato

    The IEICE transactions on information and systems (Japanese edition)   96 ( 5 ) 1094 - 1104  2013.05

     View Summary

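    Some data stream processing applications, such as algorithmic trading and network packet monitoring, must process high-volume data streams with low latency. Parallel processing on multi-core CPUs makes it possible to handle high-volume streams, but assigning a thread to each operator increases latency because of inter-core communication and thread-waiting overhead. Conversely, too few threads cannot exploit parallelism, which limits the volume of data that can be processed. In this paper, we propose a thread assignment method that shortens processing latency by taking the CPU architecture and thread-waiting overhead into account. We give a definition of latency for data stream processing in multi-core environments and show that an optimal thread assignment can be found under the model. We further propose a method for relocating operators in response to changes in the input stream's data rate without stopping stream processing while preserving the order in which tuples are applied.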

    CiNii

  • 編集にあたって

    山名早人, 酒井哲也, 石川佳治

    情報処理学会論文誌データベース(TOD)   6 ( 2 ) i - iii  2013.03

    CiNii

  • Task graphs for machine learning algorithms

    AKIOKA SAYAKA, MURAOKA YOICHI, YAMANA HAYATO

    電子情報通信学会技術研究報告   112 ( 454(IBISML2012 93-109) ) 25 - 30  2013.02

     View Summary

    Applications that process massive amounts of data, so-called "big data analysis," are a recent hot requirement, and machine learning algorithms are expected to run much faster in scalable environments to meet it. A machine learning algorithm often behaves quite differently from a data-intensive application, which has been investigated in depth in the high-performance computing area. Therefore, a clear model of data access patterns, dependency analysis, and parallelism extraction for machine learning algorithms is indispensable for running them faster in parallel and distributed computing environments such as the Cloud. This article reports task graph generation for well-known machine learning algorithms; the task graphs carry all the information required for parallel execution of these algorithms.

    CiNii J-GLOBAL

  • 三角形特徴を用いた部分形状検索(ポスターセッション,大規模データベースとパターン認識)

    武井 宏将, 山名 早人

    電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解   112 ( 441 ) 109 - 109  2013.02

     View Summary

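    In recent years, 3D shape data has come into use in many fields, and a large amount of 3D data is being stored. As 3D shape data increases, so does the need to search it. Shape retrieval is broadly divided into whole-shape retrieval and partial-shape retrieval. Whole-shape retrieval searches for shape data that exactly matches the shape data given as the query, whereas partial-shape retrieval searches for shape data that contains the shape data given as the query. In most situations where shape retrieval is performed, the user rarely has shape data identical to the desired shape, so the query is usually partial shape data. The need for partial-shape retrieval is therefore greater than for whole-shape retrieval. On the other hand, partial-shape retrieval is known to be a challenging problem, because the query and the desired shape are not identical, which makes it difficult to establish correspondences between shapes. Known approaches to partial-shape retrieval include methods based on Bag-of-features and methods based on feature point matching. However, Bag-of-features represents whole or partial shape data as histograms in advance, so it is difficult to apply unless the query resembles the shapes that were represented as histograms beforehand. In methods based on feature point matching, the accuracy of the matching affects retrieval accuracy; in particular, mismatched feature points reduce accuracy. The method proposed in this paper improves matching accuracy by using triangles formed from three feature points. Feature points are extracted from the query shape data, and three feature points form one triangle. Based on correspondences determined by the distance between local feature vectors, triangles are built from the feature points of the stored shape data and compared with the query triangles. Using three feature points takes into account not only local information but also the positional relationships among feature points, and checking the correspondence between triangles removes mismatches and improves matching accuracy. In addition, we index the local feature vector of each feature point and combine the index with the triangle correspondence check to achieve fast retrieval. Our contributions in this paper are twofold: (1) improved accuracy through the use of triangle features, and (2) fast retrieval through the combination of indexing the local feature vectors of feature points and checking correspondences between triangles. In the experiment, we built a database indexing 100 shape data items and, for 20 items chosen at random from the indexed data, performed partial-shape retrieval using as queries data cut out to about 30 to 50% of the whole shape. The proposed method achieved an accuracy of 0.85, compared with 0.65 when feature points were used alone. We also measured retrieval speed while increasing the number of indexed items and showed that the proposed method retrieves robustly with respect to the number of indexed items.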

    CiNii

  • 編集にあたって

    山名早人, 酒井哲也, 石川佳治

    情報処理学会論文誌データベース(TOD)   6 ( 1 ) i - iii  2013.01

    CiNii

  • マイクロブログを対象とした著者推定手法の提案-5,000人規模での著者推定-

    奥野峻弥, 浅井洋樹, 山名早人

    情報処理学会シンポジウムシリーズ(CD-ROM)   2013 ( 5 )  2013

    J-GLOBAL

  • 教育環境における書き込み可能な電子ペーパー端末の利活用

    浅井 洋樹, 山名 早人

    大学ICT推進協議会年次大会論文集     3p  2013

    CiNii

  • ソーシャルメディアを含む多メディアビッグデータの統合的解析による情報抽出(ソーシャルメディア,ビッグデータとソーシャルコンピューティング,及び一般)

    上田 高徳, 浅井 洋樹, 藤木 紫乃, 山本 祐輔, 武井 宏将, 秋岡 明香, 山名 早人

    電子情報通信学会技術研究報告. DE, データ工学   112 ( 346 ) 53 - 58  2012.12

    CiNii

  • ソーシャルメディアを含む多メディアビッグデータの統合的解析による情報抽出

    上田高徳, 浅井洋樹, 藤木紫乃, 山本祐輔, 武井宏将, 秋岡明香, 山名早人

    電子情報通信学会技術研究報告   112 ( 346(DE2012 27-40) ) 53 - 58  2012.12

    CiNii J-GLOBAL

  • ソーシャルメディアを含む多メディアビッグデータの統合的解析による情報抽出

    上田高徳, 浅井洋樹, 藤木紫乃, 山本祐輔, 武井宏将, 秋岡明香, 山名早人

    情報処理学会研究報告. データベース・システム研究会報告   2012 ( 8 ) 1 - 6  2012.12

     View Summary

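    This paper describes our ongoing attempt to extract information through integrated analysis of multi-media big data. With the spread of social media, a wide variety of information is now uploaded to the Internet in real time. We believe that more useful information can be extracted by analyzing "multi-media data," which combines multiple information sources rather than a single social medium. This paper describes the multi-media analysis we are working on, as well as QueueLinker, a parallel distributed processing framework we are developing to support the analysis of large-scale real-time data.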

    CiNii

  • 形態素間の優先関係を考慮した略語生成手法

    田中友樹, 及川孝徳, 山名早人, 大西貴士, 土田正明, 石川開

    情報処理学会シンポジウムシリーズ(CD-ROM)   2012 ( 5 ) ROMBUNNO.B4,TANAKA  2012.11

    J-GLOBAL

  • Producer‐Consumer型モジュールで構成された並列分散Webクローラの開発

    上田高徳, 佐藤亘, 鈴木大地, 打田研二, 森本浩介, 秋岡明香, 山名早人

    情報処理学会シンポジウムシリーズ(CD-ROM)   2012 ( 5 ) ROMBUNNO.B3,UEDA  2012.11

    J-GLOBAL

  • 筆記情報と時系列モデルを用いた学習者つまずき検出(教育・学習支援プラットフォーム/一般)

    浅井 洋樹, 野輝 明里, 苑田 翔吾, 山名 早人

    電子情報通信学会技術研究報告. ET, 教育工学   112 ( 269 ) 65 - 70  2012.10

     View Summary

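    Detecting where students stumble is a necessary step in supporting their learning. Research on stumble detection in CAI has used grading results, time taken to answer, biometric information such as learners' facial images and pulse obtained from sensors, and operation histories of input devices such as keyboards and mice. However, primary education today is still centered on handwriting, and stumble detection in such an environment has not been discussed in depth. This report examines a method for detecting stumbling from handwriting information obtained from the pens students use. For detection, we use an AR model, a time-series model, to detect change points at which the learner's handwriting behavior changes, and perform estimation for each interval between change points. A trial evaluation confirmed a certain level of detection performance.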

    CiNii J-GLOBAL

  • 強調表記を利用した手書きドキュメント検索スニペット生成

    浅井洋樹, 山名早人

    情報処理学会研究報告(CD-ROM)   2012 ( 3 ) ROMBUNNO.DBS-154,NO.8  2012.10

    J-GLOBAL

  • 編集にあたって

    山名早人, 酒井哲也, 石川佳治

    情報処理学会論文誌データベース(TOD)   5 ( 3 ) i - iii  2012.09

    CiNii

  • 強調表記を利用した手書きドキュメント検索スニペット生成

    浅井洋樹, 山名早人

    情報処理学会研究報告. データベース・システム研究会報告   2012 ( 8 ) 1 - 7  2012.07

     View Summary

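    In recent years, devices that accept handwritten input, such as tablets and digital pens, have begun to spread, and handwritten documents are increasingly being digitized. When users browse documents on such devices to grasp the gist of each one during search and browsing, thumbnails that shrink the original document and text snippets that output summary text are often used as the snippets in list views. However, when conventional, simply shrunk thumbnails are applied to handwritten documents, the characters are scaled down without being summarized, so the content becomes unreadable and the gist is hard to grasp. Moreover, to the best of our knowledge, no research has addressed summarizing handwritten documents, which contain non-text information such as figures and permit only imperfect character recognition. In this paper, we therefore propose a search snippet that makes the gist easy to grasp by summarizing handwritten documents using the writer's emphasis marks, such as underlining and enclosing. In a user evaluation of information search, our proposed snippet reduced search time by 42% on average compared with the conventional approach.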

    CiNii

  • 強調表記を利用した手書きドキュメント検索スニペット生成

    浅井洋樹, 山名早人

    情報処理学会研究報告. 情報学基礎研究会報告   2012 ( 8 ) 1 - 7  2012.07

     View Summary

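    In recent years, devices that accept handwritten input, such as tablets and digital pens, have begun to spread, and handwritten documents are increasingly being digitized. When users browse documents on such devices to grasp the gist of each one during search and browsing, thumbnails that shrink the original document and text snippets that output summary text are often used as the snippets in list views. However, when conventional, simply shrunk thumbnails are applied to handwritten documents, the characters are scaled down without being summarized, so the content becomes unreadable and the gist is hard to grasp. Moreover, to the best of our knowledge, no research has addressed summarizing handwritten documents, which contain non-text information such as figures and permit only imperfect character recognition. In this paper, we therefore propose a search snippet that makes the gist easy to grasp by summarizing handwritten documents using the writer's emphasis marks, such as underlining and enclosing. In a user evaluation of information search, our proposed snippet reduced search time by 42% on average compared with the conventional approach.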

    CiNii

  • 編集にあたって

    山名早人, 酒井哲也, 石川佳治

    情報処理学会論文誌データベース(TOD)   5 ( 2 ) i - iii  2012.06

    CiNii

  • Welcome message from MAW-2012 international symposium organizers

    Takahiro Hara, Kin Fun Li, Hayato Yamana, Shengrui Wang

    Proceedings - 26th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2012    2012.05

    DOI

  • ThumbPop : 注目物体を強調した疑似立体サムネイル生成 (ヒューマン情報処理)

    新井 啓介, 武井 宏将, 山名 早人

    電子情報通信学会技術研究報告 : 信学技報   111 ( 500 ) 177 - 182  2012.03

    CiNii

  • 低解像度可視光目画像を用いたモデルベース視線推定手法 (ヒューマン情報処理)

    福田 崇, 山名 早人

    電子情報通信学会技術研究報告 : 信学技報   111 ( 500 ) 31 - 36  2012.03

    CiNii

  • Model-based Gaze Estimation Method with Low-Resolution RGB Images of Eyes

    FUKUDA Takashi, YAMANA Hayato

    Technical report of IEICE. HIP   111 ( 500 ) 31 - 36  2012.03

     View Summary

    Recording eye-gaze data from many people in natural situations is valuable in human factors, market strategy, and other studies. A gaze tracking system equipped with webcams is suitable for recording such data. However, the resolution of eye images captured by webcams is low, and low-resolution eye images tend to be distorted and cause errors in gaze estimation. We have developed a gaze tracking system that works with low-resolution eye images. In our past study, we extract contours of pupils, remove distorted areas in these contours, approximate pupil contours into an ellipse, and calculate gaze directi...

    CiNii

  • ThumbPop : Visual Attention-Based Pseudo 3D Thumbnail

    ARAI Keisuke, Takei Hiromasa, YAMANA Hayato

    Technical report of IEICE. HIP   111 ( 500 ) 177 - 182  2012.03

     View Summary

    We propose a pseudo 3D thumbnail generation method to improve the recognizability of visual attention objects by decomposing an original image into visual attention objects and a background. Thumbnails created by simple scaling, as is commonly done, are not suitable for mobile devices equipped with a small display, because attention objects become too small to recognize. Furthermore, previous visual attention-based thumbnail generation methods cannot preserve the recognizability of visual attention objects because they bury them in the background image. To solve these problems, we propose a pseudo 3D thumbnail genera...

    CiNii

  • Model-based Gaze Estimation Method with Low-Resolution RGB Images of Eyes

    FUKUDA Takashi, YAMANA Hayato

    Technical report of IEICE. PRMU   111 ( 499 ) 31 - 36  2012.03

     View Summary

    Recording eye-gaze data from many people in natural situations is valuable in human factors, market strategy, and other studies. A gaze tracking system equipped with webcams is suitable for recording such data. However, the resolution of eye images captured by webcams is low, and low-resolution eye images tend to be distorted and cause errors in gaze estimation. We have developed a gaze tracking system that works with low-resolution eye images. In our past study, we extract contours of pupils, remove distorted areas in these contours, approximate pupil contours into an ellipse, and calculate gaze directi...

    CiNii J-GLOBAL

  • ThumbPop : Visual Attention-Based Pseudo 3D Thumbnail

    ARAI Keisuke, Takei Hiromasa, YAMANA Hayato

    Technical report of IEICE. PRMU   111 ( 499 ) 177 - 182  2012.03

     View Summary

    We propose a pseudo 3D thumbnail generation method to improve the recognizability of visual attention objects by decomposing an original image into visual attention objects and a background. Thumbnails created by simple scaling, as is commonly done, are not suitable for mobile devices equipped with a small display, because attention objects become too small to recognize. Furthermore, previous visual attention-based thumbnail generation methods cannot preserve the recognizability of visual attention objects because they bury them in the background image. To solve these problems, we propose a pseudo 3D thumbnail genera...

    CiNii J-GLOBAL

  • Visual-Attention-based Thumbnail using Two-Stage GrabCut

    Keisuke Arai, Hiromasa Takei, Hayato Yamana

    2012 INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS (ICMCS)     96 - 101  2012

     View Summary

    This paper proposes a new thumbnail generation method to improve the recognizability of visual attention objects on small displays. Previous methods such as simple scaling reduce the recognizability of original images because the visual attention objects become too small to recognize. When we view thumbnails on small displays such as those of mobile devices, recognizability is indispensable for handling many images simultaneously. To solve the problem of low recognizability of visual attention objects, we adopt GrabCut to extract visual attention objects from an original image and then divide the original image into visual attention objects and a background image. While the background image is reduced to fit the size of a thumbnail, the extracted visual attention objects are merged into the reduced background image to preserve their recognizability. In adopting GrabCut, we propose a two-stage GrabCut method to automate the extraction of attention objects; the extraction was performed by hand in previous methods. Our experimental results show that our proposed method is able to shorten the search time by 44% and improve the precision of the search by 19% in comparison with simple scaling.

    DOI

  • データストリーム処理におけるレイテンシ最小化と高可用性のためのオペレータ実行方法

    上田高徳, 打田研二, 秋岡明香, 山名早人

    情報処理学会シンポジウムシリーズ(CD−ROM)   2011 ( 5 ) ROMBUNNO.2G-2,UEDA  2011.10

    J-GLOBAL

  • 品詞n‐gramを用いた著者推定手法—話題に依存しない頑健性の評価—

    井上雅翔, 中島泰, 山名早人

    情報処理学会シンポジウムシリーズ(CD−ROM)   2011 ( 5 ) ROMBUNNO.ROMBUNSHOSESSHON,INOE  2011.10

    J-GLOBAL

  • A Proposal and Validity Inspection of Reliability Evaluation Method for Search Engines' Hit Count

    SATO KO, UCHIDA KENJI, YAMANA HAYATO

    情報処理学会研究報告(CD−ROM)   2011 ( 3 ) ROMBUNNO.DBS-152,NO.8  2011.10

    J-GLOBAL

  • 検索エンジンのヒット数に対する信頼性評価指標の提案とその妥当性検証

    佐藤 亘, 打田 研二, 山名 早人

    情報処理学会研究報告. データベース・システム研究会報告   2011 ( 8 ) 1 - 8  2011.07

     View Summary

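    In recent years, many studies, particularly in natural language processing, have used the number of search results returned by search engines, that is, the hit count. However, the hit count returned by a search engine changes unnaturally depending on when the search is performed, and the resulting error can be too large to ignore when the count is used as a basis for research. Devising methods to evaluate and improve the reliability of hit counts is therefore considered an important challenge. To address it, we have conducted a study that attempted to identify the conditions under which reliable hit counts can be obtained, and have proposed a method for quantitatively evaluating the reliability of hit counts actually obtained. In this paper, extending the latter work, we report the results of an experiment validating the reliability evaluation metric.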

    CiNii

  • A Proposal and Validity Inspection of Reliability Evaluation Method for Search Engines' Hit Count

    Koh Satoh, Kenji Uchida, Hayato Yamana

    IPSJ SIG Notes   2011 ( 8 ) 1 - 8  2011.07

     View Summary

    Recently, numerous studies have relied on the number of search results, or hit count. However, hit counts returned by search engines can fluctuate unnaturally when observed on different days, and may introduce errors too large for use in research. Therefore, it is important to discuss how we can evaluate and improve the reliability of hit counts. We have conducted several studies on this problem, such as a study to specify the conditions under which search engines return reliable hit counts, and a study to define reliability evaluation metrics. In this paper, in ...

    CiNii J-GLOBAL

  • Extraction of Emphasized Words from On-line Handwritten Notebooks

    浅井 洋樹, 山名 早人

    日本データベース学会論文誌   10 ( 1 ) 67 - 72  2011.06

    CiNii J-GLOBAL

  • Welcome message from MAW 2011 symposium chairs

    Takahiro Hara, Kin Fun Li, Shengrui Wang, Hayato Yamana

    Proceedings - 25th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2011     48  2011.05

    DOI

  • Welcome message from the QuEST 2011 workshop chairs

    Kin Fun Li, Rick McGeer, Stephen Neville, Hayato Yamana

    Proceedings - 25th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2011    2011.05

    DOI

  • Time-weighted web authoritative ranking

    Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana

    INFORMATION RETRIEVAL   14 ( 2 ) 133 - 157  2011.04

     View Summary

    We investigate temporal factors in assessing the authoritativeness of web pages. We present three different metrics related to time: age, event, and trend. These metrics measure recentness, special event occurrence, and trend in revisions, respectively. An experimental dataset is created by crawling selected web pages for a period of several months. This data is used to compare page rankings by human users with rankings computed by the standard PageRank algorithm (which does not include temporal factors) and three algorithms that incorporate temporal factors, including the Time-Weighted PageRank (TWPR) algorithm introduced here. Analysis of the rankings shows that all three temporal-aware algorithms produce rankings more like those of human users than does the PageRank algorithm. Of these, the TWPR algorithm produces rankings most similar to human users', indicating that all three temporal factors are relevant in page ranking. In addition, analysis of parameter values used to weight the three temporal factors reveals that age factor has the most impact on page rankings, while trend and event factors have the second and the least impact. Proper weighting of the three factors in TWPR algorithm provides the best ranking results.

    DOI

  • 情報検索コース

    監修, 神門 典子, 山名 早人

    Webラーニングプラザ:技術者Web学習システム(技術者向けeラーニング), 科学技術振興機構    2011.03  [Invited]

  • レビューからの商品比較表の自動生成

    相川直視, 山名早人

    言語処理学会年次大会発表論文集   17th (CD-ROM)   ROMBUNNO.D2-3  2011.03

    J-GLOBAL

  • ロック制御型同期複製ミドルウェアの提案

    堀井洋, 小野寺民也, 山名早人

    電子情報通信学会論文誌 D   J94-D ( 3 ) 515-524  2011.03

    J-GLOBAL

  • Increase the Image Search Results by Using Flickr Tags

    ShanBin Chan, 佐藤真一, 山名早人

    DEIM2011   B1-3  2011

  • 2段階LDAを用いたインクリメンタルなソフトウェアレポジトリの自動分類手法

    井上雅翔, 新井啓介, 山名早人

    DEIM2011   E4-5  2011

  • ウェブサーバへの最短訪問間隔を保証する時間計算量がO(1)のウェブクローリングスケジューラ

    森本浩介, 上田高徳, 打田研二, 山名早人

    DEIM2011   B5-6  2011

  • 品詞と助詞の出現パターンを用いた類似著者の推定とコミュニティ抽出

    中島泰, 山名早人

    DEIM2011   C6-5  2011

  • 検索エンジンのヒット数の信頼性に対する評価

    佐藤亘, 打田研二, 山名早人

    DEIM2011   E6-1  2011

  • 結晶化環境におけるpH値を考慮したSVMによるタンパク質結晶化の予測

    片岡義雅, 野口保, 百石弘澄, 小林大輔, 山名早人

    DEIM2011   D8-1  2011

  • 筆記者の強調表現に基づいたオンライン手書きノートの圧縮サムネイル生成手法

    浅井洋樹, 小林大輔, 山名早人

    DEIM2011   E8-6  2011

  • Cannyエッジ情報に基づく人物画像における髪型の定量化

    須藤優介, 福田崇, 山名早人

    DEIM2011   E9-6  2011

  • 字幕テキストの利用によるマイクロブログからのテレビ番組に言及したメッセージ検出手法

    山本祐輔, 及川孝徳, 山名早人

    DEIM2011   A10-1  2011

  • レビューからの商品比較表の自動生成

    相川直視, 山名早人

    自然言語処理学会第17回年次大会   D2-3  2011

  • Extraction of Distinctive Phrases from Mini Blog Entries and Application for Topic Tracking across the Media

    KATO NORIKAZU, AKIOKA SAYAKA, MURAOKA YOICHI, YAMANA HAYATO

    情報処理学会研究報告(CD−ROM)   2010 ( 4 ) ROMBUNNO.DBS-151,22 - 8  2010.12

    CiNii J-GLOBAL

  • Dynamic Resource Allocation for Streaming Applications in Cloud Environment

    Sayaka Akioka, Norikazu Kato, Yoichi Muraoka, Hayato Yamana

    IPSJ SIG Notes   2010 ( 8 ) 1 - 7  2010.11

    Streaming applications, which must process data that arrives frequently in chronological order, are now a center of interest. This paper proposes a methodology to parallelize and dynamically allocate streaming applications over distributed environments such as cloud computing environments. The simulation results confirmed that practical streaming applications need to be processed in parallel to avoid data loss caused by insufficient processing time. However, the methodology proposed in this paper enables all the input data to be processed with a 26% overhead in the average execution time of each block o...

    CiNii J-GLOBAL

  • Extraction of Distinctive Phrases from Mini Blog Entries and Application for Topic Tracking across the Media

    Norikazu Kato, Sayaka Akioka, Yoichi Muraoka, Hayato Yamana

    IPSJ SIG Notes   2010 ( 22 ) 1 - 8  2010.11

    Mini blog services such as Twitter are among the notable emerging media. Across-the-board analysis of posted entries and of descriptions in related media, such as TV and newspapers, is indispensable for social analysis. Posts in mini blogs, however, often include names of particular movies, novels, and products, many of which are compound words. A compound word is often divided into several words by word processors and is difficult to extract as a single word. Here, if a trending compound word is extracted as intended, the quality of morphological analysis is improved to contrib...

    CiNii J-GLOBAL

  • Speed-Up of Resizable-LSH for Similarity-Based Range Query

    Kunihiro Yamazaki, Hayato Yamana

    IPSJ SIG Notes   2010 ( 5 ) 1 - 8  2010.09

    In this paper, we improve our previously proposed Resizable-LSH, which speeds up range queries in approximate similarity search. Nowadays, Locality-Sensitive Hashing (LSH) is drawing attention as an effective algorithm for approximate nearest neighbor search. LSH adopts hash functions that collide with high probability if two vectors are close, so that LSH finds approximate nearest neighbors quickly even if the dataset is high-dimensional. However, LSH uses a fixed search range when generating hash tables, so resizing the search range is expensive. As a solution, we've proposed the alg...

    CiNii

  • The 2010 IEEE International Symposium on Mining and Web (MAW): Welcome message from symposium organizers

    Takahiro Hara, Kin Fun Li, Shengrui Wang, Hayato Yamana, Laurence T. Yang, Yanchun Zhang

    24th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2010    2010.07

    DOI

  • The 2010 IEEE International Workshop on Quantitative Evaluation of large-scale Systems and Technologies (QuEST): Welcome message from workshop organizers

    Kin Fun Li, Rick McGeer, Stephen Neville, Hayato Yamana

    24th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2010     90  2010.07

    DOI

  • Two step adjustment technique of term weight

    YANO Hiroya, NAKAJIMA Tai, YAMANA Hayato

    IEICE technical report. Data engineering   110 ( 107 ) 45 - 50  2010.06

    The TF-IDF method is one way to weight terms in document retrieval. The IDF value indicates how rarely a term appears in a document set, and it depends on the document set being retrieved. The problem, therefore, is that even if a term rarely appears in a document set from the same field as the query (which means the term is highly specific in the document), its IDF value is small when it appears frequently in the document set being retrieved. In this paper, we propose and study a two-step adjustment technique for term weights. In the first step, we get documents r...

    CiNii J-GLOBAL

  • Similar object detection using template matching focused on positional relationship of feature regions

    Keisuke Arai, Kosuke Morimoto, Hayato Yamana

    IPSJ SIG Notes. CVIM   2010 ( 4 ) 1 - 8  2010.05

    Detecting similar objects in a large quantity of images helps us organize images by category and conduct market research using images on the Web. Template matching, which can detect similar objects, does not suit unknown images because it assumes that the target image contains the same object. In this paper, we aim to decrease the false-positive rate caused by this premise of template matching. We propose a method that considers the positional relationships of feature regions in addition to conventional template matching. Each feature region in the template image matches target imag...

    CiNii

  • Model-Based Gaze Tracking with Low-cost Web Cameras

    FUKUDA Takashi, MATSUZAKI Katsuhiko, YAMANA Hayato

    Technical report of IEICE. HIP   109 ( 471 ) 113 - 118  2010.03

    Gaze estimation that does not restrain users at home will contribute to reforming user interfaces. Commercial gaze estimation systems achieve high accuracy by using infrared; however, gaze estimation systems based on web cameras are highly desirable for home use because of their price. The problem with web cameras is their low resolution for gaze estimation. Low-resolution images are strongly affected by noise, so past studies do not estimate the detailed direction of the eyes. In this paper, we propose a new gaze estimation method which shows high accuracy using image processing and geometrica...

    CiNii J-GLOBAL

  • Model-Based Gaze Tracking with Low-cost Web Cameras

    FUKUDA Takashi, MATSUZAKI Katsuhiko, YAMANA Hayato

    Technical report of IEICE. PRMU   109 ( 470 ) 113 - 118  2010.03

    Gaze estimation that does not restrain users at home will contribute to reforming user interfaces. Commercial gaze estimation systems achieve high accuracy by using infrared; however, gaze estimation systems based on web cameras are highly desirable for home use because of their price. The problem with web cameras is their low resolution for gaze estimation. Low-resolution images are strongly affected by noise, so past studies do not estimate the detailed direction of the eyes. In this paper, we propose a new gaze estimation method which shows high accuracy using image processing and geometrica...

    CiNii

  • 6K-7 Data Mining Algorithms Classified Based on Data Access Patterns

    Akioka Sayaka, Muraoka Yoichi, Yamana Hayato, Nakajima Tatsuo

    全国大会講演論文集   72 ( 5 ) 5-105 - 5-106  2010.03

    CiNii

  • Time-weighted web authoritative ranking

    Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana

    Information Retrieval   Vol.13 ( No.4 )  2010

  • 特定言語Webページ収集のためのフォーカストクローラの性能改善手法

    詹 善斌, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • 字幕テキストの利用によるブログで引用されたテレビ番組の推定

    及川 孝徳, 中島 泰, 松崎 勝彦, 黒木 さやか, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • アンカーテキストとリンク構造を用いた同義語抽出手法

    黒木 さやか, 立石 健二, 細見 格, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • Winnyネットワーク上を流通するコンテンツの傾向と分析

    打田 研二, 高木 浩光, 山崎 邦弘, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • WWWにおけるP3Pコンパクトポリシーの利用状況に関する調査

    櫻井 宏樹, 高木 浩光, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • Unexpected and Interesting: 動画視聴サイトにおける発見性を重視した動画推薦手法の提案

    中村 智浩, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • QueueLinker: パイプライン型アプリケーションのための分散処理フレームワーク

    上田 高徳, 片瀬 弘晶, 森本 浩介, 打田 研二, 油井 誠, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • LittleWeb: 類似ノード集約によるWebグラフ圧縮手法

    片瀬弘晶, 上田 高徳, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • Hit Count Dance -検索エンジンのヒット数に関する信頼性検証-

    舟橋卓也, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010)    2010

  • 安価なWebカメラを用いたModel-Based視線推定

    福田 崇, 松崎勝彦, 山名早人

    信学技報(PRMU)   Vol.2009 ( No.252 ) 113 - 118  2010

  • データアクセスパターンに基づくデータマイニング手法の分類

    秋岡明香, 村岡洋一, 山名早人, 中島達夫

    第72回情処全大   6K-7 ( 5 )  2010

    J-GLOBAL

  • Similar object detection using template matching focused on positional relationship of feature regions

    新井啓介, 森本浩介, 山名早人

    情報処理学会研究報告(CD-ROM)   Vol.2010-CVIM-172 ( No.4 ) 1 - 8  2010

    J-GLOBAL

  • Search Engines’ Trustworthiness-Current Status

    Hayato YAMANA

    Proc. of the 5th Korea-Japan Database Workshop     219 - 240  2010

  • 検索語の重みの2段階調整手法

    矢野博也, 中島泰, 山名早人

    信学技報   Vol.110 ( No.107 ) 45 - 50  2010

  • 領域分割と色特徴を利用したテンプレートマッチングによる類似物体検出

    新井啓介, 森本浩介, 山名早人

    MIRU2010,IS2-42    2010

  • 動画像における正面画像推定からの衣服領域抽出

    金正文, 森本浩介, 山名早人

    MIRU2010, IS3-36    2010

  • 低解像度目画像からのModel-Based視線推定

    福田崇, 松崎勝彦, 山名早人

    MIRU2010, IS1-46    2010

  • Localized Multiple Kernel Learningを用いた画像分類

    小林大輔, 相川直視, 山名早人

    MIRU2010, IS2-43    2010

  • Data Access Pattern Analysis on Stream Mining Algorithms for Cloud Computation

    Sayaka Akioka, Hayato Yamana, Yoichi Muraoka

    Proc. of the 2010 Int'l Conf. on Parallel and Distributed Processing Techniques and Applications    2010

  • The Method of Improving the Specific Language Focused Crawler

    Shan-Bin Chan, Hayato Yamana

    Proc. of the 1st CIPS-SIGHAN Joint Conf. on Chinese Language Processing(CLP2010)    2010

  • Speed-Up of Resizable-LSH for Similarity-Based Range Query

    山崎邦弘, 山名早人

    情報処理学会研究報告(CD-ROM)   Vol.2010-AL-131 ( No.5 ) 1 - 8  2010

    J-GLOBAL

  • Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

    Naoyoshi AIKAWA, Tetsuya SAKAI, Hayato YAMANA

    WebDBForum2011    2010

  • Reliability Verification of Search Engines' Hit Counts: How to Select a Reliable Hit Count for a Query

    Takuya Funahashi, Hayato Yamana

    CURRENT TRENDS IN WEB ENGINEERING   6385s   114 - 125  2010

    In this paper, we investigate the trustworthiness of search engines' hit counts, the numbers returned as search result counts. Since many studies adopt search engines' hit counts to estimate the popularity of input queries, the reliability of hit counts is indispensable for achieving trustworthy studies. However, hit counts are unreliable because they change when a user clicks the "Search" button more than once or clicks the "Next" button on the search results page, or when a user queries the same term on separate days. In this paper, we analyze the characteristics of hit count transition by gathering various types of hit counts over two months using 10,000 queries. The results of our study show that the hit counts with the largest search offset just before search engines adjust their hit counts are the most reliable. Moreover, hit counts are the most reliable when they are consistent over approximately a week.

    DOI

  • Cross-media impact on Twitter in Japan

    Sayaka Akioka, Norikazu Kato, Yoichi Muraoka, Hayato Yamana

    International Conference on Information and Knowledge Management, Proceedings     111 - 118  2010

    Twitter, a microblogging service, is now attracting attention as a new channel. To deepen understanding of this new service, this paper reports the characteristics of Twitter users in Japan and the impact of media such as publications and TV programs on the Twitter community. To the best of our knowledge, this paper is the first to quantitatively analyze the mutual impact between Twitter and other media. For the analyses, we crawled user profiles whose language setting is Japanese and conducted several analyses with well-known methodologies, as in conventional work. We confirmed the characteristics of the collected user profiles. We observed that the distributions of the number of friends and the number of follows both follow a power law, and that there is a correlation between the number of friends and the number of follows. Besides the collected user profiles, we also utilized closed caption data of TV programs in Japan and other information on media that picked up Twitter. We ran a batch process matching these data from outside Twitter against the collected user profiles and concluded that Twitter has already spread widely among Japanese people, but that media still have a huge impact on the growth of Twitter users. We also conjectured that the impact is not one-sided but is a mutual influence between Twitter and other media. © 2010 ACM.

    DOI

  • A Lock-free GCLOCK Page Replacement Algorithm

      2 ( 4 ) 32 - 48  2009.12

    In this paper, we propose a lock-free variant of the GCLOCK page replacement algorithm, named Nb-GCLOCK. Concurrent access to the buffer management module is a major factor that prevents database scalability to processors. Therefore, we propose a non-blocking scheme for bufferfix operations that fix buffer frames for requested pages without locks by combining Nb-GCLOCK and a wait-free hash table. Our experimental results revealed that our scheme can obtain nearly linear scalability to processors up to 64 processors, although the existing locking-based schemes do not scale beyond 16 processors.

    CiNii

  • Prediction of GPCR ligands by 2-way prediction method

    Hiroto Hyakkoku, Minoru Sugihara, Makiko Suwa, Tsuyoshi Kato, Hayato Yamana, Wataru Fujibuchi

    IPSJ SIG technical reports   2009 ( 2 ) 1 - 8  2009.09

    G-protein coupled receptors (GPCRs) are important pharmacological targets, and predicting unknown interactions between GPCRs and ligands is one of the most interesting topics in current computational biology. However, the ligands of many GPCRs have not yet been identified experimentally, and it is difficult to predict unknown ligands of GPCRs because of insufficient training data. We have developed a 2-way prediction method based on the support vector machine. In this method, the prediction is performed by using information on both ligands and GPCRs, and one can apply this method to the case...

    CiNii

  • Proposal of Word Salad Spam Detection Method using N-gram and Interrupted Collocations

    MORIMOTO Kosuke, KATASE Hiroaki, YAMANA Hayato

    研究報告情報学基礎(FI)   95 ( 24 ) X1 - X8  2009.07

    Information obtained from the Internet has become important as the number of Web pages has exploded. However, spam pages have also increased explosively, lowering the value of information obtained from the Internet. Although there are many spamming methods, in this paper we focus on "word salad", which generates text automatically, and we propose an efficient method for detecting word salad spam. To detect it, we compute a score for the text based on Kullback-Leibler divergence, using n-grams and interrupted collocations, and classify pages by that score. In our evaluation, the proposed method improved the F-value by 0.18 over the existing method.

    CiNii

  • Resizable-LSH : An Approximate Similarity Search Algorithm for Resizable Range-Search

    YAMAZAKI Kunihiro, NAKAMURA Tomohiro, FUNAHASHI Takuya, YAMANA Hayato

    研究報告データベースシステム(DBS)   148 ( 22 ) V1 - V8  2009.07

    We introduce an efficient algorithm named "Resizable-LSH" for approximate similarity search, which allows the search range to be resized flexibly. Locality-Sensitive Hashing (LSH) has recently drawn attention as an efficient algorithm for approximate nearest neighbor search. LSH adopts hash functions that collide with high probability if two vectors are close, so it finds approximate nearest neighbors quickly even if the dataset is high-dimensional. However, LSH must generate its hash tables in advance, so resizing the search range is expensive because the hash tables must be regenerated each time. To solve this problem, Resizable-LSH retrieves not only data whose hash value matches that of the query but also data with nearby hash values, which achieves a resizable range search without rebuilding the hash tables. Experiments show that Resizable-LSH runs about 1,000 times faster than an LSH that rebuilds its hash tables for each threshold, with almost the same accuracy.

    CiNii

  • Resizable-LSH : An Approximate Similarity Search Algorithm for Resizable Range-Search

    YAMAZAKI Kunihiro, NAKAMURA Tomohiro, FUNAHASHI Takuya, YAMANA Hayato

    情報処理学会研究報告. 情報学基礎研究会報告   95 ( 22 ) 1 - 8  2009.07

    We introduce an efficient algorithm named "Resizable-LSH" for approximate similarity search, which allows the search range to be resized flexibly. Locality-Sensitive Hashing (LSH) has recently drawn attention as an efficient algorithm for approximate nearest neighbor search. LSH adopts hash functions that collide with high probability if two vectors are close, so it finds approximate nearest neighbors quickly even if the dataset is high-dimensional. However, LSH must generate its hash tables in advance, so resizing the search range is expensive because the hash tables must be regenerated each time. To solve this problem, Resizable-LSH retrieves not only data whose hash value matches that of the query but also data with nearby hash values, which achieves a resizable range search without rebuilding the hash tables. Experiments show that Resizable-LSH runs about 1,000 times faster than an LSH that rebuilds its hash tables for each threshold, with almost the same accuracy.

    CiNii

  • Proposal of Word Salad Spam Detection Method using N-gram and Interrupted Collocations

    MORIMOTO Kosuke, KATASE Hiroaki, YAMANA Hayato

    情報処理学会研究報告. データベース・システム研究会報告   148 ( 24 ) 1 - 8  2009.07

    Information obtained from the Internet has become important as the number of Web pages has exploded. However, spam pages have also increased explosively, lowering the value of information obtained from the Internet. Although there are many spamming methods, in this paper we focus on "word salad", which generates text automatically, and we propose an efficient method for detecting word salad spam. To detect it, we compute a score for the text based on Kullback-Leibler divergence, using n-grams and interrupted collocations, and classify pages by that score. In our evaluation, the proposed method improved the F-value by 0.18 over the existing method.

    CiNii

  • Reliability Verification of Search Engines' Hit Count using Multi Query

    FUNAHASHI Takuya, SONE Hiroaki, YAMANA Hayato

    IEICE technical report. Data engineering   109 ( 153 ) 19 - 24  2009.07

    A number of studies use search engines' hit counts. The goal of these studies is to build applications for translation support or natural language processing support. These studies assume that the hit count is reliable. We have been working to verify the reliability of search engines' hit counts. In the past, we verified hit counts using only single-keyword queries. The contribution of this paper is to verify hit counts using multi-keyword queries.

    CiNii J-GLOBAL

  • Efficient duplicated URL detection for web crawlers

    久保田 展行, 上田 高徳, 山名 早人

    DBSJ journal   8 ( 1 ) 83 - 88  2009.06

    CiNii J-GLOBAL

  • Extending ALT algorithm to use multiple landmarks

    MATSUNAGA TAKU, HIRATE YU, YAMANA HAYATO

    IPSJ SIG Notes   2009 ( 9 ) 75 - 80  2009.01

    Recently, the ALT algorithm was proposed as a speed-up algorithm for computing shortest paths in general graph structures. The ALT algorithm offers a landmark-based heuristic function to estimate distances in A* search. Before computing shortest paths, the ALT algorithm computes the distances between all nodes and the landmarks, and stores them in pre-allocated memory or storage space. However, as the number of landmarks increases, the required space increases linearly. To solve this problem, in this paper, we propose a novel heuristic function for computing shortest paths in general graph structures...

    CiNii J-GLOBAL

  • Exploiting idle CPU cores to improve file access performance

    Takanori Ueda, Yu Hirate, Hayato Yamana

    Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, ICUIMC'09   CD-ROM   529 - 535  2009

    Many-core CPUs require many parallel computation tasks to reach their full potential because CPU cores become idle if they do not have enough computation tasks. How best to utilize a number of cores in many-core CPUs should be examined. In this paper, we propose exploitation of idle cores for improving file access performance. Idle cores are used to extract file access patterns from access logs and the extracted patterns are used to improve file cache efficiency by reordering the LRU (Least Recently Used) list based on the extracted patterns. Data mining techniques are used to extract access patterns to reduce computation overhead. Our method was evaluated by simulation and also implemented on Linux kernel 2.6.26 as a prototype system. In the simulation experiment, our method improved the cache-hit ratio up to 1.09% on DBT-2 (TPC-C) trace logs. Our prototype implementation on Linux improves DBT-2 performance up to 5.24% on a real machine. Copyright 2009 ACM.

    DOI

  • 商用検索エンジンにランキングされたサイトのランク変動パターンの解析

    吉田泰明, 平手勇宇, 山名早人

    DEIM2009    2009

  • 検索ヒット数のクラスタリングを用いた補正手法の検討

    舟橋 卓也, 平手 勇宇, 山名 早人

    DEIM2009    2009

  • 核となるアイテムセットによる頻出アイテムセット抽出数削減手法

    松崎勝彦, 平手勇宇, 山名早人

    DEIM2009    2009

  • 印象語からの概念推定システム

    永井洋平, 黒木さやか, 山名 早人

    信学技報(Webインテリジェンスとインタラクション研究会)    2009

  • Webページ間の関連性の伝播を用いたWebコミュニティ抽出手法の評価

    飯村卓也, 平手勇宇, 山名早人

    DEIM2009    2009

  • 複数キーワードクエリに対する検索ヒット数の信頼性検証

    舟橋卓也, 曽根広哲, 山名早人

    信学技報   Vol.109 ( No.153 ) 19 - 24  2009

  • ブログにおける話題語の出現理由の抽出と話題に関する詳細記事推薦

    中島泰, 黒木さやか, 櫻井宏樹, 山名早人

    第15回Webインテリジェンスとインタラクション研究会    2009

  • Proposal of Word Salad Spam Detection Method using N-gram and Interrupted Collocations

    森本浩介, 片瀬弘晶, 山名早人

    情報処理学会研究報告(CD-ROM)   Vol.2009-DBS-148 ( No.24 ) 1 - 8  2009

    J-GLOBAL

  • Prediction of GPCR ligands by 2-way prediction method

    百石弘澄, 杉原稔, 諏訪牧子, 加藤毅, 山名早人, 藤渕航

    情報処理学会研究報告(CD-ROM)   Vol.2009-BIO-18 ( No.2 ) 1 - 8  2009

    J-GLOBAL

  • ウィキペディア記事閲覧回数の特徴分析

    曽根広哲, 山名早人

    Wikimedia Conference Japan 2009   SIG-SWO-A901-03  2009

  • QueueLinker: Distributed Producer/Consumer Queue Framework

    上田 高徳, 片瀬 弘晶, 森本 浩介, 打田 研二, 山名早人

    WebDB Forum2009    2009

  • A Lock-free GCLOCK Page Replacement Algorithm

    油井誠, 宮崎純, 植村俊亮, 加藤博一, 山名早人

    情報処理学会論文誌トランザクション(CD-ROM)   Vol.2 ( No.4 ) 32 - 48  2009

    J-GLOBAL

  • Resizable-LSH: An Approximate Similarity Search Algorithm for Resizable Range-Search

    山崎邦弘, 中村智浩, 舟橋卓也, 山名早人

    情報処理学会研究報告(CD-ROM)   2009 ( 2 )  2009

    J-GLOBAL

  • A Scalable Monitoring System for Distributed Environments

    Sayaka Akioka, Junichi Ikeda, Takanori Ueda, Yuki Ohno, Midori Sugaya, Yu Hirate, Jiro Katto, Shigeki Goto, Yoichi Muraoka, Hayato Yamana, Tatsuo Nakajima

    FIRST INTERNATIONAL WORKSHOP ON SOFTWARE TECHNOLOGIES FOR FUTURE DEPENDABLE DISTRIBUTED SYSTEMS, PROCEEDINGS     32 - +  2009

    The total amount of information to process or analyze is growing sharply with the rapid spread of computers and networks. Our project, «Highly scalable monitoring architecture for information explosion», develops a monitoring system that allows observing systems, merging the system logs, and discovering intelligence to share. More concretely, the project builds a total system for autonomic maintenance, optimization, and protection. This paper reports the outcomes of the project after the first half of the development period. The rest of the paper is organized as follows. Section 2 describes the concept and details of the monitoring system on a single node, and Section 3 addresses the aggregation of the collected information in distributed environments. Section 4 and Section 5 introduce applications of the monitoring systems. Section 6 summarizes the project and mentions future plans. © 2009 IEEE.

    DOI

  • Profiling Node Conditions of Distributed System with Sequential Pattern Mining

    Yu Hirate, Hayato Yamana

    FIRST INTERNATIONAL WORKSHOP ON SOFTWARE TECHNOLOGIES FOR FUTURE DEPENDABLE DISTRIBUTED SYSTEMS, PROCEEDINGS     43 - +  2009

    Recently, with the widespread adoption of distributed systems, distributed monitoring systems are needed to manage such systems. However, since the monitoring architecture of a distributed system faces a huge amount of log data coming from local computing nodes, information aggregation is a fundamental scheme for monitoring distributed systems. In this paper, we present a novel approach for extracting computing node-condition profiles by using sequential pattern mining, which is a data mining technique. The extracted profiles represent node condition patterns that occur frequently in many computing nodes. Thus, the extracted profiles allow distributed system conditions to be summarized into small and easily understandable information.

    DOI

  • The Challenge of Eliminating Storage Bottlenecks in Distributed Systems

    Takanori Ueda, Yu Hirate, Hayato Yamana

    FIRST INTERNATIONAL WORKSHOP ON SOFTWARE TECHNOLOGIES FOR FUTURE DEPENDABLE DISTRIBUTED SYSTEMS, PROCEEDINGS     49 - 53  2009

    One of the most difficult problems in distributed systems is load-balancing. Even if we take care of load-balancing, heavily-loaded nodes often occur while there are still lightly-loaded nodes that have idle memory and idle CPU power. Our idea is to exploit this idle memory and idle CPU power to improve the storage performance of heavily-loaded nodes. Idle memory can be used for caching file data and idle CPU power can be used for extracting file access patterns from file access logs. File access patterns are valuable sources for optimizing a cache strategy. Our project goal is to improve the overall performance of distributed systems by improving storage access performance. This paper gives an overview of this project idea and reports the current status of the project. In addition, we show benchmark results from our prototype cache extension system, which is implemented in Linux Kernel 2.6. The DBT-3 (TPC-H) benchmark results show that our system can increase computer speed by a factor of 6.68.

    DOI

  • Implementing and Evaluating Graph Engine for Large Scale Graphs

    MATSUNAGA Taku, KATASE Hiroaki, UEDA Takanori, KUBOTA Nobuyuki, MORIMOTO Kosuke, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering   108 ( 329 ) 43 - 43  2008.11

    CiNii

  • Search Engines' Trustworthiness(<Special Issue>Trust Assessment of Web Information)

    Yamana Hayato

    Journal of Japanese Society for Artificial Intelligence   23 ( 6 ) 752 - 759  2008.11

    CiNii J-GLOBAL

  • Improvement in speed and accuracy of multiple sequence alignment program PRIME

    Shinsuke Yamada, Osamu Gotoh, Hayato Yamana

    IPSJ Transactions on Bioinformatics   1   2 - 12  2008.11

    Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. We have developed an MSA program PRIME, whose crucial feature is the use of a group-to-group sequence alignment algorithm with a piecewise linear gap cost. We have shown that PRIME is one of the most accurate MSA programs currently available. However, PRIME is slower than other leading MSA programs. To improve computational performance, we newly incorporate anchoring and grouping heuristics into PRIME. An anchoring method is to locate well-conserved regions in a given MSA as anchor points to reduce the region of DP matrix to be examined, while a grouping method detects conserved subfamily alignments specified by phylogenetic tree in a given MSA to reduce the number of iterative refinement steps. The results of BAliBASE 3.0 and PREFAB 4 benchmark tests indicated that these heuristics contributed to reduction in the computational time of PRIME by more than 60% while the average alignment accuracy measures decreased by at most 2%. Additionally, we evaluated the effectiveness of iterative refinement algorithm based on maximal expected accuracy (MEA). Our experiments revealed that when many sequences are aligned, the MEA-based algorithm significantly improves alignment accuracy compared with the standard version of PRIME at the expense of a considerable increase in computation time. © 2008 Information Processing Society of Japan.

    DOI CiNii

  • Dynamic I/O Optimization with Access Pattern Mining at OS Level

    UEDA Takanori, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes   2008 ( 88 ) 73 - 78  2008.09

    Many-core CPUs improve parallel performance but also raise the problem of storage performance bottlenecks. I/O optimization should be performed at the operating system level because various applications are executed in parallel in a many-core CPU environment and I/O optimization requires cross-cutting knowledge about applications. We propose a new method that uses disk access patterns to improve the efficiency of the disk cache replacement algorithm. Our method is implemented in Linux 2.6.26 and extracts access patterns from the file access logs of applications. The experimental results show our method impro...

    CiNii J-GLOBAL

  • Web Community Extraction Method with Web Pages' Relevance Forwarding

    IIMURA Takuya, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes   2008 ( 88 ) 133 - 138  2008.09

    To find information in a large collection of Web pages, several methods for extracting Web communities have been proposed. Past studies succeeded in improving precision by applying strict rules for whether to include a given Web page in a Web community. However, recall may worsen because Web pages that should be included in the Web community are left out. In this paper, we propose a Web community extraction method that can improve recall without decreasing precision. The method adds Web pages that have many links from/to the Web pages in a sam...

    CiNii J-GLOBAL

  • Reliability Verification of Search Engines' Hit Count

    FUNAHASHI Takuya, UEDA Takanori, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes   2008 ( 88 ) 139 - 144  2008.09

    A number of studies have used search engines' hit counts. The goal of these studies is to build applications that support translation or natural language processing. These studies assume that the hit count is reliable. However, none of them has verified the reliability of search engines' hit counts. If the hit count is unreliable, studies that use it also become unreliable. The purpose of this paper is to verify the reliability of search engines' hit counts. In this experiment, we used the search APIs provided by Google, Yahoo! Japan, and Live Search. Furthermore, we r...

    CiNii J-GLOBAL

  • Analyzing geographical location and number of back-links of web servers all over the world

    平手 勇宇, 片瀬 弘晶, 山名 早人

    Journal of the DBSJ   Vol.7 ( No.2 ) 1 - 6  2008.09

    CiNii J-GLOBAL

  • Message from the MAW 2008 co-chairs

    Takahiro Hara, Yanchun Zhang, William K. Cheung, Shengrui Wang, Hayato Yamana, Kin Fun Li, Laurence T. Yang

    Proceedings - International Conference on Advanced Information Networking and Applications, AINA     57  2008.09

    DOI

  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. DE, データ工学   Vol.108 ( No.93 ) 89 - 94  2008.06

    Search engines are indispensable for finding information on the Internet. One survey reports that users regard information from search engine results as being as credible as information from television. Examining search engine rankings therefore reveals part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with the online encyclopedia. We collected the ranking positions of articles in the Japanese version of Wikipedia by using search engines' APIs. Among all articles in the Japanese version of Wikipedia, about 90% were ranked in the top 10 by Yahoo! JAPAN and Google, and about 70% were ranked in the top 10 by MSN. In the case of Yahoo! JAPAN and MSN, newly created Wikipedia articles tend to appear in the top 10 from the start and keep their high rankings. Our experiments confirmed the influence of Wikipedia on search engine rankings.

    CiNii J-GLOBAL

  • OS Level I/O Optimization in the Many-Core Era

    UEDA Takanori, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes   Vol.2008 ( No.56 ) 133 - 133  2008.06

    Many-core CPUs with a dozen or so cores in one package will definitely appear in the near future. In many-core environments, storage systems will increasingly become bottlenecks, because many applications run in parallel and the frequency of random access, at which HDDs are particularly poor, will increase. To address this problem, we aim to ease the storage bottleneck in software. Specifically, we aim to implement a novel disk cache technique that exploits the file access patterns of applications. The cache technique will be implemented in the Linux kernel and evaluated on a real system. In this talk, we give an overview of our research and related work, then present the latest results and the future direction of our research.

    CiNii

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. DE, データ工学   Vol.7 ( No.2 ) 7 - 12  2008.06

    CiNii J-GLOBAL

  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解   108 ( 94 ) 89 - 94  2008.06

    Search engines are indispensable for finding information on the Internet. One survey reports that users regard information from search engine results as being as credible as information from television. Examining search engine rankings therefore reveals part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with the online encyclopedia. We collected the ranking positions of articles in the Japanese version of Wikipedia by using search engines' APIs. Among all articles in the Japanese version of Wikipedia, about 90% were ranked in the top 10 by Yahoo! JAPAN and Google, and about 70% were ranked in the top 10 by MSN. In the case of Yahoo! JAPAN and MSN, newly created Wikipedia articles tend to appear in the top 10 from the start and keep their high rankings. Our experiments confirmed the influence of Wikipedia on search engine rankings.

    CiNii

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解   108 ( 94 ) 59 - 64  2008.06

    Clustering Internet news articles makes it possible to detect various kinds of useful information, such as related articles and the latest topic words. This area has been widely researched since the TDT project. Conventional clustering methods have difficulty detecting a single article as a single cluster, even though many single articles exist. In this paper, we propose a method for clustering news articles that excludes single articles in advance by using proper noun information, topographic information, and other characteristics that distinguish single from non-single articles. For the evaluation, we use half a year of Japanese news articles. Compared with the Single-Link Method, which on its own has difficulty identifying single articles, our proposed method improves precision by 10.2% and reduces the computation time to approximately a third.

    CiNii

  • Geographical Location and Number of Back-Links of Web Servers All Over the World

    HIRATE Yu, KATASE Hiroaki, YAMANA Hayato

    IPSJ SIG Notes   2008 ( 56 ) 25 - 32  2008.06

    According to our investigation in Oct. 2005, the number of Web pages worldwide is estimated to be 53.7 billion. We have investigated the TLD distribution and language distribution of Web pages based on a dataset of 10.7 billion Web pages. In this paper, as part of our series of Web statistics investigations, we analyzed three kinds of distribution based on the same 10.7 billion Web page dataset: the geographical location of Web servers, the number of virtual hosts per Web server, and the number of back-links (i.e., the in-degree) per Web server. Our results show that (1) about 95.5% of Web servers are located in North America, Europe, and Asia, (2) hosts located in Latin America and Eastern Europe have a large number of virtual hosts, and (3) the relationship between in-degree and the number of Web servers follows a power law.

    CiNii J-GLOBAL

  • Measuring Editor's Trustworthiness in Wikipedia by Using Edit History without Depending on the Edit Frequency

    SAKURAI Hiroki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. DE, データ工学   108 ( 93 ) 115 - 120  2008.06

    The free Internet encyclopedia Wikipedia contains 490,000 Japanese articles, and its volume of information is huge and useful. However, the trustworthiness of article content is uncertain because anyone can edit articles easily. In other words, because we cannot know exactly who edits the content, it is difficult to measure the trustworthiness of article content. To address this problem, a method that uses the edit history has been proposed for measuring the trustworthiness of articles. However, that method is unsuitable for articles and editors with a low edit frequency. Therefore, in this paper, we propose a method for measuring editors' trustworthiness that does not depend on edit frequency. The proposed method is based on the ratio of an editor's edits that remain in the latest version of the content. Our evaluation shows that the proposed method rates highly reliable editors high and unreliable editors low, regardless of edit frequency.

    CiNii J-GLOBAL

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering   108 ( 93 ) 59 - 64  2008.06

    Clustering Internet news articles makes it possible to detect various kinds of useful information, such as related articles and the latest topic words. This area has been widely researched since the TDT project. Conventional clustering methods have difficulty detecting a single article as a single cluster, even though many single articles exist. In this paper, we propose a method for clustering news articles that excludes single articles in advance by using proper noun information, topographic information, and other characteristics that distinguish single from non-single articles. In evaluation, we use half a y...

    CiNii

  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering   108 ( 93 ) 89 - 94  2008.06

    Search engines are indispensable for finding information on the Internet. One survey reports that users regard information from search engine results as being as credible as information from television. Examining search engine rankings therefore reveals part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with the online encyclopedia. We collected ranking positions of the articles in the Japanese version of Wikipedia by using sea...

    CiNii

  • Gathering Over 10 Billion of Web Pages and its Applications

    YAMANA Hayato

    IEICE technical report. Data engineering   108 ( 93 ) 95 - 95  2008.06

    The number of Web pages served by Web servers was estimated at about 53.7 billion as of Oct. 2005. We gathered 14,456,201,906 Web pages from 5,548 Web servers between Jan. 2004 and July 2006. This work was conducted as part of the e-Society project, one of the leading projects of MEXT (Ministry of Education, Culture, Sports, Science and Technology). Speeding up Web crawling conflicts with Web-site-friendly crawling; however, both are indispensable for gathering Web pages. In the project, we have studied and proposed a dynamic delay adjustment scheme for accessing Web servers to prevent...

    CiNii

  • Measuring Editor's Trustworthiness in Wikipedia by Using Edit History without Depending on the Edit Frequency

    SAKURAI Hiroki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering   108 ( 93 ) 115 - 120  2008.06

    The free Internet encyclopedia Wikipedia contains 490,000 Japanese articles, and its volume of information is huge and useful. However, the trustworthiness of article content is uncertain because anyone can edit articles easily. In other words, because we cannot know exactly who edits the content, it is difficult to measure the trustworthiness of article content. To address this problem, a method that uses the edit history has been proposed for measuring the trustworthiness of articles. However, it is inapposite compared with the article and the editor with a little freque...

    CiNii

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    Technical report of IEICE. PRMU   108 ( 94 ) 59 - 64  2008.06

    Clustering Internet news articles makes it possible to detect various kinds of useful information, such as related articles and the latest topic words. This area has been widely researched since the TDT project. Conventional clustering methods have difficulty detecting a single article as a single cluster, even though many single articles exist. In this paper, we propose a method for clustering news articles that excludes single articles in advance by using proper noun information, topographic information, and other characteristics that distinguish single from non-single articles. In evaluation, we use half a y...

    CiNii

  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    Technical report of IEICE. PRMU   108 ( 94 ) 89 - 94  2008.06

    Search engines are indispensable for finding information on the Internet. One survey reports that users regard information from search engine results as being as credible as information from television. Examining search engine rankings therefore reveals part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with the online encyclopedia. We collected ranking positions of the articles in the Japanese version of Wikipedia by using sea...

    CiNii

  • Gathering Over 10 Billion of Web Pages and its Applications

    YAMANA Hayato

    Technical report of IEICE. PRMU   108 ( 94 ) 95 - 95  2008.06

    CiNii

  • Measuring Editor's Trustworthiness in Wikipedia by Using Edit History without Depending on the Edit Frequency

    SAKURAI Hiroki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    Technical report of IEICE. PRMU   108 ( 94 ) 115 - 120  2008.06

    The free Internet encyclopedia Wikipedia contains 490,000 Japanese articles, and its volume of information is huge and useful. However, the trustworthiness of article content is uncertain because anyone can edit articles easily. In other words, because we cannot know exactly who edits the content, it is difficult to measure the trustworthiness of article content. To address this problem, a method that uses the edit history has been proposed for measuring the trustworthiness of articles. However, it is inapposite compared with the article and the editor with a little freque...

    CiNii

  • 3ZK-10 A System for Finding Shortest Paths Between Web Pages

    Matsunaga Taku, Hirate Yu, Yamana Hayato

    全国大会講演論文集   70 ( 5 ) 5-193 - 5-194  2008.03

    According to our investigation in Oct. 2005, the number of Web pages worldwide is estimated to be 53.7 billion. We have investigated the TLD distribution and language distribution of Web pages based on a dataset of 10.7 billion Web pages. In this paper, as part of our series of Web statistics investigations, we analyzed three kinds of distribution based on the same 10.7 billion Web page dataset: the geographical location of Web servers, the number of virtual hosts per Web server, and the number of back-links (i.e., the in-degree) per Web server. Our results show (1) about 95.5% of...

    CiNii

  • 5L-1 全世界のWebページのTLD・言語分布解析(リーディングプロジェクト e-society:WebアーカイブとWebデータ解析技術,一般セッション,リーディングプロジェクト e-society)

    平手 勇宇, 山名 早人

    全国大会講演論文集   70 ( 5 ) 5-361 - 5-362  2008.03

    CiNii

  • EReM-DiCE: Exploiting Remote Memory for Disk Cache Extension

    Takanori UEDA, Yu HIRATE, Hayato YAMANA

    Proc. of 1st International Workshop on Storage and I/O Virtualization, Performance, Energy, Evaluation and Dependability (SPEED2008)    2008

  • 分散メタP2Pストレージ「DiMPS」によるコンテンツ配信システムの実現

    岡本雄太, 山名早人

    DEWS2008    2008

  • 評判情報における評価対象の性質や一部分を表す表現の高精度な抽出手法

    臼渕護, 平手勇宇, 山名早人

    言語処理学会第14回年次大会(NLP2008)    2008

  • 全世界のWebページのTLD・言語分布解析

    平手勇宇, 山名早人

    第70回情処全大   5L-1 ( 5 )  2008

    J-GLOBAL

  • 全世界のWebサイトの言語分布と日本語を含むWebサイトのリンク・地理的位置の解析

    童 芳, 平手勇宇, 山名早人

    DEWS2008    2008

  • 商用検索エンジンの検索結果では取得できないランキング下位部分の収集・解析

    舟橋卓也, 上田高徳, 平手勇宇, 山名早人

    DEWS2008    2008

  • 検索エンジンを用いた類似文章検索システムEPCI の評価

    田代崇, 上田高徳, 平手勇宇, 山名早人

    DEWS2008    2008

  • リンク構造解析アルゴリズム高速化のための縮小Webリンク構造の構築

    片瀬弘晶, 松永拓, 上田高徳, 田代崇, 平手勇宇, 山名早人

    DEWS2008    2008

  • プログラムコードの抽象化を利用した類似ソースコード検索システム

    黒木さやか, 上田高徳, 平手勇宇, 山名早人

    DEWS2008    2008

  • システムコールレベルのアクセスログによるディスクアクセスパターンマイニングの検討

    上田高徳, 平手 勇宇, 山名 早人

    DEWS2008    2008

  • Webページ間最短経路探索システムの構築

    松永 拓, 平手勇宇, 山名早人

    第70回情処全大   3ZK-10 ( 5 )  2008

    J-GLOBAL

  • Webページ間最短経路サブグラフによるオンラインリンクマイニング

    松永 拓, 平手勇宇, 山名 早人

    DEWS2008    2008

  • System for Detecting Auction Fraud Communities in Internet Auctions

    Y.Hirate(D3), A.Aiyoshizawa, S.O, Y.Ioku, F.Kido and H.Yamana

    Proc. of the 2nd International Conf. on Information Systems, Technology and Management(ICISTM-08)    2008

  • What's going on in search engine rankings?

    Yasuaki Yoshida, Takanori Ueda, Takashi Tashiro, Yu Hirate, Hayato Yamana

    2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3     1199 - 1204  2008  [Refereed]

    Many people use search engines every day to retrieve documents from the Web. Although the social influence of search engine rankings has become significant, ranking algorithms are not disclosed. In this paper, we investigated the rankings of three major search engines by analyzing two kinds of data. One is weekly ranking snapshots of the top 250 Web pages, which we collected for almost one year by submitting 1,000 pre-selected queries; the other comprises back-linked Web pages gathered by our own Web crawling. As a result, we confirmed that (1) several top 10 rankings are mutually similar, whereas the Web pages ranked below them are mostly different, (2) ranking transitions have their own characteristics, and (3) each search engine's ranking has its own correlation with the number of back-linked Web pages.

    DOI

  • 全世界のWebホストの地理的位置・バックリンク数の解析

    平手勇宇, 片瀬弘晶, 山名早人

    情報研報(DBS)   Vol.2008 ( No.56 ) 25 - 32  2008

  • 全世界のWebサイトのTLD・言語分布・地理的設置位置の特定

    童芳, 平手勇宇, 山名早人

    日本データベース学会論文   Vol.7 ( No.1 ) 31 - 36  2008

    J-GLOBAL

  • 商用検索エンジンの検索結果では取得できないランキング下位部分の収集・解析

    舟橋卓也, 上田高徳, 平手勇宇, 山名早人

    日本データベース学会論文誌   Vol.7 ( No.1 ) 37 - 42  2008

    J-GLOBAL

  • 商用検索エンジンのヒット数に対する信頼性の検証

    舟橋卓也, 上田高徳, 平手勇宇, 山名早人

    情処研報(DBS)/iDB2008   Vol.2008 ( No.88 ) 139 - 144  2008

  • リンク構造解析アルゴリズム高速化のための縮小Webの構築

    片瀬弘晶, 松永拓, 上田高徳, 田代崇, 平手勇宇, 山名早人

    日本データベース学会論文誌   Vol.7 ( No.1 ) 245 - 250  2008

    J-GLOBAL

  • システムコールレベルのアクセスログを用いたディスクアクセスパターンマイニング

    上田高徳, 平手勇宇, 山名早人

    日本データベース学会論文誌   Vol.7 ( No.1 ) 145 - 150  2008

    J-GLOBAL

  • アクセスパターンマイニングによるOSレベルでの動的なI/O最適化

    上田高徳, 平手勇宇, 山名早人

    情処研報(DBS)/iDB2008   Vol.2008 ( No.88 ) 73 - 78  2008

  • Webページ間の関連性の伝播を用いたWebコミュニティ抽出手法

    飯村卓也, 平手勇宇, 山名早人

    情処研報(DBS)/iDB2008   Vol.2008 ( No.88 ) 133 - 138  2008

  • 100億規模のWebページ収集とその活用

    山名早人

    信学技報(データ工学研究会)   Vol.108 ( No.93 ) 95  2008

  • Toward the Analysis of over 10 billion Web pages

    Hayato YAMANA

    Proc. of the 4th Korea-Japan Int'l Database Workshop 2008(KJDB 2008)     239 - 255  2008

  • 大規模Webリンクデータを用いた リンクスパムコミュニティ抽出

    平手勇宇, 山名早人

    楽天研究開発シンポジウム2008    2008

  • 検索エンジンの信頼性

    山名早人

    人工知能学会誌   Vol.23 ( No.6 ) 752 - 759  2008

  • 100億規模のWebページ収集・分析への挑戦

    村岡洋一, 山名早人, 松井くにお, 橋本三奈子, 赤羽匡子, 萩原純一

    情報処理   Vol.49 ( No.11 ) 1277 - 1283  2008

  • 商用検索エンジンのヒット数に対する信頼性の検証

    舟橋卓也, 上田高徳, 平手勇宇, 山名早人

    日本データベース学会論文誌   Vol.7 ( No.3 ) 31 - 36  2008

    J-GLOBAL

  • グラフデータ処理エンジンの実装と評価

    松永拓, 片瀬弘晶, 上田高徳, 久保田展行, 森本浩介, 平手勇宇, 山名早人

    信学技報   Vol.108 ( No. 329 ) 43 - 43  2008

  • 大規模テキストからの複合語の属性表現の抽出手法

    臼渕護, 平手勇宇, 山名早人

    言語処理学会年次大会発表論文集   14th  2008

    J-GLOBAL

  • Web structure in 2005

    Yu Hirate, Shin Kato, Hayato Yamana

    ALGORITHMS AND MODELS FOR THE WEB-GRAPH   4936   36 - 46  2008

    The estimated number of static web pages in Oct 2005 was over 20.3 billion, which was determined by multiplying the average number of pages per web server based on the results of three previous studies, 200 pages, by the estimated number of web servers on the Internet, 101.4 million. However, based on the analysis of 8.5 billion web pages that we crawled by Oct. 2005, we estimate the total number of web pages to be 53.7 billion. This is because the number of dynamic web pages has increased rapidly in recent years. We also analyzed the web structure using 3 billion of the 8.5 billion web pages that we have crawled. Our results indicate that the size of the "CORE," the central component of the bow tie structure, has increased in recent years, especially in the Chinese and Japanese web.

    DOI

  • Optimistic transactional active replication

    Hiroshi Horii, Hayato Yamana

    Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication, ICUIMC-2008     94 - 100  2008

    Critical database applications require 2-safe replication between at least two sites for disaster-tolerant services. At the same time, they must provide consistent and low-latency results to their clients in normal cases. In this paper, we propose Optimistic Transactional Active Replication (OTAR), which replicates the transaction logs with low latency and provides a consistent view to database applications. The latency of our replication is lower than Passive Replication, and guarantees the serializability of transaction isolation levels that cannot be supported by Active Replication. For our replication, each client sends a transaction request to all replicas and all of the replicas execute the request and optimistically return the result of the transaction to the client. Each replica generates a causality history of the transaction, sent to the client with the result. With the causality histories, the client can make sure that the requested transaction was executed in the same order at all of the replicas and eventually commit it. If the client cannot validate the order, then the client waits for the pessimistic result of the transaction from the replicas. This paper describes the algorithm and its properties. © 2008 ACM.

    DOI

  • Improvement in speed and accuracy of multiple sequence alignment program PRIME

    Yamada Shinsuke, Gotoh Osamu, Yamana Hayato

    IPSJ SIG technical reports   2007 ( 128 ) 267 - 274  2007.12

    Multiple sequence alignments (MSAs) are useful tools in bioinformatics, and many MSA algorithms have been developed. We have developed an MSA program, PRIME, which is one of the most accurate programs. However, PRIME is slower than other leading MSA programs. Therefore, we newly incorporate heuristics into PRIME. The benchmark results indicated that these heuristics contributed to a significant reduction in computational time with only a slight decrease in accuracy. Additionally, we evaluated the effectiveness of an algorithm based on maximal expected accuracy (MEA). Our experiments revealed tha...

    CiNii

  • Improvement in speed and accuracy of multiple sequence alignment program PRIME

    Yamada Shinsuke, Gotoh Osamu, Yamana Hayato

    IPSJ SIG Notes   2007 ( 128 ) 267 - 274  2007.12

    Multiple sequence alignments (MSAs) are useful tools in bioinformatics, and many MSA algorithms have been developed. We have developed an MSA program, PRIME, which is one of the most accurate programs. However, PRIME is slower than other leading MSA programs. Therefore, we newly incorporate heuristics into PRIME. The benchmark results indicated that these heuristics contributed to a significant reduction in computational time with only a slight decrease in accuracy. Additionally, we evaluated the effectiveness of an algorithm based on maximal expected accuracy (MEA). Our experiments revealed tha...

    CiNii

  • Automatic Non-Photorealistic Rendering Based on Adding Freehand Borderlines to Photographs

    SAKAMOTO Yuki, YAMANA Hayato

    IPSJ SIG Notes   2007 ( 84 ) 1 - 6  2007.08

    This paper proposes a new method for automatic non-photorealistic rendering based on adding freehand borderlines to photographs. The proposed method enables various borderline expressions by extracting borderlines from an input image and applying various line patterns drawn with a pencil, a pen, a brush, or a crayon. This results in automatic generation of the picture the user wants. To extract natural borderlines from a picture, we propose a new method for extracting a continuous series of borderlines, because conventional borderline extraction methods have problems such as d...

    CiNii J-GLOBAL

  • Exploiting Remote Memory to Speed-up Random Disk Access

    UEDA Takanori, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes   2007 ( 79 ) 151 - 156  2007.08

    Because hard disks tend to be the bottleneck devices in current computer architectures, increasing hard disk access speed is one of the most effective ways to improve the overall performance of computers. However, hardware modification, with or without software modification, is costly. Accordingly, we propose a new acceleration method implemented at the OS kernel level, which requires no modification of hardware or existing applications. Specifically, our method exploits remote memory to extend the local disk cache. We have implemented our method on Linux Kernel 2.6. The PostgreSQL benchmark...

    CiNii J-GLOBAL

  • Detecting Article Errors in English using Search Engines

    HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes   2007 ( 65 ) 139 - 144  2007.07

    Recently, both the need for English and the opportunities to write it have been growing among non-native English speakers. However, most Japanese people tend to make many errors in English article usage when they write English. In this paper, we propose a method for detecting article errors in English by using search engines. Because search engines index great amounts of text data from Web pages, search-engine-based methods are able to detect errors that conventional corpus-based methods cannot detect. Lapata et al. proposed a method for detecting article errors bas...

    CiNii

  • Quantitative Evaluation and Feature Analysis of Search Engine Rankings

    YOSHIDA Yasuaki, UEDA Takanori, TASHIRO Takashi, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes   2007 ( 65 ) 441 - 446  2007.07

    Most people use search engines to retrieve documents on the Web. The social influence of search engine rankings is therefore large; however, the ranking algorithms are not disclosed. In this paper, we investigated the rankings of three major search engines by analyzing ranking data that we collected by submitting 1000 queries weekly. As a result, we confirmed that (1) several top 10 rankings of well-known search engines are mutually similar, (2) the ranking transitions differ from one another, and (3) each engine's rankings have correlation with the number o...

    CiNii

  • Detecting Article Errors in English using Search Engines

    HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering   107 ( 131 ) 139 - 144  2007.06

    Recently, both the need for English and the opportunities to write it have been growing among non-native English speakers. However, most Japanese people tend to make many errors in English article usage when they write English. In this paper, we propose a method for detecting article errors in English by using search engines. Because search engines index great amounts of text data from Web pages, search-engine-based methods are able to detect errors that conventional corpus-based methods cannot detect. Lapata et al. proposed a method for detecting article errors bas...

    CiNii J-GLOBAL

  • Quantitative Evaluation and Feature Analysis of Search Engine Rankings

    YOSHIDA Yasuaki, UEDA Takanori, TASHIRO Takashi, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering   107 ( 131 ) 441 - 446  2007.06

     View Summary

    Most people use search engines to retrieve documents on the web. The social influence of search engine rankings is therefore large; however, the ranking algorithms are not disclosed. In this paper, we investigate the rankings of three major search engines by analyzing ranking data collected by submitting 1000 queries weekly. As a result, we have confirmed that (1) the top-ten rankings of the well-known search engines are similar to one another, (2) their ranking transitions differ from each other, and (3) each engine's rankings correlate with the number o...

    CiNii J-GLOBAL

  • MathBox : Pen-Based Mathematical Expression Input System

    糟谷 勇児, 山名 早人

    Technical report of IEICE. PRMU   106 ( 606 ) 1 - 6  2007.03

     View Summary

    This paper presents a pen-based mathematical expression input system named MathBox. In MathBox, a user inputs mathematical expressions by repeatedly writing one symbol in a "box". Boxes are shown automatically as the user writes. When a user writes an alphanumeric character, MathBox shows the boxes for power and index. When a user writes a fraction line, MathBox shows the boxes for the numerator and denominator. These interactions enable MathBox to skip recognizing the structure of mathematical expressions. Thus a user can input mathematical expressions with practical accuracy. The ...

    CiNii J-GLOBAL

  • Image Retrieval with Automatic Labeling with Users' Queries : Growable Search Engine by Users

    IGUCHI Shigeru, YAMANA Hayato

    Technical report of IEICE. PRMU   106 ( 606 ) 61 - 66  2007.03

     View Summary

    This paper proposes an image retrieval system in which images are labeled and search precision improves as the system is used. Image retrieval is categorized into two major types: Text-based Image Retrieval (TBIR), which uses keywords (text) as queries, and Content-based Image Retrieval (CBIR), which uses images as queries. TBIR returns images related linguistically and semantically, whereas CBIR returns images related in look and feel. Moreover, TBIR takes extra effort for labeling images; CBIR does not need labeling, but it does not take into account effective information of key...

    CiNii J-GLOBAL

  • Disk Access Speed up Using Prefetching Thread

    FUKAYAMA Tatsunori, SUGITA Shu, HIRUTA Tomonori, YAMANA Hayato

    IPSJ SIG Notes   Vol.2007 ( No.17 ) 233 - 238  2007.03

     View Summary

    It takes four to ten times more time to get data from a hard disk drive than from DRAM. In this paper, we present a speed-up mechanism that uses a prefetching thread on a multicore system to overcome this relative deterioration of hard disk drive performance. The prefetching thread loads data from the hard disk before the main thread requires it. When the main thread requires the data, the data is found in the disk cache, so it takes no time to get the data. We have confirmed that the prefetching thread reduces the execution time of gzip. The performance of gzip improved by up to 39.2%.

    CiNii J-GLOBAL

  • A Speed-Up Method for Shell Scripts on Multi-Core and SMT Processors

    SUGITA SHU, FUKAYAMA TATSUNORI, HIRUTA TOMONORI, TOUNAKA NOBUAKI, YAMANA HAYATO

    IPSJ SIG Notes   2007 ( 17 ) 73 - 78  2007.03

     View Summary

    The purpose of this study is to show the effectiveness of shell script execution on multi-core and/or SMT (Simultaneous Multi-Threading) processors. Recently, multi-core processors and SMT techniques have become popular even at home and in business. However, programs or compilers that do not take parallelism into account cannot deliver the benefits of multi-core and multi-thread processing. Programmers have to write parallel programs to receive these benefits. Therefore, automatic parallelization techniques have been studied actively. This paper proposes an automatic parallelization scheme for shell script programs on multi-core and/or SMT processors. In our experiment, we confirmed that the automatically parallelized shell script program runs 1.4 to 1.8 times faster than the original shell script program.

    CiNii J-GLOBAL

  • MathBox: Interactive Pen-Based Interface for Inputting Mathematical Expressions

    Yuji Kasuya, Hayato Yamana

    2007 INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES     274 - 277  2007

     View Summary

    Inputting mathematical expressions with a mouse and a keyboard is a troublesome task. Thus, a number of mathematical expression recognition systems capable of recognizing handwritten mathematical expressions to input them into computers have been proposed. Even with these systems, however, structure recognition of mathematical expressions is still difficult. This paper presents MathBox, a new pen-based interface for inputting mathematical expressions into computers. MathBox interactively shows "boxes" in which the user can write one symbol. The boxes are shown along with the user's writing. For example, when the user writes 'x,' the boxes for a power and an index of 'x' and for the next symbol are shown. When the user inputs a fraction line, boxes for the numerator, denominator, and the next symbol are shown. MathBox skips recognizing the structures of expressions, which enables users to write mathematical expressions with practical accuracy.

    DOI

  • 二段階の類似画像検索を用いた改変画像検出手法

    馬越健治, 糟谷 勇児, 山名 早人

    DEWS2007     L1-3  2007

  • 経済時系列データからの投資指標の抽出

    柳井佳孝, 山名早人

    DEWS2007     E9-4  2007

  • ネットワーク上のマシンをディスクキャッシュに利用した場合の性能評価

    上田高徳, 平手勇宇, 山名早人

    DEWS2007     E7-9  2007

  • キーワードの出現に基づくブログコミュニティ抽出とオピニオンリーダーの発見

    永拓, 平手勇宇, 山名早人

    DEWS2007     C3-7  2007

  • Web検索エンジンのランキングバイアスに関する研究動向

    平手勇宇, 吉田泰明, 山名早人

    DEWS2007     C7-7  2007

  • 手書き数式入力システムMathBox

    糟谷勇児, 山名早人

    信学技報PRMU    2007

  • ユーザクエリによる画像へのキーワード付けを利用した画像検索 ~ 利用によって賢くなる検索エンジン ~

    井口 茂, 山名早人

    信学技報PRMU    2007

  • マルチコアプロセッサ上におけるシェルスクリプト高速化手法

    杉田秀, 深山辰徳, 蛭田智則, 當仲寛哲, 山名早人

    情処研報, Hokke2007   Vol.2007 ( No.17 ) 73 - 78  2007

  • タンパク質立体構造に基づいたアラインメント中の保存領域抽出手法の改良

    山田真介, 山名早人, 野口保

    第7回日本蛋白質科学会年会   7th  2007

    J-GLOBAL

  • EPCI: Extracting potentially copyright infringement texts from the web

    Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu Hirate, Hayato Yamana

    16th International World Wide Web Conference, WWW2007     1151 - 1152  2007

     View Summary

    In this paper, we propose a new system, called EPCI, that extracts potentially copyright-infringing texts from the Web. EPCI extracts them in the following way: (1) generating a set of queries based on a given copyright-reserved seed-text, (2) submitting every query to a search engine API, (3) gathering the search result Web pages in descending rank order until the similarity between the given seed-text and the search result pages falls below a given threshold value, and (4) merging all the gathered pages, then re-ranking them in order of their similarity. Our experimental result using 40 seed-texts shows that EPCI is able to extract 132 potentially copyright-infringing Web pages per given copyright-reserved seed-text with 94% precision on average.

    DOI

  • 商用検索エンジンのランキングに関する定量的評価と特徴解析

    吉田泰明, 上田高徳, 田代崇, 平手勇宇, 山名早人

    情報研報(DBS),Vol.2007   No.65   441 - 446  2007

  • 検索エンジンを用いた英文冠詞誤りの検出

    平野孝佳, 平手勇宇, 山名早人

    情報研報(DBS),Vol.2007     139 - 144  2007

  • リモートメモリを用いたランダムディスクアクセス高速化手法

    上田高徳, 平手勇宇, 山名 早人

    情処研報(ARC), Vol.2007   No.79   151 - 156  2007

  • multiple sequence alignment program based on group-to-group sequence alignment algorithm with piecewise linear gap cost

    Shinsuke Yamada, Osamu Gotoh, Hayato Yamana

    ISMB/ECCB2007, Austria Center Vienna    2007

  • 実写画像への手描き風輪郭線付加による絵画画像自動作成

    坂本祐軌, 山名早人

    情処研報(ARC), Vol.2007   No.84  2007

  • 商用サーチエンジンのランキング解析サポートシステム

    吉田泰明, 舟橋卓也, 片瀬弘晶, 上田高徳, 平手勇宇, 山名早人

    DBWeb2007    2007

  • 学内ドメインに存在する隠れたWebページの解析

    平手勇宇, シュティフ ロマン, 魏小比, 山名早人

    平成19年度情報教育研究集会    2007

  • P2Pファイル共有ネットワークを利用した大規模分散ストレージの実現

    岡本雄太, 蛭田智則, 山名早人

    情報処理学会全国大会講演論文集   69th ( 3 )  2007

    J-GLOBAL

  • The development and evaluation of a prototype system for the inference of genetic networks based on genetic programming

    Kouji Tanaka, Hayato Yamana

    WMSCI 2007: 11TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL IV, PROCEEDINGS   4   13 - 18  2007

     View Summary

    Estimating mutual interactions in genetic networks mainly means inferring the mutual control relationships among multiple genes from gene expression data. Such correlations are typically expressible in the form of nonlinear simultaneous differential equations. However, most work to date has employed S-systems to express such differential equations, allowing only rough approximations of mass action, and as such it has been difficult to determine the actual correlations between the genes. Instead, we formulate the mutual interactions as actual simultaneous partial differential equations, and automatically determine their structure and coefficients using genetic programming (GP) from a given data series. A parallel implementation of the scheme in a Grid environment using our Jojo Grid programming system for Java has resulted in precise determination of the equations in many cases within a reasonable time.

  • 1P474 Automatic extraction of conserved region from alignment based on protein structure (23. Bioinformatics, genomics and proteomics (I), Poster Session, Abstract, Meeting Program of EABS & BSJ 2006)

    Yamada Shinsuke, Yamada Koutarou, Yamana Hayato, Noguchi Tamotsu

    Biophysics   46 ( 2 ) S265  2006.10

    DOI CiNii

  • Domain linker prediction based on position-specific scoring matrix

    Takizawa Masatoshi, Yamana Hayato, Noguchi Tamotsu

    IPSJ SIG technical reports   2006 ( 99 ) 41 - 47  2006.09

     View Summary

    Domain linker prediction plays an important role in efficient protein structure analysis. Since previous domain linker prediction methods have employed a sliding window, they do not explicitly consider the position dependence of amino acids within domain linkers. In this paper, we propose a novel domain linker prediction method that focuses on both ends of the domain linker. Our method employs Support Vector Machines, which are trained on the position dependence of amino acids extracted from the position-specific scoring matrix. As a result of the experiment using data set of coil regions dete...

    CiNii J-GLOBAL

  • Building a terabyte-scale web data collection "NW1000G-04" in the NTCIR-5 WEB task

    Masao Takaku, Keizo Oyama, Akiko Aizawa, Haruko Ishikawa, Kengo Minamide, Shin Kato, Hayato Yamana, Junya Hayashi

    NII Technical Reports   2006 ( 12 ) 1 - 8  2006.09

     View Summary

    We built a terabyte-scale web data collection, NW1000G-04, which was used in the NTCIR-5 WEB task. This report describes the process of building the collection and some statistics of it in detail.

  • Copyright violation detection system for Web texts

    TASHIRO TAKASHI, UEDA TAKANORI, HORI TAISUKE, HIRATE Yu, YAMANA HAYATO

    IPSJ SIG Notes   2006 ( 78 ) 27 - 33  2006.07

     View Summary

    Due to the explosive increase in the number of web pages, the number of web pages that violate copyright, such as pages citing lyrics or news without permission, has also increased. To solve this problem, we propose a system for detecting copyright violation web pages. The proposed system consists of three steps. First, the system generates search keywords in phrasal units, called "bunsetsu", which are included in the "seed page." Second, using the search keywords generated in the first step, the system gathers candidate web pages violating copyright by using the Google or Yahoo! web service. F...

    CiNii J-GLOBAL

  • Support system for detecting abuse users in internet auction

    HIRATE YU, AIYOSHIZAWA AKIRA, O SHOREI, IOKU YUICHI, KIDO FUYUKO, YAMANA HAYATO

    IPSJ SIG Notes   2006 ( 78 ) 367 - 374  2006.07

     View Summary

    Due to the recent widespread use of internet auctions, a huge number of users trade with each other. At the same time, damage from fraud committed by abusive users has become a serious problem for internet auction systems. In this paper, we developed a system for detecting abusive users by referring to rating log data, which is a part of auction log data. Rating log data indicates the assessment between seller and buyer. The system consists of two methods: extracting users who are rated as "Good" abusively, and extracting an abusive users' community based on one abusive user. Our evaluation shows proposing syst...

    CiNii J-GLOBAL

  • Support system for detecting abuse users in internet auction

    HIRATE Yu, AIYOSHIZAWA Akira, O Shorei, IOKU Yuichi, KIDO Fuyuko, YAMANA Hayato

    IEICE technical report. Data engineering   106 ( 150 ) 37 - 42  2006.07

     View Summary

    Due to the recent widespread use of the internet, many people use internet auction systems and trade with each other. At the same time, damage from fraud committed by abusive users has become a serious problem for internet auction systems. In this paper, we developed a system for detecting abusive users by referring to rating log data, which is a part of auction log data. Rating log data indicates the assessment between seller and buyer. The system consists of two methods: extracting users who are rated as "Good" abusively, and extracting an abusive users' community based on one abusive user. Our evaluation shows ...

    CiNii J-GLOBAL

  • Copyright violation detection system for Web texts

    TASHIRO Takashi, UEDA Takanori, HORI Taisuke, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering   106 ( 149 ) 23 - 28  2006.07

     View Summary

    Due to the explosive increase in the number of web pages, the number of web pages that violate copyright, such as pages citing lyrics or news without permission, has also increased. To solve this problem, we propose a system for detecting copyright violation web pages. The proposed system consists of three steps. First, the system generates search keywords in phrasal units, called "bunsetsu", which are included in the "seed page." Second, using the search keywords generated in the first step, the system gathers candidate web pages violating copyright by using the Google or Yahoo! web service. F...

    CiNii J-GLOBAL

  • Hierarchical Clustering Of Feature Vectors at Visual Attentional Points

    SAITO Jun, YAMANA Hayato

    IPSJ SIG Notes. CVIM   2006 ( 25 ) 57 - 62  2006.03

     View Summary

    In content-based image retrieval, classification is needed to improve (i) the speed of retrieval and (ii) the semanticity of retrieval. Our system extracts the most attentional points by using a selective visual attention model, which extracts feature vectors of attentional points in images. Our system then classifies the feature vectors by hierarchical clustering with residuals. An attentional point in an image is an outlier in the image, or special outlier. We propose an extension of the selective attention model to extract temporal outliers with residual vectors, and the method of moving atten...

    CiNii J-GLOBAL

  • Estimation of Shape and Reflectance by using Extended SFS from Multiple Views

    KOBAYASHI MASANORI, IGUCHI SHIGERU, YAMANA HAYATO

    IPSJ SIG Notes. CVIM   2006 ( 25 ) 391 - 398  2006.03

     View Summary

    There exist many methods to reconstruct a 3D model from a real object. However, they have restrictions such as requiring expensive devices, requiring reference objects, or assuming that the target object is composed of a single material. This paper proposes a new method based on shape from shading using multiple views. The proposed method handles objects composed of multiple materials. It first clusters the reflectances using the input images, and then analyzes the 3D shape and the reflectance parameters.

    CiNii J-GLOBAL

  • Estimation of Shape and Reflectance by using Extended SFS from Multiple Views

    KOBAYASHI MASANORI, IGUCHI SHIGERU, YAMANA HAYATO

    Technical report of IEICE. PRMU   105 ( 674 ) 219 - 226  2006.03

     View Summary

    There exist many methods to reconstruct a 3D model from a real object. However, they have restrictions such as requiring expensive devices, requiring reference objects, or assuming that the target object is composed of a single material. This paper proposes a new method based on shape from shading using multiple views. The proposed method handles objects composed of multiple materials. It first clusters the reflectances using the input images, and then analyzes the 3D shape and the reflectance parameters.

    CiNii J-GLOBAL

  • Hierarchical Clustering Of Feature Vectors at Visual Attentional Points

    SAITO Jun, YAMANA Hayato

    Technical report of IEICE. PRMU   105 ( 673 ) 57 - 62  2006.03

     View Summary

    In content-based image retrieval, classification is needed to improve (i) the speed of retrieval and (ii) the semanticity of retrieval. Our system extracts the most attentional points by using a selective visual attention model, which extracts feature vectors of attentional points in images. Our system then classifies the feature vectors by hierarchical clustering with residuals. An attentional point in an image is an outlier in the image, or special outlier. We propose an extension of the selective attention model to extract temporal outliers with residual vectors, and the method of moving att...

    CiNii J-GLOBAL

  • Optimization Technique for Cache Memory Considering Wire Delay

    HIRUTA Tomonori, MASUDA Keisuke, YAMANA Hayato

    IPSJ SIG Notes   Vol.2006 ( No.20 ) 19 - 24  2006.02

     View Summary

    The growing gap between processor speed and memory speed makes cache memory more important. However, wire delay in large caches grows as a result of process miniaturization. Therefore, cache memory access will become a bottleneck. This paper proposes an optimization technique for cache memory that considers wire delay. We implement this technique with SimpleScalar 3.0d and evaluate it with SPEC95 CINT and SPEC2000 CINT. As a result, IPC improves by a factor of 1.17 on average.

    CiNii J-GLOBAL

  • Recognition of Similar Character Pairs with Two Types of SVMs for Online Mathematical Expression Recognition

    KASUYA Yuji, YAMANA Hayato

    Technical report of IEICE. Thought and language   105 ( 612 ) 55 - 60  2006.02

     View Summary

    Mathematical expression recognition systems, which recognize mathematical expressions and translate them into digital data usable by computers, are needed. However, characters and symbols in mathematical expressions are sometimes similar and difficult to discriminate. This paper proposes a method to recognize similar character pairs with two types of SVM (Support Vector Machine). One is a normal SVM, which uses images of handwriting as input; the other is SVMGDTW, which uses sequences of pen positions. With the proposed method, "γ" and "r" are discriminated with a recognition rate of 86.7%, "ω" and "...

    CiNii J-GLOBAL

  • Content-based Image Retrieval with Selective Visual Attention

    SAITO Jun, YAMANA Hayato

    Technical report of IEICE. Thought and language   105 ( 612 ) 61 - 66  2006.02

     View Summary

    Content-based image retrieval (CBIR) systems have difficulty concentrating information processing on the points that seem important. This is caused by the difficulty of automatically selecting "informative" points. As part of studies of the human and primate brain, selective visual attention has been researched; it models the process that decides the attentional point of an image before eye movements when a subject is presented with an image. To use local information in an image, we propose a CBIR system that uses the feature vector at the most attentional point of the image. We introduce model of select...

    CiNii J-GLOBAL

  • Recognition of Similar Character Pairs with Two Types of SVMs for Online Mathematical Expression Recognition

    KASUYA Yuji, YAMANA Hayato

    Technical report of IEICE. PRMU   105 ( 614 ) 55 - 60  2006.02

     View Summary

    Mathematical expression recognition systems, which recognize mathematical expressions and translate them into digital data usable by computers, are needed. However, characters and symbols in mathematical expressions are sometimes similar and difficult to discriminate. This paper proposes a method to recognize similar character pairs with two types of SVM (Support Vector Machine). One is a normal SVM, which uses images of handwriting as input; the other is SVMGDTW, which uses sequences of pen positions. With the proposed method, "γ" and "r" are discriminated with a recognition rate of 86.7%, "ω" and "w...

    CiNii

  • Content-based Image Retrieval with Selective Visual Attention

    SAITO Jun, YAMANA Hayato

    Technical report of IEICE. PRMU   105 ( 614 ) 61 - 66  2006.02

     View Summary

    Content-based image retrieval (CBIR) systems have difficulty concentrating information processing on the points that seem important. This is caused by the difficulty of automatically selecting "informative" points. As part of studies of the human and primate brain, selective visual attention has been researched; it models the process that decides the attentional point of an image before eye movements when a subject is presented with an image. To use local information in an image, we propose a CBIR system that uses the feature vector at the most attentional point of the image. We introduce model of selecti...

    CiNii

  • Sequential pattern mining with time intervals

    Yu Hirate, Hayato Yamana

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   3918 LNAI   775 - 779  2006

     View Summary

    Sequential pattern mining can be used to extract frequent sequences maintaining their transaction order. As conventional sequential pattern mining methods do not consider transaction occurrence time intervals, it is impossible to predict the time intervals of any two transactions extracted as frequent sequences. Thus, from extracted sequential patterns, although users are able to predict what events will occur, they are not able to predict when the events will occur. Here, we propose a new sequential pattern mining method that considers time intervals. Using Japanese earthquake data, we confirmed that our method is able to extract new types of frequent sequences that are not extracted by conventional sequential pattern mining methods. © Springer-Verlag Berlin Heidelberg 2006.

    DOI

  • Generalized sequential pattern mining with item intervals

    Yu Hirate, Hayato Yamana

    Journal of Computers   1 ( 3 ) 51 - 60  2006

     View Summary

    Sequential pattern mining is an important data mining method with broad applications that can extract frequent sequences while maintaining their order. However, it is important to identify item intervals of sequential patterns extracted by sequential pattern mining. For example, a sequence <A, B> with a 1-day interval and a sequence <A, B> with a 1-year interval are completely different; the former sequence may have some association, while the latter may not. To adopt item intervals, two approaches have been proposed for integration of item intervals with sequential pattern mining: (1) constraint-based mining and (2) extended sequence-based mining. However, although constraint-based mining approach avoids the extraction of sequences with non-interest time intervals such as too long intervals it has setbacks in that it is difficult to specify optimal constraints related to item interval, and users must re-execute constraint-based algorithms with changing constraint values. On the other hand, extended sequence-based mining approach does not need to specify constraints and re-execute. Since extended sequence-based mining approach cannot adopt any constraints based on time intervals, it may extract meaningless patterns, such as sequences with too long item intervals. This means these two approaches have not only advantages but also disadvantages. To solve this problem, in this paper, we generalize sequential pattern mining with item interval. The generalization includes three points: (a) a capability to handle two kinds of item interval measurement, item gap and time interval, (b) a capability to handle extended sequences which are defined by inserting pseudo items based on the interval itemization function, and (c) adopting four item interval constraints. Generalized sequential pattern mining is able to substitute all types of conventional sequential pattern mining algorithms with item intervals. Using Japanese earthquake data, we have confirmed that our proposed algorithm is able to extract sequential patterns with item interval, defined in a flexible manner by the interval itemization function. © 2006 ACADEMY PUBLISHER.

    DOI

  • 選択注視を用いた画像検索システムの提案

    斎藤純, 山名早人

    信学技報(PRMU)   Vol.105 ( No.614 ) 61 - 66  2006

  • SVMを用いたオンライン類似数式文字認識

    糟谷勇児, 山名早人

    信学技報(PRMU)    2006

  • 迷惑メールを見分ける賢いチップ

    山名早人監修, G.スティックス

    日経サイエンス   2006年5月号  2006

  • 時間情報を含むシーケンシャルパターンマイニングの一般化

    平手勇宇, 山名早人

    DEWS2006    2006

  • 検索エンジンを利用した英作文支援システムの構築

    佐藤学, 安藤進, 山名早人

    言語処理学会第12回年次大会   12th   664 - 667  2006

    J-GLOBAL

  • 距離と属性を考慮したPrefixSpanによる感情表現抽出

    佐藤一誠, 平手勇宇, 山名早人

    DEWS2006    2006

  • 学習器残差の距離による画像検索システム

    斎藤純, 山名早人

    信学技報(PRMU)   Vol.105 ( No.673 ) 57 - 62  2006

  • 拡張多視点SFSによる3次元形状と反射属性の推定

    小林正典, 井口茂, 山名早人

    情処研報(CVIM),   Vol.2006 ( No.25 ) 391 - 398  2006

  • リンク構造解析による不要なWebコミュニティの事前判別

    斉田直幸, 山名早人

    DEWS2006    2006

  • Fact of the Web:50億ページのウェブの解析

    加藤真, 山名早人

    DEWS2006    2006

  • タンパク質立体構造に基づく保存領域の自動抽出

    山田晃太郎, 山田真介, 山名早人, 野口保

    第6回日本蛋白質科学会年会   ポスター番号2P-07  2006

    J-GLOBAL

  • Text Mining using PrefixSpan constrained by Item Interval and Item Attribute

    Issei Sato, Yu Hirate, Hayato Yamana

    ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops     35 - 38  2006

     View Summary

    Applying conventional sequential pattern mining methods to text data extracts many uninteresting patterns, which increases the time to interpret the extracted patterns. To solve this problem, we propose a new sequential pattern mining algorithm by adopting the following two constraints. One is to select sequences with regard to item intervals (the number of items between any two adjacent items in a sequence), and the other is to select sequences with regard to item attributes. Using Amazon customer reviews in the book category, we have confirmed that our method is able to extract patterns faster than the conventional method, and is better able to exclude uninteresting patterns while retaining the patterns of interest.

    DOI

  • インターネットオークションにおける不正行為者の発見支援

    平手勇宇, 相吉澤明, 翁松齢, 井奥雄一, 木戸冬子, 山名早人

    情報研報(DBS)   Vol.2006 ( 140(2) ) 367 - 374  2006

  • Web ページを対象とした著作権違反自動検知システム

    田代崇, 上田高徳, 堀泰祐, 平手勇宇, 山名早人

    情報研報(DBS)   Vol.2006 ( 140(2) ) 27 - 33  2006

  • 配列プロファイルを利用したドメインリンカー予測

    滝沢雅俊, 山名早人, 野口保

    情処研報(BIO)   Vol.2006 ( 99 ) 41 - 47  2006

  • 検索エンジンを用いた英文冠詞誤りの検出

    平野孝佳, 平手勇宇, 山名早人

    日本データベース学会Letters Vol.6, No.3     1 - 4  2006

  • インターネットオークションにおける不正行為者の発見支援

    平手勇宇, 相吉澤 明, 翁 松齢, 井奥雄一, 木戸冬子, 山名早人

    日本データベース学会Letters   Vol.5 ( 2 ) 77 - 80  2006

    J-GLOBAL

  • Web上の文章を対象とした著作権違反自動検知システム

    田代 崇, 上田高徳, 堀 泰祐, 平手勇宇, 山名早人

    日本データベース学会Letters   Vol.5 ( 2 ) 25 - 28  2006

    J-GLOBAL

  • 学内ドメインに存在する著作権違反ページ抽出の可能性

    平手勇宇, 山名早人

    平成18年度情報教育研究集会論文集     876 - 879  2006

  • Web Structure in 2005

    Yu Hirate, Hayato Yamana

    WAW2006, Banff    2006

  • Automatic extraction of conserved region from alignment based on protein structure

    Shinsuke Yamada, Kouratou Yamada, Hayato Yamana, Tamotsu Noguchi

    EABS & BSJ 2006   Poster No. 1P474  2006

  • Prediction of domain and disordered regions in proteins by fold recognition and secondary structure prediction

    Masatoshi Takizawa, Naoko Inoue, Kentaro Tomii, Hayato Yamana, Tamotsu Noguchi

    Critical Assessment of Techniques for Protein Structure Prediction Seventh Meeting   Poster No.9  2006

  • Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

    Shinsuke Yamada, Osamu Gotoh, Hayato Yamana

    BMC Bioinformatics   7  2006

     View Summary

    Background: Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels. Results: We propose a novel group-to-group sequence alignment algorithm that uses a piecewise linear gap cost. We developed a program called PRIME, which employs our proposed algorithm to optimize the well-defined sum-of-pairs score. PRIME stands for Profile-based Randomized Iteration MEthod. We evaluated PRIME and some recent MSA programs using BAliBASE version 3.0 and PREFAB version 4.0 benchmarks. The results of benchmark tests showed that PRIME can construct accurate alignments comparable to the most accurate programs currently available, including L-INS-i of MAFFT, ProbCons, and T-Coffee. Conclusion: PRIME enables users to construct accurate alignments without having to employ pairwise alignment information. PRIME is available at http://prime.cbrc.jp/. © 2006 Yamada et al; licensee BioMed Central Ltd.

    DOI PubMed

  • Sequential pattern mining with time intervals

    Yu Hirate, Hayato Yamana

    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS   3918   775 - 779  2006

     View Summary

    Sequential pattern mining can be used to extract frequent sequences maintaining their transaction order. As conventional sequential pattern mining methods do not consider transaction occurrence time intervals, it is impossible to predict the time intervals of any two transactions extracted as frequent sequences. Thus, from extracted sequential patterns, although users are able to predict what events will occur, they are not able to predict when the events will occur. Here, we propose a new sequential pattern mining method that considers time intervals. Using Japanese earthquake data, we confirmed that our method is able to extract new types of frequent sequences that are not extracted by conventional sequential pattern mining methods.

    DOI

  • Contour Extraction using Texture and Non-Texture Distinction

    IGUCHI Shigeru, YAMANA Hayato

    Technical report of IEICE. PRMU   105 ( 414 ) 13 - 18  2005.11

     View Summary

    This paper proposes a technique that divides an image into a texture region and a non-texture region and then applies a suitable contour extraction method to each, in order to improve the accuracy of contour extraction. The most basic idea for extracting contours is edge detection by derivative filters; however, edges do not necessarily equal borderlines. Thus, texture analysis is essential for an accurate result. Most conventional studies apply either edge detection or texture analysis to the whole image. In contrast, in this paper, we first extract a texture region a...

    CiNii J-GLOBAL

  • Sample Collection System for Online Handwritten Mathematical Expressions written by Digital Pen and Preliminary Recognition Experiments

    KASUYA Yuji, YAMANA Hayato

    Technical report of IEICE. PRMU   105 ( 374 ) 7 - 12  2005.10

     View Summary

    This paper proposes a sample collection system for online handwritten mathematical expressions based on digital pens. In the prior online handwriting character recognition systems, samples collected by pen tablets have been used. But data by pen tablets are (1) difficult to collect because users aren't familiar with pen tablets, (2) different from real handwriting because users have to look at their monitors to write characters. On the contrary digital pens, easy to use for the first time, are used and samples written by 74 examinees are collected. By recognition experiments following facts...

    CiNii J-GLOBAL

  • C-013 A Consideration on Thread-Level Speculative Execution

    SAITO Fumiko, YAMANA Hayato

    情報科学技術フォーラム一般講演論文集   4 ( 1 ) 205 - 206  2005.08

    CiNii

  • Sequential Pattern Mining based on Event Intervals

    HIRATE YU, KOMATSU SHUNSUKE, YAMANA HAYATO

    IPSJ SIG Notes   2005 ( 68 ) 321 - 328  2005.07

     View Summary

    In data mining research, sequential pattern mining extracts frequent sequences while keeping the order in which their events occur. Since conventional sequential pattern mining methods do not consider the time intervals between event occurrences, it is impossible to understand the time interval between any two events included in the resulting sequences. In this paper, we propose a new sequential pattern mining method that considers event occurrence time intervals. In an evaluation using earthquake data, we confirmed that our new method can extract new kinds of frequent sequences that could not be extracted by conventional sequential pattern mining methods.

    CiNii

  • Sequential Pattern Mining based on Event Intervals

    HIRATE Yu, KOMATSU Shunsuke, YAMANA Hayato

    IEICE technical report. Data engineering   105 ( 172 ) 43 - 48  2005.07

     View Summary

    In data mining research, sequential pattern mining extracts frequent sequences while keeping the order in which their events occur. Since conventional sequential pattern mining methods do not consider the time intervals between event occurrences, it is impossible to understand the time interval between any two events included in the resulting sequences. In this paper, we propose a new sequential pattern mining method that considers event occurrence time intervals. In an evaluation using earthquake data, we confirmed that our new method can extract new kinds of frequent sequences which couldn't extrac...

    CiNii J-GLOBAL

  • From the Search Engine to the Analysis Engine

    Yamana Hayato

    Journal of Japanese Society for Artificial Intelligence   Vol.20 ( No.4 ) 471 - 478  2005.07

    CiNii J-GLOBAL

  • TF2P-growth:Frequent Itemset Mining Algorithm without Any Thresholds

    HIRATE YU, IWAHASHI EIGO, YAMANA HAYATO

    情報処理学会論文誌データベース(TOD)   Vol.46 ( No.SIG 8(TOD 26) ) 60 - 71  2005.06

     View Summary

    Conventional frequent itemset mining algorithms require a user-specified minimum support, and then mine frequent itemsets with support values that are higher than the minimum support. As it is difficult to predict how many frequent itemsets will be mined with a specified minimum support, the Top-κ mining concept has been proposed. The Top-κ mining concept is based on an algorithm for mining frequent itemsets without a minimum support, but with the number κ of most frequent itemsets ordered according to their support values. However, the Top-κ mining concept still requires a threshold κ. Therefore, users must decide the value of κ before initiating mining. In this paper, we propose a new mining algorithm, called "TF^2P-growth," which does not require any thresholds. This algorithm mines itemsets in descending order of their support values without any thresholds and returns frequent itemsets to users sequentially with short response time.

    CiNii

  • 10. Productive ICT Academia Project

    UEDA Kazunori, OISHI Shinichi, KATTO Jiro, NAKAJIMA Tatsuo, MURAOKA Yoichi, YAMANA Hayato

    Journal of Information Processing Society of Japan   46 ( 4 ) 410 - 416  2005.04

    CiNii

  • The Current Status of the Art of the 21st COE Programs in the Information Sciences Field (1) Productive ICT Academia Project

    上田和紀, 大石進一, 甲藤二郎, 中島達夫, 村岡洋一, 山名早人

    情報処理   46 ( 4 ) 410 - 416  2005.04

    CiNii J-GLOBAL

  • Defense against Buffer Overflow by Segmenting Stack Frame

    Hiruta Tomonori, Yamana Hayato

    情報処理学会研究報告. SLDM, [システムLSI設計技術]   2005 ( 27 ) 161 - 165  2005.03

     View Summary

    In recent years, buffer overflow attacks have been increasing. A buffer overflow is caused by inputting data larger than the data space which is prepared for various numbers. The most dangerous buffer overflow is a stack overflow. When a stack overflow occurs, the return address is rewritten and malicious code becomes executable. This paper proposes a defense technique against buffer overflow by segmenting the stack frame. We implement this technique on the SimpleScalar tool set ver. 3.0d and evaluate it with SPEC CINT95. The evaluation result shows that this technique causes 2.3% performance degradation.

    CiNii

  • Defense against Buffer Overflow by Segmenting Stack Frame

    HIRUTA Tomonori, YAMANA Hayato

      119 ( 0 ) 161 - 165  2005.03

     View Summary

    In recent years, buffer overflow attacks have been increasing. A buffer overflow is caused by inputting data larger than the data space which is prepared for various numbers. The most dangerous buffer overflow is a stack overflow. When a stack overflow occurs, the return address is rewritten and malicious code becomes executable. This paper proposes a defense technique against buffer overflow by segmenting the stack frame. We implement this technique on the SimpleScalar tool set ver. 3.0d and evaluate it with SPEC CINT95. The evaluation result shows that this technique causes 2.3% performance degradation.

    CiNii

  • Defense against Buffer Overflow by Segmenting Stack Frame

    Hiruta Tomonori, Yamana Hayato

    IEICE technical report. Computer systems   104 ( 738 ) 71 - 75  2005.03

     View Summary

    In recent years, buffer overflow attacks have been increasing. A buffer overflow is caused by inputting data larger than the data space which is prepared for various numbers. The most dangerous buffer overflow is a stack overflow. When a stack overflow occurs, the return address is rewritten and malicious code becomes executable. This paper proposes a defense technique against buffer overflow by segmenting the stack frame. We implement this technique on the SimpleScalar tool set ver. 3.0d and evaluate it with SPEC CINT95. The evaluation result shows that this technique causes 2.3% performance degradation.

    CiNii J-GLOBAL

  • MPIETE2:Improvement of the MPI Execution Time Estimator in Prediction Error of Communication Time

    IWABUCHI Toshihiro, SUGITA Shu, YAMANA Hayato

    IPSJ SIG Notes   2005 ( 19 ) 175 - 180  2005.03

     View Summary

    In this paper, we improve the MPI Execution Time Estimator (MPIETE) to reduce the prediction error of communication time. MPIETE, which we have proposed, is an execution time estimation tool for MPI programs. MPIETE's scheme divides an MPI program into computation blocks and communication blocks, and then predicts the total execution time by summing the execution time of each block. Since estimating the block execution time is fast, MPIETE can predict the total execution time faster than actually executing the MPI program. However, MPIETE assumes no network contention. This results in some errors in predicting the delay time under network contention. In this paper, we improve MPIETE by proposing a new estimation scheme for communication blocks that includes the delay time. The proposed scheme makes it possible to predict the performance decrement and to find the number of Processing Units (PUs) at which the target platform achieves the best performance. We have evaluated MPIETE2, which improves MPIETE with the proposed scheme, using EP, CG, FT, and MG from NAS Parallel Benchmarks 2.4. For 2-128 PUs, the prediction error is less than 14% and the execution time of the prediction is 1/4 times smaller than the actual execution time. Moreover, MPIETE2 exactly predicts the number of PUs at which the target platform achieves the best performance.

    CiNii J-GLOBAL

  • The Proposal of Tri-Mode Branch Predictor

    SAITO FUMIKO, YAMANA HAYATO

    IPSJ SIG Notes   2005 ( 19 ) 25 - 30  2005.03

     View Summary

    Branch prediction is implemented in recent processors to avoid pipeline stalls. Branch prediction is a kind of speculative execution for control dependence. In recent years, as pipelines get deeper, the branch misprediction penalty becomes higher. Thus, the branch misprediction rate must be lowered to raise processor performance. Recently, many branch predictors have been proposed. Hybrid branch predictors composed of multiple pattern history tables (PHTs) show the highest accuracy among them. The Bi-Mode branch predictor is the best known of the hybrid branch predictors. In the Bi-Mode predictor, the Choice PHT judges the branch bias and selects the Direction PHT (Taken or NotTaken PHT). This paper focuses on the Weakly branches, which the Choice PHT judges Weakly Taken or Weakly NotTaken and which do not have a branch bias. In order to avoid the influence of Weakly branches on the Direction PHTs, we propose "the Tri-Mode branch predictor," which adds a Weakly PHT that predicts the Weakly branches. With a 12KB Tri-Mode predictor, the branch miss reduction rate relative to the Bi-Mode predictor averages 2.78% in the SPECint95 (ref inputs) benchmark simulation.

    CiNii J-GLOBAL

  • Selective Attention System by Residual Information of Predictive Coding

    SAITOH Jun, YAMANA Hayato

    IPSJ SIG Notes. CVIM   148   235 - 242  2005.03

     View Summary

    This paper describes an unsupervised selective attention system that uses the residual information of Predictive Coding. Predictive Coding, the model by Rao and Ballard that imitates the visual pathway of the brain, uses a learning rule to minimize the residuals between the inputs and the internal predictions, which yields a linear coding of subimages by a basis set. The residual-based selective attention model can select "informative" subimages from a validation image because they have features that are rarely seen in the learning sample subimages. We ran experiments in which our system learned some similar natural scenes and then recognized another kind of image. We discuss these experiments and the usefulness of our system.

    CiNii

  • Selective Attention System by Residual Information of Predictive Coding

    Saitoh Jun, Yamana Hayato

    IPSJ SIG Notes. CVIM   2005 ( 18 ) 235 - 242  2005.03

     View Summary

    This paper describes an unsupervised selective attention system that uses the residual information of Predictive Coding. Predictive Coding, the model by Rao and Ballard that imitates the visual pathway of the brain, uses a learning rule to minimize the residuals between the inputs and the internal predictions, which yields a linear coding of subimages by a basis set. The residual-based selective attention model can select "informative" subimages from a validation image because they have features that are rarely seen in the learning sample subimages. We experiment with our system to have it learn some simi...

    CiNii J-GLOBAL

  • An Efficient Synchronization Scheme Using Speculative Threads on Hyper-Threading Technology

    HONDA Dai, SAITO Fumiko, YAMANA Hayato

    IPSJ SIG Notes   2005 ( 7 ) 33 - 38  2005.01

     View Summary

    Recently, the gap between CPU processing speed and the data transmission speed from the main memory has lowered execution speed. Thus, data caching techniques have become more important. Particularly in pointer-based programs which have nonlinear access patterns, the cache miss rate is very high. To solve this problem, Pre-Execution has been proposed as a cache miss latency tolerance technique that runs one or more helper threads on the spare CPU resources ahead of the main computation. This paper proposes a synchronization technique between the main thread and the helper thread. Furthermore, t...

    CiNii J-GLOBAL

  • A Branch Prediction Technique focused on Weak States of Prediction Table

    Nakazawa Yukari, Saito Fumiko, Yamana Hayato

    IPSJ SIG Notes   2005 ( 7 ) 51 - 56  2005.01

     View Summary

    In recent years, as pipelines get deeper and the instruction fetch width and the issue width become wider, more accurate branch predictors are needed. Branch predictors predict with 2-bit saturating counters (prediction counters) whose state is changed by the execution result of the branch. As a result of analyzing the prediction accuracy in each state (Strongly Taken, Weakly Taken, Weakly Not-taken and Strongly Not-taken) of the prediction counters, it turns out that the prediction accuracy in the Weak states of the gshare predictor is especially low. We propose the predictor sele...

    CiNii J-GLOBAL

  • PlusDBG: Web community extraction scheme improving both precision and pseudo-recall

    N Saida, A Umezawa, H Yamana

    WEB TECHNOLOGIES RESEARCH AND DEVELOPMENT - APWEB 2005   3399   938 - 943  2005

     View Summary

    This paper proposes PlusDBG to improve both precision and pseudo-recall by extending the conventional Web community extraction scheme. Precision is defined as the percentage of relevant Web pages extracted as members of Web communities and pseudo-recall is defined as the sum of the number of relevant Web pages extracted as members of Web communities. The proposed scheme adopts the new distance parameter defined by the relevance between a Web page and a Web community, and extracts the Web community with higher precision and pseudo-recall. Moreover, we have implemented and evaluated the proposed scheme. Our results confirm that the proposed scheme is able to extract about 3.2-fold larger numbers of members of Web communities than the conventional scheme, while maintaining equivalent precision.

  • Googleを超える利口な検索エンジン

    山名早人監修, J.モスタファ

    日経サイエンス   日経サイエンス2005年5月号  2005

  • 迷惑メールを撃退する

    山名早人監修, J.グッドマン, D.ベッカーマン, R.ラウンスウェイト

    日経サイエンス   日経サイエンス2005年7月号  2005

  • イベント発生間隔を考慮したシーケンシャルパターンマイニング

    平手勇宇, 小松俊介, 山名早人

    情報研報(DBS)   Vol.2005 ( No.68 ) 321 - 328  2005

  • Search Engines 2005-Guides to the Web-Introduction to Search Engines

    山名早人, 村田剛志

    情報処理   Vol.46 ( No.9 ) 981 - 987  2005

    CiNii J-GLOBAL

  • 三次元情報を利用した保存領域の自動決定

    山田晃太郎, 山田真介, 山名早人, 野口保

    産総研 生命情報科学人材養成コース 最終シンポジウム、ポスター番号040    2005

  • 区分的線形ギャップコストを用いたマルチプルアラインメントアルゴリズムの開発

    山田真介, 山名早人, 後藤修

    産総研 生命情報科学人材養成コース 最終シンポジウム、ポスター番号002    2005

  • スレッドレベル投機的実行に関する考察

    斎藤史子, 山名早人

    FIT2005,C-1   FIT 2005  2005

    J-GLOBAL

  • FORTE1を利用したドメイン予測法の開発

    滝沢雅俊, 山名早人, 野口保

    産総研 生命情報科学人材養成コース 最終シンポジウム、ポスター番号038    2005

  • デジタルペンを用いた数式サンプル収集システムの紹介と採取サンプルの解析

    糟谷勇児, 山名早人

    信学技報(PRMU)   Vol.105 ( No.374 ) 7 - 12  2005

  • PRIME - an implementation of a doubly nested randomized iterative refinement strategy with the piecewise linear gap cost

    Shinsuke Yamada, Osamu Gotoh

    CBRC2005, Poster No.2    2005

  • テクスチャと非テクスチャの区別を用いた輪郭線抽出

    井口茂, 山名早人

    信学技報(PRMU)   Vol.105 ( No.414 ) 13 - 18  2005

  • P2Pファイル共有ネットワーク上で動作するメタファイルシステム

    岡本雄太, 山名早人

    日本ソフトウェア科学会インターネットテクノロジワークショップ2005(WIT2005)    2005

  • スパイウェア

    山名早人監訳, 斎藤純, 平手勇, 糟谷勇児, 柳井佳孝, 蛭田智則, 杉田秀, 井口茂訳

    CACM日本語版   Vol.6, No.1  2005

  • Multiple sequence alignment

    Osamu Gotoh, Shinsuke Yamada, Tetsushi Yada

    Handbook of Computational Molecular Biology    2005

  • Overview of the NTCIR-5 WEB Navigational Retrieval Subtask 2 (Navi-2)

    Keizo Oyama, Masao Takaku, Haruko Ishikawa, Akiko Aizawa, Hayato Yamana

    Proc. of NTCIR-5 Workshop    2005

  • リカレントネットを用いたオンライン文字認識システム

    糟谷勇児, 山名早人

    情報処理学会全国大会講演論文集   67th ( 2 )  2005

    J-GLOBAL

  • HMMでの動作認識における類似動作からの特徴部位抽出

    井口茂, 山名早人

    情報処理学会全国大会講演論文集   67th ( 2 )  2005

    J-GLOBAL

  • MPIプログラムの簡易実行による実行時間予測手法における通信予測の効率化

    杉田秀, 岩淵寿寛, 山名早人

    情報処理学会全国大会講演論文集   67th ( 1 )  2005

    J-GLOBAL

  • 相同性検索手法の組み合わせによる検索精度向上

    滝沢雅俊, 山田真介, 山名早人

    情報処理学会全国大会講演論文集   67th ( 3 )  2005

    J-GLOBAL

  • 3P266 Identification of rigid domains by using complete graph and application to SCOP

    Mashiko R, Wako H, Yamana H

    Biophysics   44 ( 1 ) S256  2004.11

    DOI CiNii

  • F-033 Parallel Learning Methods of Reinforcement Learning on Shared Memory Multiprocessors

    Mori Kouichirou, Yamana Hayato

    情報科学技術フォーラム一般講演論文集   3 ( 2 ) 291 - 292  2004.08

    CiNii J-GLOBAL

  • An Efficient Caching Technique Using Speculative Threads on Hyper-Threading Technology

    HONDA Dai, SAITO Fumiko, YAMANA Hayato

    IPSJ SIG Notes   2004 ( 80 ) 43 - 48  2004.07

     View Summary

    Recently, the gap between CPU processing speed and the data transmission speed from the main memory has greatly influenced execution speed. Thus, data caching techniques have become more important. However, in pointer-based programs which have a nonlinear access pattern, a cache memory does not function effectively. To solve this problem, Pre-Execution is a cache miss latency tolerance technique that uses one or more helper threads running on spare CPU resources ahead of the main computation. This paper proposes a synchronization technique for the helper thread. Furthermore, this paper examines the ...

    CiNii J-GLOBAL

  • A Translation Support System using Search Engines

    OSHIKA Hironori, SATOU Manabu, ANDO Susumu, YAMANA Hayato

    IPSJ SIG Notes   2004 ( 72 ) 585 - 591  2004.07

     View Summary

    This paper proposes a new Japanese-to-English translation support system using search engines. The system uses Google to address some of the problems that non-native speakers of English often encounter when trying to write English sentences. In particular, choosing appropriate English nouns and prepositions is a challenge for Japanese speakers. To solve these problems, we focus on the techniques presented in the book titled "Using Google to Improve Your Translation Skills" written by Susumu Ando, published by Maruzen in 2003. The proposed system is implemented by automatically constructing opti...

    CiNii

  • A Translation Support System using Search Engines

    OSHIKA Hironori, SATOU Manabu, YAMANA Hayato

    IEICE technical report. Data engineering   104 ( 177 ) 237 - 242  2004.07

     View Summary

    This paper proposes a new Japanese-to-English translation support system using search engines. The system uses Google to address some of the problems that non-native speakers of English often encounter when trying to write English sentences. In particular, choosing appropriate English nouns and prepositions is a challenge for Japanese speakers. To solve these problems, we focus on the techniques presented in the book titled "Using Google to Improve Your Translation Skills" written by Susumu Ando, published by Maruzen in 2003. The proposed system is implemented by automatically constructing opti...

    CiNii J-GLOBAL

  • Toward the Exploitation of New Applications based on Web Data

    YAMANA Hayato

    IPSJ SIG Notes   2004 ( 45 ) 107 - 110  2004.05

     View Summary

    The amount of information on the Web is huge, and the number of Web pages was estimated at about 9.25 billion in April 2004. Moreover, one billion Web pages will be added to the Web repository every year, as estimated from its average increase over the past two years. It is not too much to say that the huge Web repository has all kinds of information, knowledge, and know-how that could not be learned by a human even over an entire lifetime. In this paper, we introduce the major research projects that concern how to crawl the huge number of Web pages, how to keep them up-to-date, and how to make full use of them.

    CiNii

  • Parallelization of Loops Including Loop Carried Dependences Using Thread Level Speculative Execution

    Ishikawa Shunsuke, Yamana Hayato

    情報処理学会研究報告. SLDM, [システムLSI設計技術]   2004 ( 33 ) 63 - 68  2004.03

     View Summary

    The loop parallelization scheme improves program performance dramatically. However, when the data dependence cannot be analyzed statically, the conventional parallelization scheme assumes that the data dependence exists. Thus, a loop containing an unanalyzed Loop Carried Dependence cannot be parallelized. The thread level speculative execution scheme has been known to speed up such a loop. In this paper, we propose a scheme that applies speculative execution selectively, only to the portions expected to be sped up effectively, using the overhead parameter required for the book-keeping p...

    CiNii

  • A Fast Learning Method for Reinforcement Learning on Shared Memory Multiprocessors

    MORI Kouichirou, YAMANA Hayato

    IPSJ SIG Notes. ICS   2004 ( 29 ) 89 - 94  2004.03

     View Summary

    In reinforcement learning, the agent learns by trial and error, starting from a state without knowledge. Therefore, reinforcement learning has the drawback that learning is slow. How to learn at high speed is a serious problem. In order to learn at high speed, some methods have been proposed. In these methods, the value function is divided. Then each divided value function is assigned to a processor and updated in parallel. However, these methods need to exchange experiences frequently between the divided value functions because of the nature of reinforcement learning. It was a problem in the previou...

    CiNii

  • Parallelization of Loops Including Loop Carried Dependences Using Thread Level Speculative Execution

    Ishikawa Shunsuke, Yamana Hayato

    IEICE technical report. Computer systems   103 ( 736 ) 19 - 24  2004.03

     View Summary

    The loop parallelization scheme improves program performance dramatically. However, when the data dependence cannot be analyzed statically, the conventional parallelization scheme assumes that the data dependence exists. Thus, a loop containing an unanalyzed Loop Carried Dependence cannot be parallelized. The thread level speculative execution scheme has been known to speed up such a loop. In this paper, we propose a scheme that applies speculative execution selectively, only to the portions expected to be sped up effectively, using the overhead parameter required for the book-keeping p...

    CiNii J-GLOBAL

  • A Fast Learning Method for Reinforcement Learning on Shared Memory Multiprocessors

    MORI Kouichirou, YAMANA Hayato

    IEICE technical report. Artificial intelligence and knowledge-based processing   103 ( 725 ) 59 - 64  2004.03

     View Summary

    In reinforcement learning, the agent learns by trial and error, starting from a state with no prior knowledge. Consequently, learning is slow, and how to speed it up is a serious problem. Several methods have been proposed to accelerate learning. In these methods, the value function is divided, and each part is assigned to a processor and updated in parallel. However, because of the nature of reinforcement learning, these methods need to exchange experiences frequently between the divided value functions. It was a problem in the previou...

    CiNii J-GLOBAL

  • MPIETE : An Execution Time Estimator for MPI Programs

    HORII Hiroshi, IWABUCHI Toshihiro, YAMANA Hayato

    IPSJ SIG Notes   2004 ( 20 ) 55 - 60  2004.03

     View Summary

    In this paper, we propose the MPI Execution Time Estimator (MPIETE), an execution time estimation tool for MPI programs that helps users choose the computing platform best suited to executing an MPI program. Conventional execution time estimation schemes cannot model a computing platform or an MPI program perfectly, so none of the parameters of either the computing platform or the MPI program can be reused. In contrast, the proposed scheme makes it possible to reuse all the parameters of both the computing platform and the MPI program even for the estimation on another computing platfo...

    CiNii J-GLOBAL

  • The Branch Predictor refering a BTB Entry Existence

    SAITO FUMIKO, YAMANA HAYATO

    IPSJ SIG Notes   2004 ( 20 ) 127 - 132  2004.03

     View Summary

    Branch prediction is implemented in recent processors to avoid pipeline stalls. It is a kind of speculative execution for control dependences. In recent years, as pipelines have become deeper, the branch misprediction penalty has grown. Thus, the branch misprediction rate must be lowered to raise processor performance. A branch predictor predicts a branch direction and a branch target address. The BTB (Branch Target Buffer) registers taken branches. We found that most branches that do not have a BTB entry are not-taken branches. We propose a branch predictor that refers to a BTB...

    CiNii J-GLOBAL

  • 見たいサイトが一発で出てくる検索エンジンの仕組みとは

    山名早人

    インターネットマガジン(インプレス)   ( No.108 ) 88 - 91  2004

  • 検索エンジンのアーキテクチャ

    山名早人

    情報の科学と技術   Vol.54 ( No.2 ) 84 - 89  2004

    DOI

  • 分岐方向偏向強弱毎の予測表で構成された分岐方向予測機構

    斎藤史子, 山名早人

    情処研報(ARC)   Vol.2004 ( No.20 ) 127 - 132  2004

  • 繰り返し囚人のジレンマゲームを適用したネットオークションモデルの提案と協調行動の観察

    久野木 彩子, 山名 早人

    DEWS2004    2004

  • 強化学習並列化による学習の高速化

    森紘一郎, 山名早人

    情処研報(ICS)   Vol.2004 ( No.29 ) 89 - 94  2004

  • リンク構造を利用したWebページの更新判別手法

    熊谷 英樹, 山名 早人

    DEWS2004    2004

  • ユーザへの応答時間を重視した最頻出kパターン抽出アルゴリズム

    平手 勇宇, 岩橋 永悟, 山名 早人

    DEWS2004    2004

  • ユーザの感覚を考慮したWeb検索システムの評価手法

    大塚 崇志, 江口 浩二, 山名 早人

    DEWS2004    2004

  • ページ-コミュニティ間の関連性を考慮したWebコミュニティ抽出

    斉田直幸, 梅沢 晃, 山名早人

    第66回情処全大   1U-5 ( 3 )  2004

    J-GLOBAL

  • トランスポート層の情報を利用したパケットの経路選択

    高見進太郎, 山名早人, 廣津登志夫

    第66回情処全大   4W-2 ( 3 )  2004

    J-GLOBAL

  • スレッドレベル投機的実行による依存距離不定運搬依存をもつループの並列化

    石川隼輔, 山名早人

    情処研報(SLDM)   Vol.2004 ( No.33 ) 63 - 68  2004

  • グループ化されたWebページを用いた検索

    梅沢 晃, 山名 早人

    DEWS2004    2004

  • MPIプログラムの簡易実行結果を用いた実行時間予測ツールMPIETEの評価

    堀井 洋, 岩渕寿寛, 山名早人

    情処研報(HPC)   Vol.2004 ( No.20 ) 55 - 60  2004

  • Cutting down the amount of communications for frequent pattern mining on Grid

    加藤真, 平手勇宇, 岩橋永悟, 山名早人

    情報処理学会シンポジウム論文集   2004 ( 6 ) 165 - 166  2004

    J-GLOBAL

  • The Branch Predictor refering a BTB Entry Existence

    斎藤史子, 山名早人

    情報処理学会シンポジウム論文集   2004 ( 6 ) 261 - 268  2004

    J-GLOBAL

  • An Efficient Algorithm for Mining Top-k Frequent Patterns with a Small Response Time

    平手勇宇, 岩橋永悟, 山名早人

    2004 CORS/INFORMS International Meeting (2004.5)    2004

  • A Challenge to Gather 10 billion of Web Pages

    山名早人

    2004 CORS/INFORMS International Meeting (2004.5)    2004

  • サービス指向コンピューティング

    山名早人監訳, 石川隼輔, 堀井洋, 岩渕寿寛, 岩橋永悟, 山口正男訳

    CACM日本語版   Vol.4 ( No.3 )  2004

  • 検索エンジンを使った翻訳サポートシステムの構築

    大鹿広憲, 佐藤学, 安藤進, 山名早人

    DBWS2004    2004

  • ハイパースレッディング環境における投機的スレッドを用いたキャッシュ効率化

    本田大, 斎藤史子, 山名早人

    SWoPP2004    2004

  • Extension of group-to-group sequence alignment algorithm under a piecewise linear gap cost

    山田真介, 後藤修, 山名早人

    Proc. of Intelligent Systems for Molecular Biology 2004    2004

  • The Branch Predictor Referring a BTB Entry Existence

    SAITO FUMIKO, YAMANA HAYATO

      45 ( 7 ) 71 - 79  2004

     View Summary

    Branch prediction is implemented in recent processors to avoid pipeline stalls. It is a kind of speculative execution for control dependences. In recent years, as pipelines have become deeper, the branch misprediction penalty has grown. Thus, the branch misprediction rate must be lowered to raise processor performance. A branch predictor predicts a branch direction and a branch target address. The BTB (Branch Target Buffer) registers taken branches. We found that most branches that do not have a BTB entry are not-taken branches. We propose a branch predictor that refers to the existence of a BTB entry. The proposed predictor updates only the entries of branches whose target addresses are registered in the BTB, in order to alleviate aliasing. On SPECint95 (train), the branch misprediction rate is lowered by 1.5% on average with an 8 KB Gshare predictor and by 0.4% on average with a 1.5 KB Bi-Mode predictor.

    CiNii

  • TF2P-growth: An Efficient Algorithm for Mining Frequent Patterns without any Thresholds

    平手勇宇, 岩橋永悟, 山名早人

    IEEE ICDM'04 Workshop on Alternatives Techniques for Data Mining and Knowledge Discovery    2004

  • Extension of Prrn: implementation of a doubly nested randomized iterative refinement strategy under a piecewise linear gap cost

    山田真介, 後藤修, 山名早人

    the Fifteenth International Conference on Genome Informatics    2004

  • Exploitation of Informational Applications - Toward the Global Web Information Archive

    Hayato Yamana

    Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers   57 ( 12 ) 1632 - 1637  2003.12

    Book review, literature introduction, etc.  

    DOI

  • Branch Direction Miss Prediction Rate Diminution in Cooperation with a Selector and Predictors on Hybrid Branch Predictor

    Saito Fumiko, Nakazawa Yukari, Yamana Hayato

    IPSJ SIG Notes   2003 ( 84 ) 115 - 120  2003.08

     View Summary

    In recent years, as pipelines have become deeper, more accurate branch direction predictors have been needed. Most branch predictors tend to increase the hardware budget to alleviate aliasing. This research proposes a means of reducing the misprediction rate of a hybrid predictor within the same hardware budget. The proposed predictor is called the Hybrid Predictor Referenced Prediction Counter State (Hybrid-RPCS). Generally, low-predictability branches have a high transition rate and no direction bias. For low-predictability branches, the prediction is inverted, based on a selector counter state and predicto...

    CiNii J-GLOBAL

  • Evaluation of Execution-time Prediction Method of MPI Programs based Simple Execution

    IWABUCHI Toshihiro, HORII Hiroshi, YAMANA Hayato

    IPSJ SIG Notes   2003 ( 83 ) 131 - 136  2003.08

     View Summary

    In this paper, we show evaluation results of our execution-time prediction method, which simply executes an MPI program on 2 PUs. We predict the execution time of NAS Parallel Benchmarks ver. 2.3 on 2 to 128 PUs. Execution-time prediction is an effective technique for determining the optimal number of PUs for a target application. Most existing methods not only predict execution time but also obtain information on various overheads, and hence need a long simulation time. On the other hand, since our purpose is to obtain the execution time only, our method can predict faster than actual execu...

    CiNii J-GLOBAL

  • Parallel FP-growth Algorithm for Frequent Pattern Mining

    IWAHASHI Eigo, YAMANA Hayato

    IPSJ SIG Notes   2003 ( 71 ) 327 - 333  2003.07

     View Summary

    Frequent pattern mining is one of the important problems in data mining research. Apriori is a prominent algorithm that has been followed by many variants. In 2000, FP-growth, which is reported to be faster than Apriori, was proposed. However, many parallel algorithms for frequent pattern mining are still based on Apriori. In this paper, we propose a parallelized version of FP-growth, which accesses disks in parallel and constructs local FP-trees in each local memory. In an evaluation using a 32-node PC cluster, our method is approximately 2 and 130 times faster than sequent...

    CiNii

  • Parallel FP-growth Algorithm for Frequent Pattern Mining

    IWAHASHI Eigo, YAMANA Hayato

    IEICE technical report. Data engineering   103 ( 191 ) 109 - 114  2003.07

     View Summary

    Frequent pattern mining is one of the important problems in data mining research. Apriori is a prominent algorithm that has been followed by many variants. In 2000, FP-growth, which is reported to be faster than Apriori, was proposed. However, many parallel algorithms for frequent pattern mining are still based on Apriori. In this paper, we propose a parallelized version of FP-growth, which accesses disks in parallel and constructs local FP-trees in each local memory. In an evaluation using a 32-node PC cluster, our method is approximately 2 and 130 times faster than sequent...

    CiNii J-GLOBAL

  • 分子系統樹構成法に関する最新技術動向

    益子理絵, 山田真介, 山名早人

    第65回情処全大   1Y-5 ( 1 ) 1  2003

    J-GLOBAL

  • 分岐命令実行回数に着目した投機的実行支援情報収集機構の設計とFPGAへの実装

    蛭田智則, 山名早人, 佐谷野健二, 小池汎平

    第65回情処全大   3ZA-5   1  2003

  • 投機的実行による難並列化ループの高速化

    石川隼輔, 山名早人

    第65回情処全大   3ZA-4 ( 1 ) 1  2003

    J-GLOBAL

  • 投機的データプリフェッチを用いたキャッシュ効率化の考察

    本田 大, 山名早人

    第65回情処全大   3ZA-7 ( 1 ) 1  2003

    J-GLOBAL

  • 大規模Webデータからのコミュニティ抽出

    梅沢 晃, 山名早人

    第65回情処全大   4U-1 ( 3 ) 3  2003

    J-GLOBAL

  • リンク構造を用いたWebページ自動分類の精度向上法

    大西高裕, 山名早人

    第65回情処全大   4ZA-1 ( 3 ) 3  2003

    J-GLOBAL

  • ユーザの検索履歴を用いた情報検索システムの提案

    三浦典之, 山名早人

    第65回情処全大   3U-1 ( 3 ) 3  2003

    J-GLOBAL

  • マルコフモデルを用いたWebランキング法の評価実験

    赤津秀之, 山名早人

    第65回情処全大   4ZA-2   3  2003

  • データ依存命令を対象としたデータ値予測

    仲沢由香里, 山名早人

    第65回情処全大   3ZA-6 ( 1 ) 1  2003

    J-GLOBAL

  • ゲノムデータベースにおけるアノテーションフィールドを利用したエントリの類似検索

    三村 徹, 諸岡慎士, 山名早人

    第65回情処全大   4U-3   3  2003

  • アプリケーションのレスポンス時間を用いたPCの性能評価

    堀井 洋, 山名早人

    第65回情処全大   5U-5 ( 1 ) 1  2003

    J-GLOBAL

  • Webページ構造を考慮したキーワードによる画像の内容特定

    大鹿広憲, 山名早人

    第65回情処全大   3N-1 ( 3 ) 3  2003

    J-GLOBAL

  • Webページの更新傾向を踏まえた効率的な収集方法の提案

    熊谷英樹, 山名早人

    第65回情処全大   4ZA-4 ( 3 ) 3  2003

    J-GLOBAL

  • Webサーチエンジンの新しい評価手法

    大塚崇志, 山名早人

    電子情報通信学会第14回データ工学ワークショップDEWS2003   (7-P:Webサーチ,Web応用)  2003

  • MPIプログラムの簡易実行による実行時間予測

    岩渕寿寛, 堀井 洋, 山名早人

    第65回情処全大   5Z-5 ( 1 ) 1  2003

    J-GLOBAL

  • GnutellaにおけるQuery Hitを用いたトラヒック量軽減手法の提案

    難波貞暁, 山名早人

    第65回情処全大   5W-5   3  2003

  • IT社会を先導するインターネット-家庭でのインターネットアクセスの現状と今後-

    山名早人

    電子情報通信学会誌   Vol.86 ( No.5 ) 304 - 310  2003

  • FP-growthの並列化による頻出パターン抽出の高速化

    岩橋永悟, 山名早人

    情処研報(DBS)   Vol.2003 ( No.71 ) 327 - 334  2003

  • ハイブリッド予測機構における選択器と予測器の協調による予測ミス率の低減

    斎藤史子, 仲沢由香里, 山名早人

    情処研報(ARC)   Vol.2003 ( No.84 ) 115 - 120  2003

  • MPIプログラムの簡易実行による実行時間予測の評価

    岩渕寿寛, 堀井洋, 山名早人

    情処研報(HPC)   Vol.2003 ( No.83 ) 131 - 136  2003

  • 「情報」応用の開拓~全世界のWeb情報アーカイブ構築への挑戦~

    山名早人

    映像情報メディア学会誌   Vol.57 ( No.12 ) 1632 - 1637  2003

    DOI

  • 分岐命令に着目した投機的実行支援情報収集機構の設計とFPGAへの実装

    蛭田智則, 小池汎平, 佐谷野健二, 山名早人

    情報処理学会全国大会講演論文集   65th ( 1 )  2003

    J-GLOBAL

  • マルコフモデルを使用したWebランキングの評価実験

    赤津秀之, 山名早人

    情報処理学会全国大会講演論文集   65th ( 3 )  2003

    J-GLOBAL

  • P2P方式における検索効率の改善手法の評価

    難波貞暁, 山名早人

    情報処理学会全国大会講演論文集   65th ( 3 )  2003

    J-GLOBAL

  • ゲノムデータベースにおけるエントリの関連性検索

    三村徹, 諸岡慎士, 山名早人

    情報処理学会全国大会講演論文集   65th ( 3 )  2003

    J-GLOBAL

  • Hybrid Branch Predictors Evaluation on Prediction Accuracy

    Saito Fumiko, Kitamura Takeshi, Yamana Hayato

    IPSJ SIG Notes   2002 ( 112 ) 89 - 94  2002.11

     View Summary

    In recent years, branch predictors with multiple prediction tables, which are called "Hybrid Predictors" in this paper, have been proposed. Hybrid predictors are classified into two categories: "Combining Predictors" and "De-Aliased Predictors". The difference between them is the means of selecting a prediction table. A "Combining Predictor" selects a prediction table by confidence, whereas a "De-Aliased Predictor" selects a prediction table by branch direction bias. Although the prediction accuracy in "Combining Predictor" is the highest, "Combining Pr...

    CiNii J-GLOBAL

  • Necessity for Confidence in Multiple PHT Branch Predictors

    Saito Fumiko, Hiruta Tomonori, Yamana Hayato

    IPSJ SIG Notes   2002 ( 81 ) 55 - 60  2002.08

     View Summary

    In recent years, branch predictors with multiple PHTs (Pattern History Tables) have been proposed. In this paper, branch predictors with multiple PHTs are classified into two categories. One is the "Hybrid Predictor"; the other is the "Multiple PHT Predictor" (as it is called in this paper). The difference between the two is PHT selection confidence. In this paper, we compare the "Hybrid Predictor" with the "Multiple PHT Predictor". As a result, if the "Hybrid Predictor" is the same size as the "Multiple PHT Predictor", the "Hybrid Predictor" predicts branch directions better tha...

    CiNii J-GLOBAL

  • A Proposal of the Branch Prediction Technique based on the Transition Rate

    Umezawa Akira, Yamana Hayato

    IPSJ SIG Notes   2002 ( 37 ) 25 - 30  2002.05

     View Summary

    To raise processing speed, today's processors adopt pipelining to extract instruction-level parallelism. However, a pipeline stall occurs when a conditional branch exists. Various studies have been conducted to raise prediction accuracy. In this paper, we propose a new branch prediction technique based on the transition rate, specifically the number of consecutive times a branch goes in the same direction. The proposed scheme targets branches that are classified as difficult to predict. We applied the prop...

    CiNii J-GLOBAL

  • Search Pattern Modeling based on its Search Interval

    Suzuki Shunsuke, Yamana Hayato

    IPSJ SIG Notes   Vol.2002 ( No.28 ) 103 - 110  2002.03

     View Summary

    Conventional search engines search based on a pre-generated index. Thus, when different users search with the same query, the search engine returns the same results, even if they want to obtain different results. To solve this problem, in this paper we propose a user modeling scheme based on the user's search pattern to infer the user's intention. As a result, we have classified the search intervals before re-searching into two patterns. Furthermore, we have classified 91% of a user's queries into nine patterns. Using these patterns, search engines will be able to return results that suit the user's intention.

    CiNii

  • Search Pattern Modeling based on its Search Interval

    Suzuki Shunsuke, Yamana Hayato

    IPSJ SIG Notes   2002 ( 28 ) 103 - 110  2002.03

     View Summary

    Conventional search engines search based on a pre-generated index. Thus, when different users search with the same query, the search engine returns the same results, even if they want to obtain different results. To solve this problem, in this paper we propose a user modeling scheme based on the user's search pattern to infer the user's intention. As a result, we have classified the search intervals before re-searching into two patterns. Furthermore, we have classified 91% of a user's queries into nine patterns. Using these patterns, the search engines will be able to return the re...

    CiNii J-GLOBAL

  • An Efficient Speculative Execution Scheme for Loops

    Ishikawa Shunsuke, Yamana Hayato

    IPSJ SIG Notes   Vol.2002 ( No.22 ) 121 - 126  2002.03

     View Summary

    In this paper, we propose an efficient speculative execution scheme for loops and confirm the usefulness of the scheme using the compress program from the SPECcpu95 benchmark. Generally, since loops account for a large portion of the total execution time, loop parallelization improves program performance dramatically. However, when data dependences cannot be analyzed statically, conventional parallelization schemes assume that a dependence exists. For this reason, such a loop cannot be parallelized even if the loop-carried dependence (LCD) occurs dynamically only once in 10,000 times. However, speculative execution is known to speed up such loops. In this paper, we propose a scheme that applies speculative execution selectively, only to the portions expected to be sped up effectively, using the overhead parameter required for the book-keeping process when the speculation fails. Such overhead has not been considered in conventional speculative execution schemes. The proposed scheme enables selective speculative execution using the overhead parameter for book-keeping, the LCD existence probability, and the timing of the speculative execution initiation. As a result, at the present stage, the execution speed falls to one third. To solve this problem, we also propose a new speculative execution scheme.

    CiNii

  • An Efficient Speculative Execution Scheme for Loops

    Ishikawa Shunsuke, Yamana Hayato

    IPSJ SIG Notes   2002 ( 22 ) 121 - 126  2002.03

     View Summary

    In this paper, we propose an efficient speculative execution scheme for loops and confirm the usefulness of the scheme using the compress program from the SPECcpu95 benchmark. Generally, since loops account for a large portion of the total execution time, loop parallelization improves program performance dramatically. However, when data dependences cannot be analyzed statically, conventional parallelization schemes assume that a dependence exists. For this reason, such a loop cannot be parallelized even if the loop-carried dependence (LCD)...

    CiNii J-GLOBAL

  • 脳型情報処理の研究に関する最新動向

    齋藤雅浩, 山名早人

    第64回情処全大   5P-3 ( 2 )  2002

    J-GLOBAL

  • 逆リンクのチェックによるサイトの特徴・有用性の調査

    高見進太郎, 山名早人

    第64回情処全大   3X-3 ( 3 )  2002

    J-GLOBAL

  • マルコフモデルを使用したWebランキング

    赤津秀之, 山名早人

    第64回情処全大   3X-6 ( 3 )  2002

    J-GLOBAL

  • ドメイン毎のWebページ数の偏りを考慮した日本のWebページ数推定調査

    西村真幸, 山名早人

    第64回情処全大   2X-6 ( 3 )  2002

    J-GLOBAL

  • Web上からの論文ファイル自動抽出の試み

    田伏真之, 山名早人

    第64回情処全大   4Y-6 ( 3 )  2002

    J-GLOBAL

  • Webページの更新頻度とアクセス頻度に基づく効率的な収集方法の考察

    熊谷英樹, 山名早人

    第64回情処全大   4X-6 ( 3 )  2002

    J-GLOBAL

  • 構造プロファイルによる局所構造予測法の開発

    山田真介, 富井健太郎, 太田元規, 秋山泰, 山名早人

    第2回日本蛋白質科学会年会ポスター   2p-141  2002

  • The Latest Technical Trends in Speculative Execution

    Saito Fumiko, Yamana Hayato

    IPSJ SIG Notes   2001 ( 116 ) 67 - 72  2001.11

     View Summary

    Instruction-level speculative execution schemes are classified into branch prediction, which alleviates control dependences, and data prediction, which alleviates data dependences. In this paper, we summarize 36 papers on branch prediction and 27 papers on data prediction presented at HPCA from 1996 to 2001 and at ISCA, MICRO, and ASPLOS from 1996 to 2000. As a general trend, until 1998, more than half of the research on speculative execution was related to branch prediction. However, since 1997, research on data prediction has increased.

    CiNii J-GLOBAL

  • An Estimation Scheme of the Exection Time for MPI Programs using Measured Primitives

    Horii Hiroshi, Yamana Hayato

    IPSJ SIG Notes   2001 ( 102 ) 61 - 66  2001.10

     View Summary

    In this paper, we propose a scheme for estimating the execution time of MPI programs and confirm its usefulness using NAS Parallel Benchmarks (NPB) ver. 2.3. The scheme estimates the execution time of an MPI program by dividing it into a computation part and a communication part. To estimate the computation time, we divide the MPI program into blocks that have a loop structure, measure the execution time of every block, and estimate the total execution time. To estimate the communication time, we measure the communication time with the same message size which is...

    CiNii J-GLOBAL

  • Search Engine Google

    YAMANA Hayato, KONDO Hidekazu

    Journal of Information Processing Society of Japan   Vol.42 ( No.8 ) 775 - 780  2001.08

     View Summary

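    Google is famous as the search engine holding the world's largest amount of information. After starting as a research project in the Department of Computer Science at Stanford University, it received a total of 25 million dollars in investment from two major Silicon Valley venture capital firms, and was founded as a company in September 1998 by Larry (Lawrence) Page and Sergey Brin, two doctoral students who were 25 years old at the time.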

    CiNii J-GLOBAL

  • 招待講演2 サーチエンジンGoogleの情報検索技術 (AIシンポジウム(第15回)WWW情報検索と情報統合)

    山名 早人

    AIシンポジウム   15   21 - 26  2001.07

    CiNii

  • 投機的実行のループへの効果的な適用法

    山名早人

    情報処理学会第62回全国大会   5R-4 ( 1 )  2001

    J-GLOBAL

  • 招待論文-サーチエンジンGoogleの情報検索技術

    山名早人

    第15回AIシンポジウム   SIG-J-A101   21 - 26  2001

    J-GLOBAL

  • データベース最前線-12-検索エンジンと高速ページ収集技術--分散型WWWロボット実験

    山名 早人

    Bit   32 ( 12 ) 72 - 79  2000.12

    CiNii J-GLOBAL

  • 2000-ARC-139-28 Unlimited Speculative Execution for Loops

    YAMANA Hayato, Koike Hanpei

    IPSJ SIG Notes   2000 ( 74 ) 163 - 168  2000.08

     View Summary

    This paper discusses how to apply "Unlimited Speculative Execution" to loops. A task-level speculative execution scheme, called "Unlimited Speculative Execution", is applied to loops that cannot be parallelized because of memory ambiguity or control dependences. In this paper, loops are classified into nine categories to clarify which loops the scheme is applicable to. Moreover, we discuss the results of applying the scheme to the SPEC95int compress program.

    CiNii J-GLOBAL

  • 分散型WWWロボットによる国内のWWWデータ収集実験

    山名早人

    ACM SIGMOD Japanシンポジウム講演集    2000

  • 広域分散コンピューティングの現状と課題―分散型WWWロボットを例にとって―

    山名早人

    北海道地域ネットワーク協議会シンポジウム2000/北海道地域ネットワーク協議会     95 - 102  2000

  • スーパーコンパイラ・テクノロジの調査研究

    平成11年度先導調査研究報告書/新エネルギー・産業総合開発機構    2000

  • 臨界投機実行のループへの適用

    山名早人, 小池汎平

    情報処理学会 計算機アーキテクチャ研究会(SWoPP00)    2000

  • 分散型WWWロボットの予備評価と高速化の検討

    山名早人, 森英雄, 田村健人, 河野浩之, 村岡洋一

    日本ソフトウェア科学会インターネットテクノロジワークショップ    2000

  • Internet広域分散サーチロボットの研究開発

    村岡洋一, 山名早人, 田村健人, 河野浩之, 森英雄, 浅井勇夫, 西村英樹, 楠本博之, 篠田洋一

    第19回IPA技術発表会    2000

  • 分散WWWロボット実験

    山名早人

    Bit,共立出版    2000

  • 分散型WWWロボット実験の状況 (特集 次世代インターネットの展望)

    山名 早人

    機械振興   32 ( 8 ) 61 - 67  1999.08

    CiNii

  • User Support on Narrowing Retrieval using the Unlimited Speculativ Search Service.

    山名早人, 小池汎平, 児玉祐悦, 坂根広史, 山口喜教

    人工知能学会知識ベースシステム研究会資料   43rd ( 43 ) 93 - 98  1999.03

    CiNii J-GLOBAL

  • Speculative Control/Data Dependence Graph and Java Jog-time Analyzer : A Preliminary Evaluation for Java Virtual Accelerator

    KOIKE HANPEI, YAMANA HAYATO, YAMAGUCHI YOSHINORI

      40 ( 1 ) 32 - 41  1999.02

     View Summary

    The authors are investigating the possibility of a Java Virtual Accelerator, a run-time parallelizing interpreter/JIT compiler system that speeds up Java execution through parallelizing emulation. To realize parallelizing emulation, automatic extraction of parallelism from sequential binary programs is important. We developed the "speculative control-data dependence graph" model to relieve the control and data dependence constraints inherent in sequential programs. A speculative control-data dependence graph is constructed by measuring the prediction rate for both control and data values during a test run and replacing highly predictable arcs with predict-confirm nodes. The Java Jog-time Analyzer (JJA) was developed for experiments with this model. JJA analyzes control and data dependences statically while class files are loaded, and its intermediate code interpreter invokes data and branch prediction modules and gathers run-time statistics every time a basic block boundary is crossed. Run-time statistics such as block execution counts, prediction rates, the critical path execution time, and the average parallelism, as well as plots of the dependence graphs, are shown at the end of the execution. In this paper, several experimental results with JJA are shown.

    CiNii

  • Speculative Control/Data Dependence Graph and Java Jog-time Analyzer : A Preliminary Evaluation for Java Virtual Accelerator

    KOIKE HANPEI, YAMANA HAYATO, YAMAGUCHI YOSHINORI

    Transactions of Information Processing Society of Japan   40 ( 1 ) 32 - 41  1999.02

     View Summary

    The authors are investigating the possibility of a Java Virtual Accelerator, a run-time parallelizing interpreter/JIT compiler system that speeds up Java execution through parallelizing emulation. To realize parallelizing emulation, automatic extraction of parallelism from sequential binary programs is important. We developed the "speculative control-data dependence graph" model to relieve the control and data dependence constraints inherent in sequential programs. A speculative control-data dependence graph is constructed by measuring the prediction rate for both control and data valu...

    CiNii J-GLOBAL

  • Evaluation of Communication Mechanisms for Distributed Memory Parallel Computers in Wavefront Computation

    SAKANE Hirofumi, KODAMA Yuetsu, TATEBE Osamu, KOIKE Hanpei, YAMANA Hayato, YAMAGUCHI Yoshinori, YUBA Yoshitsugu

    Transactions of Information Processing Society of Japan   40 ( 5 ) 2281 - 2292  1999

     View Summary

    In this paper, we discuss efficient parallel execution of a dense-matrix problem, considering trade-offs between fine-grain and coarse-grain communication on distributed-memory machines. The solution of a triangular system of equations involves data dependences between consecutive iterations of the outer loop. These dependences can be naturally resolved and processed in parallel by wavefront computation. Two parallelization approaches are presented: an element-wise fine-grain approach and a coarse-grain approach. We implemented these algorithms on both the EM-X and the AP1000+. The fine-grain support mechanisms of the EM-X had a great effect on the performance of the element-wise method for relatively small problem sizes, while the RISC processors employed in the AP1000+ gave the coarse-grain method high performance for larger sizes.

    CiNii

  • Preliminal Evaluation of the Unlimited Speculative Search Service on Parallel Computers.

    山名早人, 小池汎平, 児玉祐悦, 坂根広史, 山口喜教

    情報処理学会シンポジウム論文集   99 ( 6 ) 216  1999

    J-GLOBAL

  • Distributed WWW robot experiment.

    山名早人

    機械振興   32 ( 8 ) 61 - 67  1999

    J-GLOBAL

  • 経営学大事典 第二版

    中央経済社    1999

  • Internet広域分散協調サーチロボットの研究開発

    IPA第18回技術発表会論文集/情報処理振興事業協会   18   71 - 78  1999

  • 分散型WWWロボットの実験状況と今後の課題

    インターネットコンファレンス99論文集/日本ソフトウェア科学会   14, p.141  1999

  • Design of Automatic Parallelizing Intermediate Code Interpreter

    KOIKE HANPEI, YAMANA HAYATO, YAMAGUCHI YOSHINORI

      40 ( SIG10 ) 64 - 74  1999

     View Summary

    In this paper, we discuss the design of an intermediate code interpreter that executes a sequential program in parallel using speculative methods. We propose software techniques that enable efficient parallel speculative execution without hardware support, such as a checkpoint execution mechanism that establishes an appropriate parallel execution granularity and an efficient implementation of speculative memory operations that minimizes the overhead of searching, recording, and mutual exclusion. Experimental results showing the basic performance of these techniques are also presented. From the experiments, we confirmed that a speculative intermediate code interpreter that achieves speedup can be implemented by adopting the software techniques described in this paper.

    CiNii

  • 国内の全WWWデータを24時間で収集する分散型WWWロボットの試み

    山名早人, 田村健人, 森英雄, 黒田洋介, 西村英樹, 浅井勇夫, 楠本博之, 篠田陽一, 村岡洋一

    Proceedings of NORTH Internet Symposium   1999  1999

    J-GLOBAL

  • Fast speculative search engine on the highly parallel computer EM-X

    Hayato Yamana, Hanpei Koike, Yuetsu Kodama, Hirofumi Sakane, Yoshinori Yamaguchi

    SIGIR Forum (ACM Special Interest Group on Information Retrieval)     390  1998.12

     View Summary

    A WWW search engine called the fast speculative search engine, which uses speculative execution on multiprocessor systems to shorten the total time to retrieve information from the WWW, is presented. This engine predicts the user's next queries and initiates searches with the predicted queries before receiving them, to accelerate narrowing of the search space. The fast speculative search engine is implemented using data speculation on the EM-X, a highly parallel computer that can tolerate communication latency by using low-latency communication and multithreading.

    DOI

  • New trends of information retrieval in multimedia and Internet. Globally distributed cooperative search robot for Internet.

    山名早人

    Computer Today   15 ( 5 ) 4 - 9  1998.09

    CiNii J-GLOBAL

  • A Study of Adopting the unlimited Speculative Execution on Multigrain Parallelizing Compilers

    YAMANA Hayato, KOIKE Hanpei, KODAMA Yuetsu, SAKANE Hirohumi, YAMAGUCHI Yoshinori

    IPSJ SIG Notes   98 ( 70 ) 19 - 24  1998.08

     View Summary

    This paper discusses the effectiveness of unlimited speculative execution and how to adopt the scheme in multigrain parallelizing compilers. Multigrain parallelizing compilers exploit parallelism among coarse-grain tasks such as loops, medium-grain tasks such as loop iterations, and near-fine-grain tasks such as statements. When the unlimited speculative execution scheme is adopted in multigrain parallelizing compilers, code that is not parallelized because of memory ambiguity or control dependences can be parallelized. In this paper, loops are classified into nine categori...

    CiNii J-GLOBAL

  • A Study of Unlimited Speculative Execution on Multigrain Parallel Processing

    YAMANA Hayato, KOIKE Hanpei, KODAMA Yuetsu, SAKANE Hirofumi, YAMAGUCHI Yoshinori

    全国大会講演論文集   56 ( 1 ) 297 - 298  1998.03

    CiNii J-GLOBAL

  • Speculative Control-Data Dependence Graph and Java Jog-time Analyzer-A Step toward Java Virtual Accelerator.

    小池汎平, 山名早人, 山口喜教

    情報処理学会シンポジウム論文集   98 ( 7 )  1998

    J-GLOBAL

  • 分散型ロボットによるWWW情報収集

    山名早人

    第9回データ工学ワークショップ(DEWS'98), 電子情報通信学会データ工学専門委員会    1998

    CiNii

  • A Survey of World Wide Web Search Engines

    Yamana Hayato

    コンピュータソフトウェア   14 ( 5 ) 503 - 510  1997.09

     View Summary

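    As of January 1997, about 830,000 organizations and about 16 million computers worldwide were connected to the Internet, and more than 100 million pages of information, ranging from academic papers to hobbies, were being published from WWW servers. To make effective use of this enormous amount of information, it is essential to find pages containing the needed information instantly and accurately. WWW information retrieval services providing this capability began to appear around 1994, and there are now more than 100 of them. This article explains the current state of WWW information retrieval services and their problems.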

    CiNii J-GLOBAL

  • Parallel Execution of Radix Sort Programs Using Fine-grain Communication

    KODAMA Yuetsu, SAKANE Hirofumi, SATO Mitsuhisa, YAMANA Hayato, SAKAI Shuichi, YAMAGUCHI Yoshinori

    Transactions of Information Processing Society of Japan   38 ( 9 ) 1726 - 1735  1997.09

     View Summary

    EM-X is a highly parallel computer with distributed memory. It supports fine-grain communication, with a fixed size of two words, on an instruction execution pipeline. It achieves high communication throughput by overlapping remote memory access with thread execution, and tolerates communication latency by rapid switching of threads. We developed an 80-processor EM-X system and are evaluating its architectural features on that system. In this paper, we execute radix sort programs to evaluate the parallel performance of EM-X and compare the results with other parallel computers. The results show that fine-grain communication achieves very good scalability, while coarse-grain message passing decreases performance on a large number of processors because of network contention.

    CiNii

  • Parallelization and Performance Evaluation of Sparse Matrix Computation in The EM-X Multiprocessor

    SATO Mitsuhisa, KODAMA Yuetsu, SAKANE Hirofumi, YAMANA Hayato, SAKAI Shuichi, YAMAGUCHI Yoshinori

    Transactions of Information Processing Society of Japan   38 ( 9 ) 1761 - 1770  1997.09

     View Summary

    In this paper, we describe the parallelization of a sparse matrix computation, the CG (Conjugate Gradient method) kernel taken from the NAS parallel benchmark suite, for the EM-X multiprocessor. The dataflow mechanism of EM-X supports fine-grain communication very efficiently, providing low-latency communication and a flexible message-passing facility. We compare the performance of sparse matrix-vector multiplication using complete exchange communication, element-wise remote update, and element-wise remote read with multithreading. The measurements taken on the EM-X indicate the effectiveness of fine-grain communication, which enables efficient element-wise access. Fine-grain communication is effective when the problem size per PE becomes small in large-scale multiprocessor systems. The complete exchange version suffers from its limited bandwidth, and the performance of the element-wise remote read version is degraded by the context-switching overhead of multithreading.

    CiNii

  • Parallel Execution of Radix Sort Programs Using Fine-grain Communication

    KODAMA YUETSU, SAKANE HIROFUMI, SATO MITSUHISA, YAMANA HAYATO, SAKAI SHUICHI, YAMAGUCHI YOSHINORI

    Transactions of Information Processing Society of Japan   38 ( 9 ) 1726 - 1735  1997.09

     View Summary

    EM-X is a highly parallel computer with a distributed memory. It supports fine-grain communication, whose size is two-word fixed, on an instruction execution pipeline. It achieves high communication throughput by overlapping remote memory access with thread execution, and tolerates communication latency by rapid switching of threads. We developed an 80 processor system of EM-X, and are evaluating its architectural features on the system. In this paper, we execute radix sort programs to evaluate the parallel performance of EM-X and compare the results with other parallel computers. The resul...

    CiNii J-GLOBAL

  • Parallelization and Performance Evaluation of Sparse Matrix Computation in The EM-X Multiprocessor

    SATO MITSUHISA, KODAMA YUETSU, SAKANE HIROFUMI, YAMANA HAYATO, SAKAI SHUICHI, YAMAGUCHI YOSHINORI

    Transactions of Information Processing Society of Japan   38 ( 9 ) 1761 - 1770  1997.09

     View Summary

    In this paper, we describe the parallelization of a sparse matrix computation, CG (Conjugate Gradient method) kernel taken from NAS parallel benchmark suite, for the EM-X multiprocessor. Dataflow mechanism of EM-X supports fine-grain communication very efficiently, which provides low latency communication, and flexible message-passing facility. We compare the performance of sparse matrix vector multiplications by the complete exchange communication, by element-wise remote update and by the element-wise remote read with multithreading. The measurements taken on the EM-X indicates effectivene...

    CiNii J-GLOBAL

  • Using the Unlimited Speculative Execution on WWW Information Retrieval

    YAMANA Hayato, KOIKE Hanpei, KODAMA Yuetsu, TODA Kenji, YAMAGUCHI Yoshinori

    IEICE technical report. Computer systems   97 ( 226 ) 69 - 74  1997.08

     View Summary

    This paper explains how to use the Unlimited Speculative Execution scheme to accelerate information retrieval on the World Wide Web. The Unlimited Speculative Execution scheme uses lightly loaded processors to speculatively execute tasks whose initiation has not yet been decided. The goal of this research is to present a fast WWW information retrieval system based on the Unlimited Speculative Execution scheme. The platform is the EM-X parallel computer, which consists of 80 processors.

    CiNii J-GLOBAL

  • Developing information industry. Trends of WWW information retrieval service.

    山名早人

    機械振興   30 ( 8 ) 54 - 63  1997.08

    CiNii J-GLOBAL

  • Attractiveness of the Internet.

    山名早人

    CIAJ Journal (Communications and Information Network Association of Japan)   37 ( 3 )  1997

    J-GLOBAL

  • Fine-grain parallel processing of a dense-matrix problem on EM-X-Efficient execution of wavefront parallelism.

    坂根広史, 児玉祐悦, 小池汎平, 佐藤三久, 山名早人, 坂井修一, 山口喜教

    並列処理シンポジウム論文集   1997  1997

    J-GLOBAL

  • Superspeed computer application technology for elucidating complicated phenomena.

    関口智嗣, 佐藤三久, 山名早人

    国立機関原子力試験研究成果報告書   36(1995)  1997

    J-GLOBAL

  • Experience with fine-grain communication in EM-X multiprocessor for parallel sparse matrix computation

    M Sato, Y Kodama, H Sakane, H Yamana, S Sakai, Y Yamaguchi

    11TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM, PROCEEDINGS     242 - 248  1997

     View Summary

    Sparse matrix problems require a communication paradigm different from those used in conventional distributed-memory multiprocessors. We present in this paper how fine-grain communication can help obtain high performance on the experimental distributed-memory multiprocessor EM-X, developed at ETL, which can handle fine-grain communication very efficiently. The sparse matrix kernel Conjugate Gradient (CG) is selected for the experiments. Among the steps in CG, the study focuses on the sparse matrix-vector multiplications. Several communication methods are developed for performance comparison, including coarse-grain and fine-grain implementations. Fine-grain communication allows exact data access in an unstructured problem to reduce the amount of communication. While CG presents bottlenecks in terms of a large number of fine-grain remote reads, the multithreaded principles of execution are designed to tolerate such latency. Experimental results indicate that the performance of the fine-grain read implementation is comparable to that of the coarse-grain implementation on 64 processors. The results demonstrate that fine-grain communication can be a viable and efficient approach to unstructured sparse matrix problems on large-scale distributed-memory multiprocessors.

  • Fine-grain multithreading with the EM-X multiprocessor

    Andrew Sohn, Yuetsu Kodama, Jui Ku, Mitsuhisa Sato, Hirofumi Sakane, Hayato Yamana, Shuichi Sakai, Yoshinori Yamaguchi

    Annual ACM Symposium on Parallel Algorithms and Architectures     189 - 198  1997.01

     View Summary

    Multithreading aims to tolerate latency by overlapping communication with computation. This report explicates the multithreading capabilities of the EM-X distributed-memory multiprocessor through empirical studies. The EM-X provides hardware supports for fine-grain multithreading, including a by-passing mechanism for direct remote reads and writes, hardware FIFO thread scheduling, and dedicated instructions for generating fixed-sized communication packets. Bitonic sorting and Fast Fourier Transform are selected for experiments. Parameters that characterize the performance of multithreading are investigated, including the number of threads, the number of thread switches, the run length, and the number of remote reads. Experimental results indicate that the best communication performance occurs when the number of threads is two to four. FFT yielded over 95% overlapping due to a large amount of computation and communication parallelism across threads. Even in the absence of thread computation parallelism, multithreading helps overlap over 35% of the communication time for bitonic sorting.

    DOI

  • Message-based efficient remote memory access on a highly parallel computer EM-X

    Yuetsu Kodama, Hirohumi Sakane, Mitsuhisa Sato, Hayato Yamana, Shuichi Sakai, Yoshinori Yamaguchi

    IEICE Transactions on Information and Systems   E79-D ( 8 ) 1065 - 1071  1996.12

     View Summary

    Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor for tolerating communication latency. The EM-X overlaps computation with communication for latency tolerance by multithreading. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access is designed to overlap remote memory operations with thread execution. The 80-processor prototype of EM-X has been developed and has been operational since December 1995. We execute several programs on the machine and evaluate how effectively the EM-X overlaps computation with communication to tolerate communication latency for high-performance parallel computing.

  • Performance Evaluation for a Matrix Operation Benchmark on EM-X Multiprocessor

    SAKANE HIROFUMI, KODAMA YUETSU, SATO MITSUHISA, YAMANA HAYATO, SAKAI SHUICHI, YAMAGUCHI YOSHINORI

    IPSJ SIG Notes   96 ( 80 ) 239 - 244  1996.08

     View Summary

    In this paper, we discuss an implementation of the LINPACK benchmark parallelized on the EM-X multiprocessor and evaluate its performance, focusing on the floating point operations in which a regular repetitive pattern occurs. It is important to overlap the communication and calculation as much as relationship between the broadcast algorithms and load balancing. Exploiting the potential of a reduction of the number of memory accesses and adopting the multi-column simultaneous elimination technique, we also further accelerated the innermost-loop code we had already reported for optimization on a...

    CiNii J-GLOBAL

  • Message-based efficient remote memory access on a highly parallel computer EM-X

    Y Kodama, H Sakane, M Sato, H Yamana, S Sakai, Y Yamaguchi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E79D ( 8 ) 1065 - 1071  1996.08

     View Summary

    Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor for tolerating communication latency. The EM-X overlaps computation with communication for latency tolerance by multithreading. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access is designed to overlap remote memory operations with thread execution. The 80-processor prototype of EM-X has been developed and has been operational since December 1995. We execute several programs on the machine and evaluate how effectively the EM-X overlaps computation with communication to tolerate communication latency for high-performance parallel computing.

    CiNii

  • Enjoy the WWW!

    YAMANA Hayato

    The Journal of the Institute of Electronics, Information, and Communication Engineers   79 ( 1 ) 65 - 67  1996.01

    CiNii J-GLOBAL

  • Parallel execution of radix sort program on a highly parallel computer EM-X.

    児玉祐悦, 坂根広史, 佐藤三久, 山名早人, 坂井修一, 山口喜教

    並列処理シンポジウム論文集   1996  1996

    J-GLOBAL

  • Application technology of superspeed computer for elucidation of complicated phenomena.

    関口智嗣, 佐藤三久, 山名早人

    国立機関原子力試験研究成果報告書   35(1994)  1996

    J-GLOBAL

  • Survey of Speculative Execution and the Effect of Task-level Speculation

    Yamana Hayato, Sato Mitsuhisa, Kodama Yuetsu, Sakane Hirohumi, Sakai Shuichi, Yamaguchi Yoshinori

    全国大会講演論文集   51 ( 6 ) 75 - 76  1995.09

     View Summary

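    We report a survey of speculative execution covering 1994 through July 1995, and show the effectiveness of the inter-task speculative execution that we have proposed. For the survey up to 1994, see reference [2]. Table 1 lists the papers surveyed, and Figure 1 shows the recent trend in the number of papers on speculative execution. As Figure 1 shows, papers on speculative execution have increased sharply since around 1991, when VLIW and superscalar machines began to appear. These studies are classified into (1) investigations of the instruction-level parallelism inherent in programs, (2) speculative execution on superscalar/VLIW machines, and (3) speculative execution on parallel computers. In the early 1990s most papers concerned (1), but papers on (2) then increased rapidly. Among the 1994-95 papers, those on branch prediction and predicated execution account for 70% of the total, while papers on architectural implementation methods, which were numerous in 1989-93, declined sharply. This report focuses on branch prediction and predicated execution, currently the hottest topics.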

    CiNii J-GLOBAL

  • Decreasing the Control Overhead of the Unlimited Speculative Execution

    Yamana Hayato, Sato Mitsuhisa, Kodama Yuetsu, Sakane Hirohumi, Sakai Shuichi, Yamaguchi Yoshinori

    IPSJ SIG Notes   95 ( 80 ) 33 - 40  1995.08

     View Summary

    This paper discusses how to decrease the control overhead of tasks with speculation on multiprocessors. First, we have implemented unlimited speculative execution on the EM-4 multiprocessor. Second, the overhead is classified by source. After measuring each class of overhead, it has been confirmed that neither the broadcast latency nor the overhead of initiating tasks is a major factor. Instead, the overhead of receiving and manipulating broadcast control data is the major factor. When the factor is decreased by 1/4, the speedup ratio increases up to 3 and we will have 1...

    CiNii J-GLOBAL

  • Parallelization and Performance of Sparse Matrix Computation in The EM-X Multiprocessor

    Sato Mitsuhisa, Kodama Yuetsu, Sakane Hirofumi, Yamana Hayato, Sakai Shuichi, Yamaguti Yoshinori

    IPSJ SIG Notes   95 ( 80 ) 209 - 216  1995.08

     View Summary

    In this paper, we describe the parallelization of a sparse matrix computation, CG (Conjugate Gradient method) kernel taken from NAS parallel benchmark suite, for the EM-X multiprocessor. EM-X is a distributed-memory multiprocessor. Dataflow mechanism of EM-X supports fine-grain communication very efficiently, which provides low latency communication, and flexible message-passing with direct remote memory access. We compare the performance of sparse matrix vector multiplications by the complete exchange communication and by the element-wise remote memory access with multithreading. The measu...

    CiNii J-GLOBAL

  • A Distributed Control Scheme of Macrotask-level Speculative Execution on the EM-4 Multiprocessor

    Yamana Hayato, Sato Mitsuhisa, Kodama Yuetsu, Sakane Hirohumi, Sakai Shuichi, Yamaguchi Yoshinori

    Transactions of Information Processing Society of Japan   36 ( 7 ) 1578 - 1588  1995.07

     View Summary

    The purpose of this paper is to propose a new fast control scheme for macrotasks with speculation. A macrotask is a coarse-grain task that serves as the unit of speculation. Previous works have reported that the speedup ratio is 12 to 630 times in comparison with conventional execution schemes without speculation when both the speculation depth and the computational resources are infinite, which is called the oracle model. The control overhead, however, prevents the speedup from attaining the theoretical speedup ratio. Thus, a control scheme with small overhead is desired. The distributed control scheme a...

    CiNii

  • A macrotask-level unlimited speculative execution on multiprocessors

    Hayato Yamana, Mitsuhisa Sato, Yuetsu Kodama, Hirofumi Sakane, Shunichi Sakai, Yoshinori Yamaguchi

    Proceedings of the International Conference on Supercomputing   Part F129361   328 - 337  1995.07

     View Summary

    The purpose of this paper is to propose a new fast execution scheme of FORTRAN programs. The proposed scheme enables the fast initiation of macrotask when its data dependences are satisfied even if the control flow has not been reached. The previous schemes to parallelize a program including conditional branches have a number of problems - 1) Though the theoretical speedup ratio is up to N when N conditional branches are jumped on either a VLIW or a superscalar machine, the number of N is restricted up to the number of ALU's on a chip, 2) Since conventional control schemes use a few processors to control macrotasks, the overhead to control them is large. The proposed scheme solves these problems - 1) The proposed scheme enables speculative execution between coarse grain tasks, i.e., macrotasks, on multiprocessors by jumping many conditional branches, 2) A distributed control scheme is proposed and implemented on the EM-4 multiprocessor to decrease the control overhead of macrotasks. Preliminary evaluations show that the control overhead of the proposed scheme is smaller than that of the other control schemes. Moreover, it is confirmed that the distributed control can be implemented by using software when the average macrotask execution time is larger than 14.4 μs on the EM-4 multiprocessor.

    DOI

  • A SPECULATIVE EXECUTION SCHEME OF MACROTASKS FOR PARALLEL-PROCESSING SYSTEMS

    H YAMANA, T YASUE, Y ISHII, Y MURAOKA

    SYSTEMS AND COMPUTERS IN JAPAN   26 ( 6 ) 1 - 15  1995.06

     View Summary

    This paper considers the high-speed execution of FORTRAN programs on parallel processing systems and proposes the parallelizing scheme of the program and execution based on the speculative execution over multiple conditional branches. Several techniques have been proposed that parallelize the program including conditional branches.
    One method does not use speculative execution: (1) the method called earliest execution condition determination. Two methods use speculative execution: (2) a speculative evaluation scheme for a single conditional branch on a superscalar processor or VLIW computer; and (3) a multiple speculative execution scheme that assumes particular loops.
    These methods have the following problems: (1) sufficient parallelism is not extracted by determining the earliest execution condition alone; (2) the speed improvement that can be realized by the speculative execution of a single conditional branch is at most twofold; and (3) the scheme can be applied only to particular loops.
    This paper divides the program into macrotasks, and defines the multiple-stage speculative execution scheme between macrotasks on a general parallel processing system. Then, execution control for each individual macrotask is proposed, using the execution start condition, the control establishment condition and the execution stop condition.

    DOI

  • 分散共有メモリ型並列計算機における1重Doacross型ループの実行時間算出法

    山名,安江, 村岡,山口

    電子情報通信学会論文誌   J78-D-1/2   170 - 178  1995

  • Parallelization and Performance of Sparse Matrix Computation in The EM-X Multiprocessor.

    佐藤三久, 児玉祐悦, 坂根広史, 山名早人, 坂井修一, 山口喜教

    情報処理学会研究報告   95 ( 80(ARC-113) )  1995

    J-GLOBAL

  • The EM-X parallel computer: Architecture and basic performance

    Y KODAMA, H SAKANE, M SATO, H YAMANA, S SAKAI, Y YAMAGUCHI

    22ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS     14 - 23  1995

     View Summary

    Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor communication on an execution pipeline with small and simple packets. It can create a packet in one cycle, and receive a packet from the network in the on-chip buffer without interruption. EM-X invokes threads on packet arrival, minimizing the overhead of thread switching. It can tolerate communication latency by using efficient multi-threading and optimizing packet flow of fine grain communication. EM-X also supports the synchronization of two operands, direct remote memory read/write operations and flexible packet scheduling with priority. This paper describes distinctive features of the EM-X architecture and reports the performance of small synthetic programs and larger more realistic programs.

    DOI

  • EM-X parallel computer: Architecture and basic performance

    Yuetsu Kodama, Hirohumi Sakane, Mitsuhisa Sato, Hayato Yamana, Shuichi Sakai, Yoshinori Yamaguchi

    Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA     14 - 23  1995.01

     View Summary

    Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor communication on an execution pipeline with small and simple packets. It can create a packet in one cycle, and receive a packet from the network in the on-chip buffer without interruption. EM-X invokes threads on packet arrival, minimizing the overhead of thread switching. It can tolerate communication latency by using efficient multi-threading and optimizing packet flow of fine grain communication. EM-X also supports the synchronization of two operands, direct remote memory read/write operations and flexible packet scheduling with priority. This paper describes distinctive features of the EM-X architecture and reports the performance of small synthetic programs and larger more realistic programs.

  • An Evaluation of Doacross among Loops on the EM-4 Multiprocessor

    YAMANA Hayato, SATO Mitsuhisa, KODAMA Yuetsu, SAKANE Hirohumi, SAKAI Shuichi, YAMAGUCHI Yoshinori

    全国大会講演論文集   48 ( 6 ) 19 - 20  1994.03

     View Summary

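    Doacross [Cytr86] and Pipelining [PaKL80] have been proposed as schemes for executing loops other than Doall-type loops on parallel computers. However, these schemes were originally aimed at tightly coupled parallel computers, and they cannot deliver sufficient performance on loosely coupled parallel computers, in which processors exchange data by message passing. This is due to the problems described below, where N is the number of loop iterations. In Doacross, the interprocessor communication delay is added to the total execution time (N-1) times, so no speedup is obtained unless that delay is sufficiently small. In Pipelining, the execution time Ts of each statement is added to the total execution time (N-1) times, so no speedup is obtained unless Ts is sufficiently small. On loosely coupled parallel computers, processors exchange data by message passing, so it is difficult to make the communication delay small. Ts also includes the time for data input and output with other processors, so it is difficult to make Ts small as well. In contrast, the inter-loop Doacross proposed in this report is a scheme that reduces the effect of the interprocessor communication delay, and of the per-statement execution time Ts, on the total execution time. ...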

    CiNii J-GLOBAL

  • Optimization of network interface in a processing element for a parallel computer EM-X

    Sakane Hirofumi, Kodama Yuetsu, Sato Mitsuhisa, Yamana Hayato, Sakai Shuichi, Yamaguchi Yoshinori

    IPSJ SIG Notes   94 ( 13 ) 105 - 112  1994.01

     View Summary

    This paper discusses some of the design parameters for the network interface of a single chip processor EMC-Y to achieve high throughput and high performance. We are currently designing the EMC-Y, a processing element of a parallel computer EM-X. The design parameters include the arbitration method in the network switch, memory access priority and the size of internal FIFOs. To optimize parallel execution performance, we have examined the parameters of the network interface by using a register transfer level simulator of the EM-X.

    CiNii J-GLOBAL

  • 並列処理システムにおけるマクロタスク間先行評価方式

    山名,安江, 石井,村岡

    電子情報通信学会論文誌   J77-D-1/5   343 - 353  1994

  • Automatic Tuning of Loop-Doacross Execution Scheme on the EM-4 Multiprocessor.

    山名早人, 佐藤三久, 児玉祐悦, 坂根広史, 坂井修一, 山口喜教

    情報処理学会研究報告   94 ( 50(ARC-106) )  1994

    J-GLOBAL

  • A Distributed Controlling Scheme of the Multistage Speculative Execution on the EM-4 Multiprocessor.

    山名早人, 佐藤三久, 児玉祐悦, 坂根広史, 坂井修一, 山口喜教

    並列処理シンポジウム論文集   1994  1994

    J-GLOBAL

  • Survey of Today’s Speculative Execution Schemes and a Proposal of Unlimited Speculative Execution Scheme.

    山名早人, 佐藤三久, 児玉祐悦, 坂根広史, 坂井修一, 山口喜教

    情報処理学会研究報告   94 ( 66(ARC-107) )  1994

    J-GLOBAL

  • Fundamental performance evaluation of a processing element EMC-Y for a parallel computer.

    坂根広史, 児玉祐悦, 佐藤三久, 山名早人, 坂井修一, 山口喜教

    情報処理学会研究報告   94 ( 66(ARC-107) )  1994

    J-GLOBAL

  • 投機的実行の現状と Unlimited Speculative Execution Scheme の提案

    山名

    情報処理学会研究報告   107   105 - 112  1994

    CiNii

  • An Experimental Evaluation of the Multi-stage Speculative Execution Scheme on the EM-4 Multiprocessor

    Yamana Hayato, Sato Mitsuhisa, Kodama Yuetsu, Sakane Hirohumi, Sakai Shuichi, Yamaguchi Yoshinori

    IPSJ SIG Notes   93 ( 72 ) 105 - 112  1993.08

     View Summary

    The purpose of this paper is to evaluate a new fast execution scheme for programs with speculative execution on the EM-4 multiprocessor. Conventional schemes to parallelize programs including conditional branches have some problems. The multi-stage speculative execution scheme (1) solves the side-effects problem, (2) decreases the number of processors needed for speculative execution, (3) jumps multi-stage conditional branches, and (4) suits general multiprocessor systems. An experimental evaluation shows that the measured speedup ratio is 50% of the theoretical rat...

    CiNii J-GLOBAL

  • An Implementation of Sparse BLAS-3 on a Distributed Memory Parallel Machine AP1000

    Utino Satosi, Hagiwara Junichi, Yasue Toshiaki, Yamana Hayato, Muraoka Yoichi

    全国大会講演論文集   46 ( 1 ) 59 - 60  1993.03

     View Summary

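    We parallelized BLAS-3 (Basic Linear Algebra Subroutines Level 3) for sparse matrices and implemented it on the Fujitsu AP1000, a distributed-memory parallel computer. BLAS-3 consists of the following subroutines: matrix products (GEMM, SYMM); rank-k and rank-2k updates of symmetric matrices (SYRK, SYR2K); products with a triangular matrix (TRMM); and systems of linear equations with a triangular coefficient matrix and multiple right-hand-side columns (TRSM). The main operation in the five routines other than TRSM is a matrix-matrix product, so the way the sparse matrix product is parallelized is central to the implementation. Compared with parallelizing a dense matrix product, parallelizing a sparse matrix product raises the following issues. 1. In computing the sparse matrix product C ← C + AB, updating C takes time because sparse matrices are stored in compressed form. 2. When the matrices are partitioned among cells by the method described later, the submatrices held by the cells (PEs) differ in size, so the memory needed to store them and the communication volume become unbalanced. This paper therefore proposes a computation order that resolves issue 1, and a communication method that resolves issue 2 when the computation is executed in that order. We then evaluate the proposal using a GEMM routine (product of nonsymmetric matrices) implemented with the proposed method. The sparse matrices are assumed to be stored under the following conditions. To make the program general-purpose...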

    CiNii J-GLOBAL

  • An Optimal Data Transfer Order of Doacross Loops

    YAMANA Hayato, MURAOKA Yoichi

    全国大会講演論文集   46 ( 6 ) 35 - 36  1993.03

     View Summary

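    This report derives the data communication order that minimizes the execution time of DOACROSS loops. Previous studies of DOACROSS loop execution have used the execution time of arithmetic instructions (hereafter, computation time) and the latency of data communication (hereafter, communication latency) as the parameters that represent processor capability. However, when a DOACROSS loop is executed on a multiprocessor that can perform computation and communication in parallel, the communication pitch must be considered in addition to these parameters. The communication pitch is the time interval between successive data inputs or outputs between a processor and the interconnection network. When the communication pitch is larger than the interval at which data communications occur, communication becomes the bottleneck of the total execution time. This is because data communications cannot be started (hereafter, issued) at intervals shorter than the communication pitch, so their issue is delayed. In this case, the actual execution time becomes larger than the conventional theoretical value. We show that, in such cases, the execution time can be shortened by changing the communication order instead of sending data to other processors in the order in which they are defined.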

    CiNii J-GLOBAL

  • A Macro-tasking Scheme for Eager Evaluation

    YAMANA Hayato, YASUE Toshiaki, ISHII Yoshihiko, MURAOKA Yoichi

    全国大会講演論文集   46 ( 6 ) 37 - 38  1993.03

     View Summary

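    We report a method for constructing macrotasks in our previously proposed eager evaluation scheme among macrotasks. Eager evaluation is a scheme that proceeds with execution beyond conditional branch statements in a program. Macrotask generation has two goals: (1) avoiding the side-effect problem caused by double definition of variables, and (2) reducing the number of processors needed for tentative (speculative) execution. The side effects of eager evaluation arise when the same data is defined twice during eager evaluation. To avoid double definition, this paper constructs macrotasks so that the data dependences on each macrotask are uniquely determined regardless of control. Next, to reduce the number of processors at run time, macrotask generation uses the relation between data dependence and control dependence to form a single macrotask from those parts that do not lose the benefit of eager evaluation even when macrotasks are fused. This is novel as a generation method that considers data dependence, whereas conventional macrotask generation methods considered only control dependence.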

    CiNii J-GLOBAL

  • A flow‐executing scheme for DOACROSS loops on dynamic dataflow machines

    Yoshihiko Ishii, Hayato Yamana, Toshiaki Yasue, Yoichi Muraoka

    Systems and Computers in Japan   24 ( 4 ) 1 - 12  1993

     View Summary

    This paper modifies a flow‐executing scheme of the color‐reuse type, using multiple initial loop control packets, and then proves that the flow‐executing scheme is best suited for executing DOACROSS loops on dynamic dataflow machines. Flow‐executing schemes can be divided into four categories: (1) those using a single initial loop control packet; (2) those using multiple initial loop packets; (a) the color overflow type; and (b) the color reuse type. The flow‐executing schemes can then be classified into Classes (1‐a), (1‐b), (2‐a), and (2‐b) through the combination of Categories (1), (2), (a), and (b). This paper suggests that Class (2‐b) is best suited for executing DOACROSS loops, as it extracts full parallelism from DOACROSS loops, incurs no synchronization overhead, and incurs no memory access overhead after the synchronization.

    DOI

  • The inner expression of HAREDAS: The compiler development environment or multi-architecture compiler for massive parallel computing

    YASUE Toshiaki, KANEKO Masanori, HAGIWARA Junichi, TAHARA Ayumu, YAMANA Hayato, MURAOKA Yoichi

    全国大会講演論文集   45 ( 5 ) 335 - 336  1992.09

     View Summary

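    HAREDAS is a development environment intended for developing massively parallelizing multi-architecture compilers. This paper describes the internal representation of HAREDAS and how eager evaluation is represented on it. One approach to massive parallelization is to use eager evaluation to extract the parallelism implicitly contained in existing languages. Eager evaluation is a speedup technique that removes precedence constraints other than data dependences by changing the control dependences in a program. However, conventional eager evaluation has been realized only as an auxiliary means of compensating for insufficient parallelism in instruction-level scheduling. Because HAREDAS can handle eager evaluation in a general way at the internal representation level, the parallelism that eager evaluation can expose can be used effectively. This paper describes how eager evaluation is represented in this internal representation. Section 2 describes the structure of HAREDAS. Section 3 explains the structure and features of the internal representation, and Section 4 details how eager evaluation is represented and manipulated.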

    CiNii J-GLOBAL

  • A Description and its Application of the Data Dependence between Loops for : HAREDAS

    KANEKO Masanori, YASUE Toshiaki, HAGIWARA Junichi, TAHARA Ayumu, YAMANA Hayato, MURAOKA Yoichi

    全国大会講演論文集   45 ( 5 ) 337 - 338  1992.09

     View Summary

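    This paper describes how inter-loop dependences are described in HAREDAS, a multi-architecture compiler development environment, and gives examples of their application. Dependence vectors have conventionally been used to characterize data dependences in a program. However, a dependence vector describes data dependences within a single loop or outside loops, and no general notation had been defined for characterizing data dependences between loops. This paper defines the inter-loop dependence vector and describes how to express data dependences between loops. We also show that inter-loop dependence vectors make it easier than with conventional methods to decide whether loops can be fused, and describe how to apply them to fuse loops without losing the parallelism of each loop.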

    CiNii J-GLOBAL

  • A Distributed Control Scheme of the Multi-stage Jumping Execution of Conditional Branches for Macrotasks

    YAMANA Hayato, YASUE Toshiaki, ISHII Yoshihiko, MURAOKA Yoichi

    全国大会講演論文集   45 ( 6 ) 121 - 122  1992.09

     View Summary

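    This report proposes a distributed control method for macrotasks as an effective way of controlling macrotasks in the multi-stage tentative execution scheme based on eager evaluation. Of the data dependences and control dependences in a program, the multi-stage tentative execution scheme starts executing tasks, called macrotasks, as soon as the data dependences are guaranteed, and later selects the macrotasks whose control has been established on the basis of the control dependences. The problem in realizing this scheme on an actual multiprocessor is reducing the various overheads incurred at run time. These run-time overheads arise from starting execution before control is established, and comprise (1) overhead due to increased memory bandwidth demand, and (2) control overhead incurred in controlling a large number of macrotasks. Of these two, this paper addresses overhead (2): we propose a macrotask control method that adds dedicated macrotask-control hardware to each processor and eliminates centralized control. Problem (1) is a macrotask scheduling problem and is left for future work.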

    CiNii J-GLOBAL

  • An optimized vectorization scheme for multiply nested loops

    SHINKAI Masashi, YASUE Toshiaki, KANEKO Masanori, YAMANA Hayato, MURAOKA Yoichi

    全国大会講演論文集   44 ( 5 ) 93 - 94  1992.02

     View Summary

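    To achieve optimal vectorization of multiply nested loops, this paper proposes a vectorization method based on two analysis policies: (1) making loops tightly nested starting from the inner loops, and (2) aggressive loop distribution. Conventional vectorization methods for multiply nested loops cannot achieve optimal vectorization because (1) they make loops tightly nested starting from the outer loops, so loops cannot be distributed sufficiently, and (2) their evaluation of the gains and losses of loop distribution is incomplete. This paper proposes an analysis method that solves these problems and evaluates it quantitatively on a real machine (the Fujitsu VP2200).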

    CiNii J-GLOBAL

  • A Control Scheme of Multistage Eager Evaluation for Multiprocessor System

    ISHII Yoshihiko, YASUE Toshiaki, YAMANA Hayato, MURAOKA Yoichi

    全国大会講演論文集   44 ( 6 ) 27 - 28  1992.02

     View Summary

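    This paper describes the difference, on multiprocessor systems, between controlling eager evaluation over a single stage of tasks (single-stage eager evaluation control) and controlling eager evaluation across multiple stages of tasks (multi-stage eager evaluation control). We have proposed single-stage and multi-stage eager evaluation control in the context of a concrete multiprocessor system (the parallel processing system Harray). In this paper, we express both kinds of control in temporal logic to generalize them. We then use this temporal logic to reason about the differences between the two, and through this reasoning we establish the correctness of the single-stage and multi-stage eager evaluation controls that we have proposed for the concrete multiprocessor system.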

    CiNii J-GLOBAL

  • A Parallelizing and Executing Scheme of FORTRAN Programs with Eager Evaluation

    YAMANA Hayato, YASUE Toshiaki, MURAOKA Yoichi

    全国大会講演論文集   44 ( 6 ) 29 - 30  1992.02

     View Summary

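    This report proposes a program parallelization technique and an execution scheme that use eager evaluation to run FORTRAN programs at high speed on a multiprocessor. Techniques proposed so far for parallelizing programs that contain conditional branches include finding the earliest execution conditions of tasks and execution schemes that go beyond control dependences. However, they have the following problems: (1) finding the earliest execution conditions alone does not yield sufficient parallelism, and (2) the target programs are restricted and no execution scheme is proposed. To address these problems, we have proposed a speculative execution scheme using flow graph unfolding and an n-stage eager evaluation control scheme for conditional branches using data-driven execution. This paper generalizes these techniques and discusses the theoretical speedup.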

    CiNii J-GLOBAL

  • An Execution Scheme for DO-loops on Distributed Memory Machines.

    萩原純一, 安江俊明, 金子正教, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   92 ( 172(CPSY92 9-19) )  1992

    J-GLOBAL

  • A Scheme to Reduce the Access Rate to Shared Memory for Multiprocessor System.

    山名早人, 村岡洋一

    電子情報通信学会技術研究報告   92 ( 172(CPSY92 9-19) )  1992

    J-GLOBAL

  • 早稲田大学理工学部電子通信学科村岡洋一研究室

    田渕 仁浩, 山名 早人

    人工知能学会誌   6 ( 3 )  1991.05

    CiNii

  • Parallel execution scheme of conditional branches with graph unfolding for the parallel processing system - Harray

    Hayato Yamana, Toshiaki Yasue, Jun Kohdate, Yoichi Muraoka

    Bulletin of Centre for Informatics (Waseda University)   12   8 - 18  1991.03

     View Summary

    The purpose of this paper is to propose and evaluate a new scheme, called the Preceding Activation Scheme with Graph Unfolding, which translates a FORTRAN program into a dataflow graph and executes it efficiently. The problem in restructuring a FORTRAN program into a dataflow graph is that a FORTRAN program has an explicit control flow, which results in little parallelism because many gate-operations, such as T/F gates, are introduced in the dataflow graph to synchronize the data movement. Thus, discarding these gate-operations is the key to exposing parallelism in a FORTRAN program, which is the main purpose of the proposed scheme. In the software simulation, it is shown that the execution speed with the proposed scheme for flow graphs without backward branches is about 1.5 times as fast as that of the pure dataflow computer. Moreover, the execution speed is 2.7 times as fast as that of the pure dataflow computer if a flow graph including backward branches is unfolded by the proposed scheme.

    J-GLOBAL

  • A Parallel Execution Scheme of Conditional Branches and its Evaluation for the Parallel Processing System : Harray

    YAMANA Hayato, YASUE Toshiaki, Kohdate Jun, MURAOKA Yoichi

    全国大会講演論文集   42 ( 6 ) 60 - 61  1991.02

     View Summary

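    This report describes how parallel processing of the conditional branches in a program shortens program execution time, and presents a performance prediction for the conditional-branch parallel processing technique on the parallel processing system -Harray- that we have proposed. Many attempts have been made to process the conditional branches in a program in parallel, mainly on VLIW computers. However, because these schemes do not target large-scale parallel computers, the number of eager evaluation stages for conditional branches is small, and so is the parallelism obtained. In contrast, -Harray- has on the order of 1,000 processing elements, so it uses a large number of eager evaluation stages and extracts sufficient parallelism from the program. As a technique for increasing the number of eager evaluation stages, we have previously proposed flow graph unfolding. Flow graph unfolding does not synchronize at conditional branch points; instead, it simultaneously executes the operations on all control flows that diverge according to whether the condition holds, and later selects the flow that has become valid according to the control. Evaluations so far have confirmed a 1.5- to 5.2-fold improvement in processing speed for the portions subject to flow graph unfolding. This paper first (1) shows the speedup from parallel processing of conditional branches using simulation results for several scientific computing programs, and then (2) presents an evaluation of how much speedup flow graph unfolding can be expected to deliver for a program as a whole.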

    CiNii J-GLOBAL

  • A Control Scheme of Processing Element for the Parallel Processing System : Harray

    ISHIZAKI Kazuaki, ISHII Yoshihiko, HAGIMOTO Takeshi, YAMANA Hayato, MURAOKA Yoichi

    全国大会講演論文集   42 ( 6 ) 62 - 63  1991.02

     View Summary

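    This report describes how the inside of a processing element (PE) is controlled during speculative execution in the parallel processing system -Harray-. Speculative execution is a scheme that performs eager evaluation beyond control dependences, which are one of the factors that hinder parallel execution of programs. Eager evaluation in conventional pipelined processors and the like was performed within a single processor, so its scope was small. In -Harray-, speculative execution uses multiple processors and performs eager evaluation over multiple stages. The issue here is how to control PEs across multiple PEs once a branch has been decided. If control is centralized in one place, the overhead cannot be ignored with on the order of 1,000 processors. We have therefore proposed a scheme in which execution control is distributed to each PE. This report first defines a control unit called the Activation Set as the unit of speculative execution. It then outlines the independent per-PE control method for speculative execution using Activation Sets, and finally presents the concrete processing procedure inside a PE.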

    CiNii J-GLOBAL

  • A Control Scheme of Processing Element for the Parallel Processing System. Harray.

    石井吉彦, 石崎一明, 萩本猛, 山名早人, 村岡洋一

    情報処理学会全国大会講演論文集   43rd ( 6 )  1991

    J-GLOBAL

  • Prototype FORTRAN Compiler for Parallel Processing System -Harray-.

    安江俊明, 神舘淳, 山名早人, 村岡洋一

    並列処理シンポジウム論文集   1991  1991

    J-GLOBAL

  • A Parallel Execution Scheme of Conditional Branches Using Eager Evaluation for the Parallel Processing System-Harray.

    山名早人, 石崎一明, 安江俊明, 村岡洋一

    情報処理学会研究報告   91 ( 64(ARC-89) )  1991

    J-GLOBAL

  • A Scheme to Reduce the Access Rate to Shared Memory for the Parallel Processing System -Harray-.

    山名早人, 大段智志, 村岡洋一

    並列処理シンポジウム論文集   1991  1991

    J-GLOBAL

  • Loop Parallelizing Scheme:Dependent-flow Loop.

    金子正教, 中里倫明, 安江俊明, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   91 ( 130(CPSY91 4-33) )  1991

    J-GLOBAL

  • A Construction of Execution Unit for Parallel Processing System. Harray.

    萩本猛, 山名早人, 村岡洋一

    情報処理学会全国大会講演論文集   43rd ( 6 )  1991

    J-GLOBAL

  • A Network Construction of Parallel Processor for Eager Evaluation.

    石崎一明, 安江俊明, 山名早人, 村岡洋一

    情報処理学会研究報告   91 ( 64(ARC-89) )  1991

    J-GLOBAL

  • A Scheduling Scheme for the Parallel Processing System. Harray. A Task Restructuring for the Reduction of Inter-Processor Communication and Synchronization.

    萩原純一, 安江俊明, 山名早人, 村岡洋一

    情報処理学会全国大会講演論文集   43rd ( 6 )  1991

    J-GLOBAL

  • An environment for dataflow program development of parallel processing system‐harray

    Hayato Yamana, Jun Kohdate, Toshiaki Yasue, Associate Members, Yoichi Muraoka

    Systems and Computers in Japan   22 ( 8 ) 26 - 38  1991

     View Summary

    This paper considers the dataflow program development environment for the system programmer who develops the compiler and proposes a method to improve the debugging efficiency. The conventional debugging methods are either: (1) to monitor the packet in the dataflow ring, or (2) to specify the function containing a bug. The former contains unsolved problems such as the determination of start timing for the data monitoring and the presentation of a large amount of information to the user. The latter contains a problem in that the debugging is impossible at the dataflow level. This paper aims at the solution of those problems, and the detailed debugging is executed on the software, not on the real machine. The information presentation on a dataflow graph is considered for systematic presentation of the debugging information. As the development environment, the parallel processing system Harray proposed by the authors is considered. In the proposed system, a two‐stage process is employed in which the first step is to specify the macro‐block (which is a task unit in Harray) containing the bug, and the second step is the detailed debugging of the specified macro‐block. The debugging within the macroblock is executed on the software, and the debugging efficiency is improved by: (1) diagram representation for easier visual recognition, and (2) backward tracing function. Copyright © 1991 Wiley Periodicals, Inc., A Wiley Company

    DOI

  • Dataflow program developing environment for the parallel processing system -Harray-.

    安江俊明, 神舘淳, 山名早人, 村岡洋一

    並列処理シンポジウム論文集   73 ( 6 ) p569 - 579  1990.06

    CiNii J-GLOBAL

  • Implementation of color management scheme on data driven computer.

    石井吉彦, 安江俊明, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   90 ( 143(CPSY90 12-37) )  1990

    J-GLOBAL

  • A construction of switching unit for parallel processing system -Harray-.

    石崎一明, 山名早人, 村岡洋一

    情報処理学会研究報告   90 ( 90(ARC-85) )  1990

    J-GLOBAL

  • The organization of global memory for the parallel processing system -Harray-.

    山名早人, 片山啓, 草野義博, 村岡洋一

    並列処理シンポジウム論文集   1990  1990

    J-GLOBAL

  • A macro-block controlling scheme of parallel processing system. Harray.

    山名早人, 安江俊明, 神館淳, 村岡洋一

    電子情報通信学会技術研究報告   90 ( 143(CPSY90 12-37) )  1990

    J-GLOBAL

  • Evaluation of color management on parallel processing system -Harray-.

    石井吉彦, 安江俊明, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   90 ( 11(CPSY90 1-4) )  1990

    J-GLOBAL

  • A method of function calls. parallel processing system -Harray-.

    石崎一明, 神舘淳, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   90 ( 11(CPSY90 1-4) )  1990

    J-GLOBAL

  • A FORTRAN compiler for parallel processing system. Harray.

    安江俊明, 神館淳, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   90 ( 143(CPSY90 12-37) )  1990

    J-GLOBAL

  • Parallel processing system -Harray-

    H. Yamana, Y. Kusano, T. Yasue, J. Kohdate, T. Hagiwara, Y. Muraoka

    Computing Systems in Engineering   1 ( 1 ) 111 - 130  1990

     View Summary

    The parallel processing system -Harray- for scientific computations is introduced. The special features of the -Harray- system described are (1) the Controlled Dataflow (CD flow) mechanism, (2) the preceding activation scheme with graph unfolding, and (3) the visual environment for dataflow program development. The CD flow mechanism, which controls the sequence of execution at two levels (dataflow execution in each processor and control flow execution between processors), is adopted in the -Harray- system. Though dataflow computers are expected to extract parallelism fully from a program, they have many problems, such as the difficulty of controlling the sequence of execution. To solve these problems, the CD flow mechanism is adopted. The preceding activation scheme makes it possible to bypass control dependencies in a program, such as IF-GOTO statements which decrease the parallelism in a program. The flow graph of a program is unfolded to decrease the control dependency and to increase the parallelism. The visual environment helps programmers in the writing and debugging of a dataflow program. The environment consists of a graphical editor of a dataflow graph, and a debugger. These special features of the -Harray- system and its execution mechanism are described. © 1990.

    DOI

  • A Method of Color Management for Parallel Processing System -Harray-

    Ishii Yoshihiko, Yasue Toshiaki, Yamana Hayato, Muraoka Yoichi

    全国大会講演論文集   39 ( 3 ) 1798 - 1799  1989.10

     View Summary

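    This paper proposes a color management scheme for the parallel processing system -Harray- [1]. -Harray- adopts dynamic data-driven execution within a processing unit called a macro-block [1]. Dynamic data-driven execution uses colors to process loops. However, colors are finite, so color resource management is required. For color resource management, that is, reclaiming and reassigning colors, a conventional scheme that "creates a new loop when colors overflow" has been proposed [2]. However, it has the problem that the overhead of creating a loop is large. Moreover, since computer resources are finite, using more colors than the computer resources allow cannot be expected to improve processing speed. In other words, it suffices to use as many colors as match the computer resources. Based on these points, this paper proposes a loop processing scheme that uses no more colors than necessary and reduces the overhead of reclaiming and reassigning colors. We first perform dataflow analysis on the loop body to obtain the required number of colors (denoted L). We then present a loop processing scheme that avoids color overflow and reclaims and reassigns the colors limited by L. This paper reports on the case where L does not exceed the computer resources.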

    CiNii J-GLOBAL

  • A Run-time Error Handling Scheme for Parallel Processing System -Harray-

    Hagimoto Takeshi, Kusano Yoshihiro, Yamana Hayato, Muraoka Yoichi

    全国大会講演論文集   39 ( 3 ) 1800 - 1801  1989.10

     View Summary

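    We have proposed -Harray- (-HARRAY-: Hybrid ARRAY), a parallel processing system for scientific computation. -Harray- is a parallel processing system with 1,024 processing elements that aims to execute programs written in FORTRAN for scientific computation at high speed. The execution scheme of -Harray- is the CD flow scheme, in which a program is divided at compile time into units called macro-blocks, with control flow between macro-blocks and dataflow within each macro-block. For dataflow programs, we have confirmed that applying gate postponement (ゲート後置), described later, improves execution speed roughly threefold under the assumption of unlimited computer resources. However, since computer resources are finite, -Harray- applies gate postponement to the parts where the parallelism of the program is smaller than the computer resources, improving the execution speed of those parts. When a control gate is provided in order to avoid a run-time error, however, applying gate postponement may cause a run-time error in the eagerly evaluated part that is due to gate postponement itself. Because such a run-time error is not caused by a mistake in the user's program, it cannot be reported to the user. Therefore, when a run-time error that could be caused by gate postponement occurs, it is necessary to determine whether its cause is gate postponement or an error in the program. In this paper, when a run-time error that could be caused by gate postponement occurs, ...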

    CiNii J-GLOBAL

  • A Structure Handling Scheme of Parallel Processing System -Harray-

    Yamana Hayato, Kusano Yoshihiro, Muraoka Yoichi

    全国大会講演論文集   39 ( 3 ) 1802 - 1803  1989.10

     View Summary

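    This paper describes how the structure handling scheme [2] is realized in the parallel processing system -Harray- [1]. -Harray- adopts the CD flow (Controlled Dataflow) scheme [3] as its execution scheme. In the CD flow scheme, control flow is used between processing units called macro-blocks, and dataflow execution is used within each macro-block. Dataflow execution has no notion of memory, but structure memory for storing large structures is indispensable when actually building a computer. I-structures [4] and other approaches have been proposed for structure handling. However, these approaches strictly implement the single assignment rule of the dataflow model, and so they have the restriction that a structure can be referenced multiple times but defined only once. Consequently, a structure has to be copied on redefinition, which incurs overhead. For -Harray-, we have proposed a structure handling scheme that allows multiple definitions and references to the structure memory (hereafter called global memory) and eliminates the copy overhead on redefinition [2]. This scheme guarantees the access order for definitions and references that span multiple macro-blocks, and does not cover definitions and references confined within a macro-block. This is because, when data defined within a macro-block is used in the same macro-block, the defined data ... In this paper, first, the global memory of -Harray- and ...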

    CiNii J-GLOBAL

  • A structure handling scheme of parallel processing system -Harray-.

    山名早人, 草野義博, 村岡洋一

    情報処理学会全国大会講演論文集   39th ( 3 )  1989

    J-GLOBAL

  • Visual environment for lower level program development of parallel processing system -Harray-.

    安江俊明, 神舘淳, 萩原孝, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   89 ( 19(CPSY89 1-5) )  1989

    J-GLOBAL

  • Evaluation of unfolded flow graph for the parallel processing system -Harray-.

    荻原孝, 山名早人, 神館純, 村岡洋一

    情報処理学会研究報告   89 ( 30(ARC-75) )  1989

    J-GLOBAL

  • A controlled dataflow mechanism of parallel processing system - Harray.

    山名早人, 草野義博, 神舘淳, 安江敏明, 村岡洋一

    電子情報通信学会技術研究報告   89 ( 168(CPSY89 45-58) )  1989

    J-GLOBAL

  • Parallel processing system HARE.

    萩原孝, 山名早人, 丸島敏一, 村岡洋一

    BIT (Tokyo)   21 ( 4 )  1989

    J-GLOBAL

  • A compiling algorithm of parallel processing system - Harray with graph unfolding scheme.

    神舘淳, 安江俊明, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   89 ( 168(CPSY89 45-58) )  1989

    J-GLOBAL

  • A PRECEDING ACTIVATION SCHEME WITH GRAPH UNFOLDING FOR THE PARALLEL PROCESSING SYSTEM HARRAY

    H YAMANA, T HAGIWARA, J KOHDATE, Y MURAOKA

    PROCEEDINGS : SUPERCOMPUTING 89     675 - 684  1989

     View Summary

    The purpose of this work is to propose and evaluate the preceding activation scheme with graph unfolding, which translates a Fortran program into a dataflow graph and executes it efficiently. The problems in restructuring a Fortran program into a dataflow graph are that a Fortran program is not written in a single assignment rule and it has an explicit control flow. These problems result in little parallelism because many gate operations, such as T/F gates, are introduced in the dataflow graph to synchronize the data movement. Therefore, discarding these gate operations is the key to exposing parallelism in a Fortran program. The preceding activation scheme with graph unfolding is proposed to discard these gate operations. The result of the performance evaluation by the 'Harray' software simulator is presented. It is shown that the execution speed with the proposed scheme for flow graphs without backward branches is about 1.5 times as fast as that with the extended activation scheme which initiates the execution only after it is confirmed that a basic block will be selected at a conditional branch. Moreover, the execution speed is 2.7 times as fast as that with the extended activation scheme if a flow graph including backward branches is unfolded by the proposed scheme.

  • A Construction of Waiting Memory for Parallel Processing System -Harray-

    Yamana Hayato, Kusano Yoshihiro, Hagiwara Takashi, Muraoka Yoichi

    全国大会講演論文集   37 ( 1 ) 65 - 66  1988.09

     View Summary

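    We have proposed the parallel processing system -Harray-, intended mainly for scientific computation. -Harray- incorporates dataflow execution to fully draw out the parallelism inherent in programs. In dataflow execution, speeding up the waiting memory (WM), which governs the firing control of nodes, is key to speeding up the whole system. This paper describes the organization of the waiting memory (WM) used in the -Harray- prototype and presents a simple evaluation using a software simulator.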

    CiNii J-GLOBAL

  • The Completion Detection of Macro-Block in Parallel Computer System -Harray-

    Kusano Yoshihiro, Hagiwara Takashi, Yamana Hayato, Muraoka Yoichi

    全国大会講演論文集   37 ( 1 ) 67 - 68  1988.09

     View Summary

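    In -Harray-, the dataflow multiprocessor system for scientific computation that we have proposed, the concept of a macro-block is used to partition the tasks assigned to each processor element. A macro-block is a part of a program partitioned according to certain criteria, as shown in Figure 1, and -Harray- assigns tasks to processor elements in units of macro-blocks. Inside a macro-block, computation proceeds under data-driven control so that parallelism is extracted naturally, and control flow is introduced between macro-blocks, giving a hierarchical control structure. This approach compensates for drawbacks of data-driven control such as the increase in control instructions. However, assigning tasks in units of macro-blocks raises various problems. One of them is the need to detect the completion of a macro-block quickly. This paper therefore describes a technique for detecting macro-block completion quickly and presents a simple evaluation.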

    CiNii J-GLOBAL

  • A construction of processing element in a parallel processing system -Harray-.

    山名早人, 丸島敏一, 草野義博, 村岡洋一

    情報処理学会研究報告   88 ( 19(CA-70) )  1988

    J-GLOBAL

  • Evaluation of processing element in parallel computer system-Harray.

    草野義博, 山名早人, 丸島敏一, 村岡洋一

    情報処理学会全国大会講演論文集   36th ( 1 )  1988

    J-GLOBAL

  • Execution mechanism of parallel processing system -Harray-.

    丸島敏一, 山名早人, 萩原孝, 草野義博, 村岡洋一

    情報処理学会研究報告   88 ( 4(CA-69/MC-48) )  1988

    J-GLOBAL

  • Evaluation of a simulated parallel processing system - Harray - .

    山名早人, 萩原孝, 草野義博, 村岡洋一

    情報処理学会研究報告   88 ( 79(ARC-73) )  1988

    J-GLOBAL

  • A design concept of the compiler for parallel processing system-Harray-.

    萩原孝, 山名早人, 村岡洋一

    電子情報通信学会技術研究報告   88 ( 155 )  1988

    J-GLOBAL

  • EXPERIENCE USING THE RESTRUCTURING COMPILER PARAFRASE.

    Toshikazu Marushima, Takashi Hagiwara, Hayato Yamana, Yoichi Muraoka

    Bulletin of Centre for Informatics (Waseda University)   5   69 - 77  1987.03

     View Summary

    Parallel processing with an ordinary sequential language is important from the point of view of its simplicity and the effective utilization of existing software. This paper reports experience gained by using Parafrase, a restructuring compiler developed at the University of Illinois.

    J-GLOBAL

  • A construction of processing element in parallel scientific computer system -Harray-.

    山名早人, 丸島敏一, 萩原孝, 村岡洋一

    情報処理学会全国大会講演論文集   34th ( 1 )  1987

    J-GLOBAL

  • A scheme of macro blocking for parallel scientific computer system -Harray-.

    萩原孝, 山名早人, 丸島敏一, 村岡洋一

    情報処理学会全国大会講演論文集   34th ( 1 )  1987

    J-GLOBAL

  • RetweetReputation: バイアスを排除したTwitter投稿内容評価手法

    藤木紫乃, 矢野博也, 山名早人

    DEIM2011   A10-3

  • Sequential Pattern Mining with Time Interval

    Yu Hirate, Hayato Yamana

    Proc. of PAKDD2006  

    DOI


Industrial Property Rights

  • 認証システム、認証プログラム及び認証方法

    山名 早人, 工藤 雅士

    Patent

    J-GLOBAL

  • 略語生成システム

    特許第6135867号

    石川 開, 土田 正明, 大西 貴士, 山名 早人, 及川 孝徳

    Patent

    J-GLOBAL

  • 記憶度推定装置および記憶度推定プログラム

    特許第6032638号

    山名 早人, 苑田 翔吾, 浅井 洋樹

    Patent

    J-GLOBAL

  • 辞書作成支援装置、辞書作成支援方法及び辞書作成支援プログラム

    特許第5648890号

    立石 健二, 細見 格, 山名 早人

    Patent

    J-GLOBAL

  • 辞書作成支援装置、辞書作成支援方法及び辞書作成支援プログラム

    立石 健二, 細見 格, 山名 早人

    Patent

    J-GLOBAL

  • ネットワーク取引不正行為者検出方法

    山名 早人, 平出 勇宇, 相吉澤 明, 木戸 冬子

    Patent

    J-GLOBAL


Works

  • CREST SECURE DATA SHARING AND DISTRIBUTION PLATFORM FOR INTEGRATED BIG DATA UTILIZATION

    Software 

    2015.10
    -
    2021.09

  • 多メディアWeb解析基盤の構築及び社会分析ソフトウェアの開発

    2008
    -
     

  • 検索エンジンの信頼性

    2007
    -
     

  • Trustworthiness of Search Engines

    2007
    -
     

  • e-Society/インターネット上の知識集約を可能にするプラットフォーム構築技術

    2002
    -
    2007

  • e-Society Project

    2002
    -
    2007

  • Productive ICT Academia Program(21世紀COE)

    2002
    -
    2006

  • 効率的な情報収集に関する調査(Web)

    2002
     
     

  • 分散ソフトウェアロボット負荷分散法の研究

    2000
    -
    2002

  • アドバンスト並列化コンパイラ技術の開発(NEDO/METI)

    2000
    -
    2002

  • Research on Load Balancing Technique for Distributed WWW Robots

    2000
    -
    2002

  • Research on Advanced Parallelizing Compiler

    2000
    -
    2002

  • WWW情報検索システムのサーベイ

    2000
    -
     

  • Survey in WWW Search Engines

    2000
    -
     


Awards

  • Fellow

    2020   IPSJ  

  • Golden Core Award

    2018   IEEE Computer Society  

  • Fellow

    2018   IEICE  

  • Excellent Paper Award

    2013   IEICE  

  • Excellent Paper Award, DBSJ (JAPAN)

    2009  

  • IBM Faculty Award

    2009  

  • Best Author Award, ITE(Japan)

    2003  

  • Best Author Award, IPSJ (JAPAN)

    2002  

  • Yamashita SIG Research Award (IPSJ)

    1995  

  • Research Encouragement Award (IPSJ)

    1993  


Research Projects

  • Identification of logical thinking ability from online handwritten data

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year :

    2020.04
    -
    2024.03
     

  • Credibility Analysis of Web contents based on 10 billion Web pages

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year :

    2017.07
    -
    2022.03
     

  • 戦略的創造研究推進事業(CREST)「ビッグデータ統合利活用のための次世代基盤技術の創出・体系化」領域

    JST  Strategic Basic Research Programs (CREST)

    Project Year :

    2015.10
    -
    2021.09
     

  • Authorship Identification for Hundred-thousand-scale Microblog Users in the Web

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year :

    2013.04
    -
    2017.03
     

    YAMANA Hayato, OYAMA Keizo, UNO Takeaki, OKUNO Syunya, OKUTANI Takashi, ASAI Hiroki, UESATO Kazuya, TANAKA Masahiro, SHINOHARA Shota, ISHIYAMA Takehiro, Wang Lan

     View Summary

    As a wide variety of information floods the Internet, its credibility is becoming a social problem. In this study, we researched authorship identification techniques for short messages such as SNS posts, aiming to identify the author of a message from among 100,000 candidates. That is, if some documents written in advance by an author are available, the writer of a message can be estimated. As a result, we established a mechanism that finds a specific user among 100,000 SNS users with 60% accuracy given only 30 messages. In addition, the probability that the correct user appears in the top 10 was 74%. This is a major contribution, given that other research worldwide is limited to about 20% accuracy for 100,000 candidates.

  • Frustration Detection using Online Handwriting Behavior

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Project Year :

    2014.04
    -
    2016.03
     

    YAMANA HAYATO, ASAI Hiroki

     View Summary

    In the era of educational computerization, this research aimed to detect students' frustration during study from online handwritten data, with the goal of promoting the effective personalized study that will be realized in the near future.
    We classified students' frustration into two categories: 1) frustration caused by non-established memory and 2) frustration during the answering process. For 1), we divided “memory” into non-established, subjective established, and subjective non-established memory. Our proposed system, targeting Japanese Kanji memorization, then tried to automatically detect non-established memory within subjective-established memory, that is, items students thought they had memorized but in fact had not. Our evaluation shows an F-value of 0.69, which is applicable to the real world. For 2), we used mathematical problems. Categorizing the students' answering processes yielded F-values of 0.5 to 0.7, which will be applicable to the real world.

  • Integrated analysis of web information structure and users' behavior and its application to advanced information access

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year :

    2010.04
    -
    2013.03
     

    OYAMA Keizo, AIZAWA Akiko, MIYAO Yusuke, SUN Yuan, KOBAYASHI Tetsuro, HAN Hao, KISHIDA Kazuaki, YAMANA Hayato, OKUMURA Manabu, YOSHIOKA Masaharu, ISHITA Emi, MURATA Tsuyoshi, EGUCHI Koji

     View Summary

    To understand Web structure and users' information retrieval and browsing behavior in an integrated way, and to extend that understanding to various applications, we collected and introduced various data reflecting Web information structure and Web users' behavior (e.g., Web view log data and micro-blog data), obtained user data through questionnaires, and carried out an integrated analysis of them.
    Consequently, we obtained various findings from the data, for example that there is a gap between the information people want to know and the information people want to convey, and that using Web portal sites leads to unexpected contact with a variety of information. Moreover, we proposed and studied various methods for advanced information access, such as information recommendation and information retrieval, based on the information obtained through the integrated analysis.

  • Challenges and Successful Approaches in Multimedia Event Detection

    MEXT 

    Project Year :

    2009
    -
    2011.03
     

    Shinichi SATO, Masaru KITSUREGAWA, Satoshi TOYODA

  • Analysis of Search Engines' Trustworthiness

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year :

    2009
    -
    2011
     

    YAMANA Hayato, MATSUYAMA Yasuo

     View Summary

    Nowadays, search engines have become indispensable to daily life; however, their trustworthiness is unclear. In particular, the number of search results, i.e., the hit count, usually varies by a factor of about 100 to 1,000, up or down, even for the same query word. In this research, we clarified the transition characteristics of hit counts based on a 15-month investigation of Google, Yahoo! JAPAN, and Bing. Moreover, we proposed a new method for choosing trustworthy hit counts, which achieves 99.5% precision when two hit counts are compared to decide which query word has the larger number of search results.

  • メニーコアCPUにおける冬眠コアのゼロ化

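    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research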

    Project Year :

    2009
    -
    2010
     

    山名 早人

     View Summary

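    In FY2010, we aimed to evaluate on real machines the automatic system optimization algorithm developed in FY2009. For applications built from Producer-Consumer modules, the goal of this algorithm is to optimize application performance by automatically determining the machines and the number of threads assigned to each module so that many-core CPUs are used to the fullest. The research used QueueLinker, a distributed processing framework that we are developing.
    In FY2010, we first developed a Web crawler as an application for evaluating the automatic optimization algorithm and confirmed that it runs on a prototype of QueueLinker. All modules that make up this crawler are of the Producer-Consumer type and can be executed in a distributed manner by QueueLinker. Prior to the experiments, to reduce the load that the crawler places on Web servers, we developed a crawling scheduler that strictly guarantees a minimum time interval between accesses to the same Web server. The scheduler has O(1) time complexity, and the upper bound of its space complexity does not depend on the number of URLs to be crawled. This algorithm was presented at DEIM 2011.
    We then developed an automatic profiling function for QueueLinker, using the developed Web crawler as the application. This function can profile the CPU time used by each module and the amount of network communication. After that, we revised the design of the automatic system optimization algorithm developed in the previous fiscal year so that it operates on actual profiling data. Based on the amount of resources used by each module, this algorithm automatically determines the machines and the number of threads assigned to the modules so as to maximize application performance.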

  • Highly Scalable Monitoring Architecture for Information Explosion Environments

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

    Project Year :

    2006
    -
    2010
     

    NAKAJIMA Tatsuo, MURAOKA Yoichi, GOTO Shigeki, YAMANA Hayato, KATTO Jiro, OIKAWA Shuichi, AKIOKA Sayaka

     View Summary

    The monitoring system architecture in this project consists of a set of software that protects information infrastructures, social infrastructures, and everyday human life. The goal of the project is to integrate research areas that had previously been discussed independently. The project developed several monitoring systems for computer systems, network systems, and the real world in order to investigate the future information infrastructure.

  • Design and Development of Advanced IT Research Platform for Information Explosion Era

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

    Project Year :

    2006
    -
    2010
     

    ADACHI Jun, TANAKA Katsumi, NISHIDA Toyoaki, KUNIYOSHI Yasuo, SUDOH Osamu, KUROHASHI Sadao, HARA Takahiro, MATSUOKA Satoshi, TAURA Kenjiro, TATEBE Osami, MUNETOMO Masaharu, HIROTSU Toshio, MATSUBARA Jin, SHIMOJYO Shinji, CHIBA Shigeru, YUASA Taichi, MATSUYAMA Takashi, CHIKAYAMA Takashi, KONDO Toru, KONO Kenji, OKAMOTO Masahiro, AIDA Kento, KAMADA Tomio, KITSUREGAWA Masaru, YAMANA Hayato, NAKAMURA Yutaka, KOBAYASHI Hiroaki, NAKAJIMA Hiroshi

     View Summary

    This project implemented a common research infrastructure for all the research groups participating in this priority-area research initiative, and thereby supported all research activities in the initiative. By providing this infrastructure, we succeeded in accelerating the shared use of research facilities and resources within the limits of research funding and in strengthening collaboration among research groups. The shared facilities include (a) TSUBAKI: an open search engine for a large-scale corpus, (b) InTrigger: a widely distributed computing test-bed, (c) IMADE: an environment for real-world interaction measurement and analysis, and (d) prototyping for sensor-network-based preventive medicine.

  • Bioinformatics in silico by the Unification of Symbols and Patterns

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year :

    2005
    -
    2007
     

    MATSUYAMA Yasuo, YANAGISAWA Masao, YAMANA Hayato, KURUMIZAKA Hitoshi, INOUE Masato

     View Summary

    This project was started towards the development of computational intelligence algorithms for finding soft patterns existing in DNA and amino acid sequences. The main methodology is in Aim. Wet biologists are included in this group so that overly abstract problems are suppressed. The unification between computer-based information scientists and test-tube-based life scientists still requires time; however, a steady step towards such collaboration was made by this project, with the following results:
    (1) Prediction methods for the transcription start site were established. On the human genome, which is representative of eukaryotes, a combination of the spectrum kernel, hidden Markov models, and FFT integrated by a support vector machine was presented. This mechanism yielded top-class ROC curves. On the prediction for E. coli, which is representative of prokaryotes, a combination of independent component analysis and a support vector machine showed the best prediction performance to date.
    (2) A new effective algorithm for multiple sequence alignment was developed. This new method suppresses the appearance of multiple gaps in the same column. The gap extension can be regulated by piecewise linear penalties. The total algorithm is realized as software named PRIME. PRIME showed better performance than ClustalW and T-Coffee in terms of the resulting alignments and computational speed.
    (3) The wet biology team found evidence on Rad51, which repairs cut double strands of DNA. The binding site of Rad51 is altered in breast cancer patients.
    As explained above, this research brought about fruitful results on post-genome topics: the prediction of promoters and transcription start sites, a new multiple sequence alignment method leading to tertiary structure prediction, and a cancer property caused by protein functions.

  • Research on Fast Execution using Helper Threads on Multi-threading Processors

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year :

    2005
    -
    2006
     

    YAMANA Hayato, SAITO Fumiko

     View Summary

    Many multi-core processors have recently come onto the market. In this research, we studied how to accelerate programs on these multi-core processors by using multi-threading techniques.
    In FY2005, we surveyed this research area and studied both algorithms and applications. On the algorithm side, we studied how to reduce the number of open slots in CPU pipelines that result from branch mispredictions. We also studied how to control the L2 cache, because many multi-core CPUs share an L2 cache and controlling it is the key to speeding up programs. By proposing a technique that places data in the best-suited area of the cache to reduce wiring delay, we confirmed that the technique improves the IPC of SPECint95/SPECint2000 programs by a factor of 1.17 on average. On the application side, we investigated many applications, including search-related ones, that are suited to multi-core CPUs.
    In FY2006, building on the FY2005 studies, we focused on how to speed up disk accesses. We investigated three areas: 1) pre-loading data from disks, 2) extending the disk cache onto another machine's memory, and 3) accelerating shell scripts by parallelizing disk access. First, we proposed a new scheme that pre-loads data from disks by using helper threads. Experimental evaluation with the "gzip" application shows a 39.2% speedup compared with running without helper threads. Second, we proposed a new disk caching scheme that uses the remote memory of another PC; the scheme extends the size of the disk cache by using helper threads. We confirmed a speedup ratio of up to 3.08 on the DBT-3 benchmark program. Third, we proposed a new parallelization scheme for shell script programs that uses alternative threads. With our scheme, shell script programs achieve 1.4 to 1.8 times speedup compared with normal execution. The last result is now being commercialized by USP Lab (http://www.usp-lab.com/).
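
    The general idea of pre-loading disk data with a helper thread can be sketched at the application level as follows. This is only an analogy under stated assumptions: the function name, chunk size, and queue depth are hypothetical, and the project's actual scheme is not described here in enough detail to reproduce.

```python
import queue
import threading

def read_with_helper(path, chunk_size=1 << 16, depth=4):
    """Yield chunks of `path` while a helper thread reads ahead.

    The helper keeps up to `depth` chunks buffered so that the consumer
    rarely waits on the disk. Assumes `path` exists and is readable.
    """
    buf = queue.Queue(maxsize=depth)

    def helper():
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                buf.put(chunk)  # blocks when the read-ahead buffer is full
                if not chunk:   # an empty chunk marks end of file
                    break

    threading.Thread(target=helper, daemon=True).start()
    while True:
        chunk = buf.get()
        if not chunk:
            return
        yield chunk
```

    The consumer iterates over `read_with_helper(path)` exactly as it would over sequential reads; only the overlap of I/O and computation changes.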

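  • Research on Load Balancing Methods for Wide-Area Distributed Information Gathering and Retrieval Systems

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)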

    Project Year :

    2001
    -
    2002
     

    山名 早人

     View Summary

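    In fiscal 2002, building on the results of fiscal 2001, we studied how to select a route when multiple routes lead to a given WWW server, with the aim of proposing a distributed collection scheme that takes network congestion into account.
    Specifically, we examined whether the best route among multiple network routes can be found by analyzing transport-layer information in packets. We analyzed the contents of the TCP header, the transport-layer information in each packet, aiming to find a parameter that indicates which of several networks allows data to be transferred most efficiently.
    First, we analyzed the relationship between the transfer rate and various TCP parameters (average window size, maximum window size, and RTT). The analysis showed that the relationship between window size and transfer rate is easier to obtain from connections that transfer less than 1 KB than from connections that transfer 1 KB or more. Likewise, it is easier to obtain from connections with short transfer times (under 1 second in our experiments) than from those with long transfer times (1 second or more).
    These results are thought to arise because packets are sent stably in connections with small transfer volumes or short transfer times. Connections with large transfer volumes or long transfer times may have encountered some problem during transmission, so we found that they should not be used as parameters when selecting the best route.
    Based on these results, we proposed a method for selecting a route when multiple routes to a WWW server exist during Web page collection.
    In addition, continuing from the previous year, we developed an algorithm for discovering the update interval of Web pages without collecting the pages.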

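  • Research on the Architecture of -晴-, a Parallel Processing System for Scientific and Engineering Computation

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)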

    Project Year :

    1991
     
     
     

    山名 早人


Presentations

  • 特定分野における単語重要度計算手法の提案と短い文章における著者の専門性推定への適応

    滝川真弘, 山名早人

    情報処理学会研究報告(Web) 

    Presentation date: 2017.10

    Event date:
    2017.10
     
     
  • CTR向上を目的としたWEBページ上でのオンライン広告配置位置推定

    大谷一善, 滝川真弘, 堀田弘明, 山名早人

    情報科学技術フォーラム講演論文集 

    Presentation date: 2017.09

    Event date:
    2017.09
     
     
  • FCMalloc:完全準同型暗号の高速化に向けたメモリアロケータ

    馬屋原昂, 佐藤宏樹, 石巻優, 今林広樹, 山名早人

    情報処理学会研究報告(Web) 

    Presentation date: 2017.07

    Event date:
    2017.07
     
     
  • 電子ペンを用いた手書き解答データによる幾何学解答パターン分類手法

    森山優姫菜, 下岡純也, 浅井洋樹, 山名早人

    情報科学技術フォーラム講演論文集 

    Presentation date: 2016.08

    Event date:
    2016.08
     
     
  • 特定分野を対象とした単語重要度計算手法の提案とTwitterにおける専門性推定への適応

    滝川真弘, 山名早人

    情報科学技術フォーラム講演論文集 

    Presentation date: 2016.08

    Event date:
    2016.08
     
     
  • 完全準同型暗号のデータマイニングへの利用に関する研究動向

    佐藤宏樹, 馬屋原昂, 石巻優, 今林広樹, 山名早人

    情報科学技術フォーラム講演論文集 

    Presentation date: 2016.08

    Event date:
    2016.08
     
     
  • 完全準同型暗号を用いた高速なゲノム秘匿検索

    石巻 優, 清水 佳奈, 縫田 光司, 山名 早人

    2016年暗号と情報セキュリティシンポジウム(SCIS2016)予稿集 

    Presentation date: 2016.01

    Event date:
    2016.01
     
     
  • A study of effective visit history utilization for Location recommendation-User's Familiarity with area and Visit pattern change-

    HAN JUNGKYU, YAMANA HAYATO

    電子情報通信学会技術研究報告 

    Presentation date: 2015.06

    Event date:
    2015.06
     
     
  • Comparison of Different Semantic Negative Concepts Selection Methods in SVM Classifier Training for Image Annotation

    Shan-Bin Chan, Shin'ichi Satoh, Hayato Yamana

    第5回データ工学と情報マネジメントに関するフォーラム (DEIM2013) 

    Presentation date: 2015.03

    Event date:
    2015.03
     
     
  • Improved Native Language Identification with Upper Phrase Information and Training Data Selection

    TANAKA MASAHIRO, WANG LAN, YAMANA HAYATO

    電子情報通信学会技術研究報告 

    Presentation date: 2014.12

    Event date:
    2014.12
     
     
  • オンライン手書き情報を用いた未定着記憶推定システム

    ASAI HIROKI, YAMANA HAYATO

    情報処理学会研究報告(Web) 

    Presentation date: 2014.11

    Event date:
    2014.11
     
     
  • メンション情報を利用したTwitterユーザプロフィール推定における単語重要度算出手法の考察

    UESATO KAZUYA, TANAKA MASAHIRO, ASAI HIROKI, YAMANA HAYATO

    電子情報通信学会技術研究報告 

    Presentation date: 2014.07

    Event date:
    2014.07
     
     
  • 単語の意味概念行列を用いたキーワード生成による関連論文検索システム

    HAYASHI YUUMA, OKUNO SHUN'YA, YAMANA HAYATO

    電子情報通信学会技術研究報告 

    Presentation date: 2014.07

    Event date:
    2014.07
     
     
  • マイクロブログを対象とした著者推定手法の提案―10,000人レベルでの著者推定―

    OKUNO SHUN'YA, ASAI HIROKI, YAMANA HAYATO

    電子情報通信学会技術研究報告 

    Presentation date: 2014.07

    Event date:
    2014.07
     
     
  • 文体及びツイート付随情報を用いた乗っ取りツイート検出

    上里和也, 奥谷貴志, 浅井洋樹, 奥野峻弥, 田中正浩, 山名早人

    研究報告データベースシステム(DBS)  一般社団法人情報処理学会

    Presentation date: 2013.11

    Event date:
    2013.11
     
     

     View Summary

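    While the number of Twitter users continues to grow, cases in which IDs and passwords are obtained illicitly and tweets are posted by someone else are increasing. In response, we previously proposed a method for detecting spam tweets, which are a subset of the messages posted through account hijacking, and obtained an accuracy of about 80%. That method targeted spam tweets containing specific words and demonstrated the effectiveness of detection. In this study, we widen the detection target, define all tweets posted by someone other than the account owner as "hijacked tweets," and propose a method to detect them. We also retune the parameters of our previously proposed method and exploit the fact that the kinds of hashtags used frequently and the recipients of replies are characteristic of each account, in order to improve the F-measure. In an evaluation experiment on 100 accounts, the F-measure improved by 0.1984 over our conventional method, reaching 0.8570.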

  • ソーシャルメディアを含む多メディアビッグデータの統合的解析による情報抽出

    上田高徳, 浅井洋樹, 藤木紫乃, 山本祐輔, 武井宏将, 秋岡明香, 山名早人

    情報処理学会研究報告. データベース・システム研究会報告  一般社団法人情報処理学会

    Presentation date: 2012.12

    Event date:
    2012.12
     
     

     View Summary

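    This paper describes our ongoing effort to extract information through the integrated analysis of multi-media big data. With the spread of social media, a wide variety of information is now uploaded to the Internet in real time. We believe that more useful information can be extracted by analyzing "multi-media data" that combines multiple information sources rather than a single social medium. This paper describes our work on multi-media analysis. It also describes QueueLinker, a parallel distributed processing framework that we are developing to support the analysis of large-scale real-time data.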

  • ソーシャルメディアを含む多メディアビッグデータの統合的解析による情報抽出

    上田高徳, 浅井洋樹, 藤木紫乃, 山本祐輔, 武井宏将, 秋岡明香, 山名早人

    電子情報通信学会技術研究報告 

    Presentation date: 2012.12

    Event date:
    2012.12
     
     
  • ソーシャルメディアを含む多メディアビッグデータの統合的解析による情報抽出(ソーシャルメディア,ビッグデータとソーシャルコンピューティング,及び一般)

    上田 高徳, 浅井 洋樹, 藤木 紫乃, 山本 祐輔, 武井 宏将, 秋岡 明香, 山名 早人

    電子情報通信学会技術研究報告. DE, データ工学  一般社団法人電子情報通信学会

    Presentation date: 2012.12

    Event date:
    2012.12
     
     
  • Producer‐Consumer型モジュールで構成された並列分散Webクローラの開発

    上田高徳, 佐藤亘, 鈴木大地, 打田研二, 森本浩介, 秋岡明香, 山名早人

    情報処理学会シンポジウムシリーズ(CD-ROM) 

    Presentation date: 2012.11

    Event date:
    2012.11
     
     
  • 形態素間の優先関係を考慮した略語生成手法

    田中友樹, 及川孝徳, 山名早人, 大西貴士, 土田正明, 石川開

    情報処理学会シンポジウムシリーズ(CD-ROM) 

    Presentation date: 2012.11

    Event date:
    2012.11
     
     
  • 筆記情報と時系列モデルを用いた学習者つまずき検出

    浅井洋樹, 野澤明里, 苑田翔吾, 山名早人

    電子情報通信学会技術研究報告 

    Presentation date: 2012.10

    Event date:
    2012.10
     
     
  • 筆記情報と時系列モデルを用いた学習者つまずき検出(教育・学習支援プラットフォーム/一般)

    浅井 洋樹, 野輝 明里, 苑田 翔吾, 山名 早人

    電子情報通信学会技術研究報告. ET, 教育工学  一般社団法人電子情報通信学会

    Presentation date: 2012.10

    Event date:
    2012.10
     
     

     View Summary

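    Detecting where learners stumble is a necessary process in supporting students' learning. Research on stumble detection in CAI has used scoring results, time taken to answer, biometric information such as face images and pulse obtained from sensors, and operation histories of input devices such as keyboards and mice. However, primary education today centers on handwriting, and stumble detection in such an environment has not been discussed in depth. This report examines a method for detecting stumbles from handwriting information obtained from the pens that students use. For detection, we use an AR model, a time-series model, to detect change points at which the learner's handwriting behavior changes, and perform estimation for each interval between change points. A trial evaluation confirmed a certain level of detection performance.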

  • The 2010 IEEE International Workshop on Quantitative Evaluation of large-scale Systems and Technologies (QuEST): Welcome message from workshop organizers

    Kin Fun Li, Rick McGeer, Stephen Neville, Hayato Yamana

    24th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2010 

    Presentation date: 2010.07

    Event date:
    2010.07
     
     
  • The 2010 IEEE International Symposium on Mining and Web (MAW): Welcome message from symposium organizers

    Takahiro Hara, Kin Fun Li, Shengrui Wang, Hayato Yamana, Laurence T. Yang, Yanchun Zhang

    24th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2010 

    Presentation date: 2010.07

    Event date:
    2010.07
     
     
  • Two step adjustment technique of term weight

    YANO Hiroya, NAKAJIMA Tai, YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2010.06

    Event date:
    2010.06
     
     

     View Summary

    The TF-IDF method is one way to weight terms in document retrieval. The IDF value indicates how rarely a term appears in a document set and therefore depends on the document set to be retrieved. The problem is that, even if a term rarely appears in document sets from the same field as the query (which means the term is highly specific), its IDF value is small when the term appears frequently in the document set to be retrieved. In this paper, we propose and study a two-step adjustment technique for term weights. In the first step, we get documents r...
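
    The dependence of IDF on the document set can be seen in a minimal sketch of plain TF-IDF weighting. This shows only the standard baseline, not the two-step adjustment proposed in the paper; the function name and toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:  # an empty document has no weighted terms
            weights.append({})
            continue
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Toy collection (hypothetical): "gene" is common here, so its IDF is low.
docs = [["gene", "promoter", "gene"], ["gene", "cache"], ["cache", "thread"]]
weights = tf_idf(docs)
```

    In this toy collection "gene" occurs twice in the first document yet receives a lower weight than "promoter", because "gene" appears in two of the three documents.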

  • Similar object detection using template matching focused on positional relationship of feature regions

    Keisuke Arai, Kosuke Morimoto, Hayato Yamana

    IPSJ SIG Notes. CVIM  Information Processing Society of Japan (IPSJ)

    Presentation date: 2010.05

    Event date:
    2010.05
     
     

     View Summary

    Detecting similar objects in a large collection of images helps us organize images by category and conduct market research using images on the Web. Template matching, which can detect similar objects, does not suit unknown images because it assumes that the target image contains the same object. In this paper, we aim to decrease the false-positive rate caused by this premise of template matching. We propose a method that considers the positional relationships of feature regions in addition to conventional template matching. Each feature region in template image matches target imag...

  • 6K-7 Data Mining Algorithms Classified Based on Data Access Patterns

    Akioka Sayaka, Muraoka Yoichi, Yamana Hayato, Nakajima Tatsuo

    全国大会講演論文集  Information Processing Society of Japan (IPSJ)

    Presentation date: 2010.03

    Event date:
    2010.03
     
     
  • Model-Based Gaze Tracking with Low-cost Web Cameras

    FUKUDA Takashi, MATSUZAKI Katsuhiko, YAMANA Hayato

    Technical report of IEICE. PRMU  社団法人電子情報通信学会

    Presentation date: 2010.03

    Event date:
    2010.03
     
     

     View Summary

    Gaze estimation that does not restrain users at home will contribute to the reform of user interfaces. Commercial gaze estimation systems achieve high accuracy by using infrared light; however, systems based on web cameras are highly desirable for home use because of their low price. The problem with web cameras is their low resolution for gaze estimation. Low-resolution images are strongly influenced by noise, so past studies have not estimated the detailed direction of the eyes. In this paper, we propose a new gaze estimation method which shows high accuracy using image processing and geometrica...

  • Model-Based Gaze Tracking with Low-cost Web Cameras

    FUKUDA Takashi, MATSUZAKI Katsuhiko, YAMANA Hayato

    Technical report of IEICE. HIP  一般社団法人電子情報通信学会

    Presentation date: 2010.03

    Event date:
    2010.03
     
     

     View Summary

    Gaze estimation that does not restrain users at home will contribute to the reform of user interfaces. Commercial gaze estimation systems achieve high accuracy by using infrared light; however, systems based on web cameras are highly desirable for home use because of their low price. The problem with web cameras is their low resolution for gaze estimation. Low-resolution images are strongly influenced by noise, so past studies have not estimated the detailed direction of the eyes. In this paper, we propose a new gaze estimation method which shows high accuracy using image processing and geometrica...

  • Cross-media impact on Twitter in Japan

    Sayaka Akioka, Norikazu Kato, Yoichi Muraoka, Hayato Yamana

    International Conference on Information and Knowledge Management, Proceedings 

    Presentation date: 2010

    Event date:
    2010
     
     

     View Summary

    Twitter, a microblogging service, is attracting attention as a new communication channel. To understand this new service more deeply, this paper reports the characteristics of Twitter users in Japan and the impact of media such as publications and TV programs on the Twitter community. To the best of our knowledge, this paper is the first to quantitatively analyze the mutual impact between Twitter and other media. For the analyses, we crawled user profiles whose language setting is Japanese and conducted several analyses with well-known methodologies, as in conventional work. We confirmed the characteristics of the collected user profiles: the distributions of the number of friends and the number of follows both follow a power law, and the two numbers are correlated. Besides the collected user profiles, we also used closed caption data of TV programs in Japan and other information on media that picked up Twitter. We matched these data from outside Twitter against the collected user profiles and concluded that, although Twitter has already spread widely among Japanese people, media still have a huge impact on the growth of Twitter users. We also conjecture that the impact is not one-sided but is a mutual influence between Twitter and other media. © 2010 ACM.

  • Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

    Naoyoshi AIKAWA, Tetsuya SAKAI, Hayato YAMANA

    WebDBForum2011 

    Presentation date: 2010

    Event date:
    2010
     
     
  • The Method of Improving the Specific Language Focused Crawler

    Shan-Bin Chan, Hayato Yamana

    Proc. of the 1st CIPS-SIGHAN Joint Conf. on Chinese Language Processing(CLP2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Data Access Pattern Analysis on Stream Mining Algorithms for Cloud Computation

    Sayaka Akioka, Hayato Yamana, Yoichi Muraoka

    Proc. of the 2010 Int'l Conf. on Parallel and Distributed Processing Techniques and Applications 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Reliability Verification of Search Engines' Hit Counts: How to Select a Reliable Hit Count for a Query

    Takuya Funahashi, Hayato Yamana

    CURRENT TRENDS IN WEB ENGINEERING  SPRINGER-VERLAG BERLIN

    Presentation date: 2010

    Event date:
    2010
     
     

     View Summary

    In this paper, we investigate the trustworthiness of search engines' hit counts, the numbers returned as search result counts. Since many studies adopt search engines' hit counts to estimate the popularity of input queries, the reliability of hit counts is indispensable for achieving trustworthy studies. However, hit counts are unreliable because they change when a user clicks the "Search" button more than once or clicks the "Next" button on the search results page, or when a user queries the same term on separate days. In this paper, we analyze the characteristics of hit count transitions by gathering various types of hit counts over two months using 10,000 queries. The results of our study show that the hit counts with the largest search offset just before search engines adjust their hit counts are the most reliable. Moreover, hit counts are the most reliable when they are consistent over approximately a week.

  • Search Engines’ Trustworthiness-Current Status

    Hayato YAMANA

    Proc. of the 5th Korea-Japan Database Workshop 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Time-weighted web authoritative ranking

    Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana

    Information Retrieval 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Speed-Up of Resizable-LSH for Similarity-Based Range Query

    山崎邦弘, 山名早人

    情報処理学会研究報告(CD-ROM) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Localized Multiple Kernel Learningを用いた画像分類

    小林大輔, 相川直視, 山名早人

    MIRU2010, IS2-43 

    Presentation date: 2010

    Event date:
    2010
     
     
  • 低解像度目画像からのModel-Based視線推定

    福田崇, 松崎勝彦, 山名早人

    MIRU2010, IS1-46 

    Presentation date: 2010

    Event date:
    2010
     
     
  • 動画像における正面画像推定からの衣服領域抽出

    金正文, 森本浩介, 山名早人

    MIRU2010, IS3-36 

    Presentation date: 2010

    Event date:
    2010
     
     
  • 領域分割と色特徴を利用したテンプレートマッチングによる類似物体検出

    新井啓介, 森本浩介, 山名早人

    MIRU2010,IS2-42 

    Presentation date: 2010

    Event date:
    2010
     
     
  • 検索語の重みの2段階調整手法

    矢野博也, 中島泰, 山名早人

    信学技報 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Similar object detection using template matching focused on positional relationship of feature regions

    新井啓介, 森本浩介, 山名早人

    情報処理学会研究報告(CD-ROM) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • データアクセスパターンに基づくデータマイニング手法の分類

    秋岡明香, 村岡洋一, 山名早人, 中島達夫

    第72回情処全大 

    Presentation date: 2010

    Event date:
    2010
     
     
  • 安価なWebカメラを用いたModel-Based視線推定

    福田 崇, 松崎勝彦, 山名早人

    信学技報(PRMU) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Hit Count Dance -検索エンジンのヒット数に関する信頼性検証-

    舟橋卓也, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • LittleWeb: 類似ノード集約によるWebグラフ圧縮手法

    片瀬弘晶, 上田 高徳, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • QueueLinker: パイプライン型アプリケーションのための分散処理フレームワーク

    上田 高徳, 片瀬 弘晶, 森本 浩介, 打田 研二, 油井 誠, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Unexpected and Interesting: 動画視聴サイトにおける発見性 を重視した動画推薦手法の提案

    中村 智浩, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • WWWにおけるP3Pコンパクトポリシーの利用状況に関する調査

    櫻井 宏樹, 高木 浩光, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • Winnyネットワーク上を流通するコンテンツの傾向と分析

    打田 研二, 高木 浩光, 山崎 邦弘, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • アンカーテキストとリンク構造を用いた同義語抽出手法

    黒木 さやか, 立石 健二, 細見 格, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • 字幕テキストの利用によるブログで引用されたテレビ番組の推定

    及川 孝徳, 中島 泰, 松崎 勝彦, 黒木 さやか, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • 特定言語Webページ収集のためのフォーカストクローラの性能改善手法

    詹 善斌, 山名 早人

    第2回データ工学と情報マネジメントに関するフォーラム(DEIM2010) 

    Presentation date: 2010

    Event date:
    2010
     
     
  • A Lock-free GCLOCK Page Replacement Algorithm

    Presentation date: 2009.12

    Event date:
    2009.12
     
     

     View Summary

    In this paper, we propose a lock-free variant of the GCLOCK page replacement algorithm, named Nb-GCLOCK. Concurrent access to the buffer management module is a major factor that prevents database scalability to processors. Therefore, we propose a non-blocking scheme for bufferfix operations that fix buffer frames for requested pages without locks by combining Nb-GCLOCK and a wait-free hash table. Our experimental results revealed that our scheme can obtain nearly linear scalability to processors up to 64 processors, although the existing locking-based schemes do not scale beyond 16 processors.
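
    For readers unfamiliar with GCLOCK, the following is a minimal single-threaded sketch of the basic replacement policy: each frame carries a reference counter that is incremented on a hit and decremented as the clock hand sweeps past. It is not the lock-free Nb-GCLOCK of the paper; the class name, method names, and initial weight are assumptions made for illustration.

```python
class GClockBuffer:
    """Single-threaded GCLOCK sketch (not lock-free, illustration only)."""

    def __init__(self, capacity, initial_weight=1):
        self.capacity = capacity
        self.initial_weight = initial_weight
        self.frames = [None] * capacity  # page id held by each frame
        self.counts = [0] * capacity     # reference counter per frame
        self.index = {}                  # page id -> frame number
        self.hand = 0                    # clock hand position

    def fix(self, page):
        """Access `page`; return True on a buffer hit, False on a miss."""
        if page in self.index:
            self.counts[self.index[page]] += 1
            return True
        # Sweep the hand, decrementing counters, until a free frame or a
        # frame whose counter has reached zero is found.
        while self.frames[self.hand] is not None and self.counts[self.hand] > 0:
            self.counts[self.hand] -= 1
            self.hand = (self.hand + 1) % self.capacity
        victim = self.frames[self.hand]
        if victim is not None:
            del self.index[victim]
        self.frames[self.hand] = page
        self.counts[self.hand] = self.initial_weight
        self.index[page] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False
```

    In a concurrent buffer manager, the counter updates and the hand movement are the shared state that a locking implementation must protect, which is the contention the abstract refers to.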

  • Prediction of GPCR ligands by 2-way prediction method

    Hiroto Hyakkoku, Minoru Sugihara, Makiko Suwa, Tsuyoshi Kato, Hayato Yamana, Wataru Fujibuchi

    IPSJ SIG technical reports  Information Processing Society of Japan (IPSJ)

    Presentation date: 2009.09

    Event date:
    2009.09
     
     

     View Summary

    G-protein coupled receptors (GPCRs) are important pharmacological targets, and predicting unknown interactions between GPCRs and ligands is one of the most interesting topics in current computational biology. However, the ligands of many GPCRs have not yet been identified experimentally, and it is difficult to predict unknown ligands of GPCRs because training data are insufficient. We have developed a 2-way prediction method based on the support vector machine. In this method, the prediction is performed by using both information of ligands and GPCRs and one can apply this method to the case...

  • Proposal of Word Salad Spam Detection Method using N-gram and Interrupted Collocations

    MORIMOTO Kosuke, KATASE Hiroaki, YAMANA Hayato

    情報処理学会研究報告. データベース・システム研究会報告  情報処理学会

    Presentation date: 2009.07

    Event date:
    2009.07
     
     

     View Summary

    As the number of Web pages has exploded, information obtained from the Internet has become increasingly important. However, spam pages have increased as well, lowering the value of information obtained from the Internet. Although there are many spamming methods, this article focuses on "word salad," text that is generated automatically, and proposes an efficient method for detecting word salad spam. We compute a score for each text based on the Kullback-Leibler divergence, calculated with n-grams and interrupted collocations, and classify the text based on that score. In an evaluation experiment, our method improved the F-value by 0.18 over the existing method.
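
    A score based on the Kullback-Leibler divergence over n-grams can be sketched as below. This is a simplified illustration using bigrams only and add-alpha smoothing; the function names, the smoothing, and the `vocab_size` parameter (the assumed number of possible bigram types) are assumptions, and the paper's actual scoring, which also uses interrupted collocations, is not reproduced.

```python
import math
from collections import Counter

def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))

def kl_score(doc_tokens, ref_counts, vocab_size, alpha=1.0):
    """KL(P_doc || Q_ref) over the document's bigrams.

    Q_ref is the reference bigram distribution with add-alpha smoothing.
    A higher score means the text looks less like the reference corpus.
    """
    doc_counts = Counter(bigrams(doc_tokens))
    total_doc = sum(doc_counts.values())
    if total_doc == 0:
        return 0.0
    total_ref = sum(ref_counts.values())
    score = 0.0
    for gram, count in doc_counts.items():
        p = count / total_doc
        q = (ref_counts.get(gram, 0) + alpha) / (total_ref + alpha * vocab_size)
        score += p * math.log(p / q)
    return score

# Hypothetical toy reference corpus.
reference = "the cat sat on the mat the cat ate".split()
ref_counts = Counter(bigrams(reference))
natural = kl_score("the cat sat".split(), ref_counts, vocab_size=100)
salad = kl_score("mat sat the".split(), ref_counts, vocab_size=100)
```

    On this toy data the shuffled word sequence scores higher than the natural one, so thresholding the score separates the two.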

  • Resizable-LSH : An Approximate Similarity Search Algorithm for Resizable Range-Search

    YAMAZAKI Kunihiro, NAKAMURA Tomohiro, FUNAHASHI Takuya, YAMANA Hayato

    情報処理学会研究報告. 情報学基礎研究会報告  情報処理学会

    Presentation date: 2009.07

    Event date:
    2009.07
     
     

     View Summary

    We introduce an efficient algorithm named "Resizable-LSH" for approximate similarity search, which allows the search range to be resized flexibly. Locality-Sensitive Hashing (LSH) has recently drawn attention as an efficient algorithm for approximate nearest neighbor search. LSH adopts hash functions that collide with high probability when two vectors are close, so it finds approximate nearest neighbors quickly even in high-dimensional datasets. However, because LSH maps nearby data to the same hash value in a precomputation step, the similarity threshold cannot be changed at search time; changing it requires rebuilding the hash tables, which makes it difficult to offer similarity search with a user-specified threshold. To solve this problem, Resizable-LSH retrieves not only the data whose hash value matches that of the query but also the data with nearby hash values, and thereby achieves resizable range search without rebuilding the hash tables. Experiments show that Resizable-LSH runs about 1,000 times faster than LSH that rebuilds its hash tables for each threshold, with almost the same accuracy.
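
    The idea of also probing buckets with nearby hash values can be sketched with random-hyperplane hashing, where "nearby" is taken to mean a small Hamming distance between bit keys. This is an assumption made for illustration: the hash family, the notion of nearness, and all function names are hypothetical and not taken from the paper.

```python
import random
from itertools import combinations

def make_hasher(dim, n_bits, seed=0):
    """Return a function mapping a vector to an n_bits sign-pattern key."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def hasher(v):
        return tuple(
            int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0) for p in planes
        )

    return hasher

def build_table(vectors, hasher):
    """Build the hash table once; it is never rebuilt at query time."""
    table = {}
    for i, v in enumerate(vectors):
        table.setdefault(hasher(v), []).append(i)
    return table

def query(table, hasher, q, radius):
    """Return ids in buckets whose key differs from q's key in <= radius bits."""
    key = hasher(q)
    hits = []
    for r in range(radius + 1):
        for flip in combinations(range(len(key)), r):
            probe = tuple(b ^ 1 if i in flip else b for i, b in enumerate(key))
            hits.extend(table.get(probe, []))
    return hits
```

    Increasing `radius` widens the search range using the same table, which mirrors the motivation stated in the abstract; the number of probed buckets grows quickly with the radius, so small radii are the practical case.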

  • Resizable-LSH : An Approximate Similarity Search Algorithm for Resizable Range-Search

    YAMAZAKI Kunihiro, NAKAMURA Tomohiro, FUNAHASHI Takuya, YAMANA Hayato

    研究報告データベースシステム(DBS)  情報処理学会

    Presentation date: 2009.07

    Event date:
    2009.07
     
     

     View Summary

    We introduce an efficient algorithm named "Resizable-LSH" for approximate similarity search, which allows the search range to be resized flexibly. Locality-Sensitive Hashing (LSH) has recently drawn attention as an efficient algorithm for approximate nearest neighbor search. LSH adopts hash functions that collide with high probability when two vectors are close, so it finds approximate nearest neighbors quickly even in high-dimensional datasets. However, because LSH maps nearby data to the same hash value in a precomputation step, the similarity threshold cannot be changed at search time; changing it requires rebuilding the hash tables, which makes it difficult to offer similarity search with a user-specified threshold. To solve this problem, Resizable-LSH retrieves not only the data whose hash value matches that of the query but also the data with nearby hash values, and thereby achieves resizable range search without rebuilding the hash tables. Experiments show that Resizable-LSH runs about 1,000 times faster than LSH that rebuilds its hash tables for each threshold, with almost the same accuracy.

  • Proposal of Word Salad Spam Detection Method using N-gram and Interrupted Collocations

    MORIMOTO Kosuke, KATASE Hiroaki, YAMANA Hayato

    研究報告情報学基礎(FI) 

    Presentation date: 2009.07

    Event date:
    2009.07
     
     

     View Summary

    As the number of Web pages has exploded, information obtained from the Internet has become increasingly important. However, spam pages have increased as well, lowering the value of information obtained from the Internet. Although there are many spamming methods, this article focuses on "word salad," text that is generated automatically, and proposes an efficient method for detecting word salad spam. We compute a score for each text based on the Kullback-Leibler divergence, calculated with n-grams and interrupted collocations, and classify the text based on that score. In an evaluation experiment, our method improved the F-value by 0.18 over the existing method.

  • Reliability Verification of Search Engines' Hit Count using Multi Query

    FUNAHASHI Takuya, SONE Hiroaki, YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2009.07

    Event date:
    2009.07
     
     

     View Summary

    A number of studies have used search engines' hit counts, with the goal of building applications that support translation or natural language processing. These studies assume that the hit count is reliable. We have been working to verify the reliability of search engines' hit counts. In the past, we verified hit counts using only single-keyword queries. The contribution of this paper is to verify hit counts using multi-keyword queries.

  • Efficient duplicated URL detection for web crawlers

    久保田 展行, 上田 高徳, 山名 早人

    DBSJ journal  日本データベース学会

    Presentation date: 2009.06

    Event date:
    2009.06
     
     
  • Extending ALT algorithm to use multiple landmarks

    MATSUNAGA TAKU, HIRATE YU, YAMANA HAYATO

    IPSJ SIG Notes  一般社団法人情報処理学会

    Presentation date: 2009.01

    Event date:
    2009.01
     
     

     View Summary

    The ALT algorithm was recently proposed as a speed-up algorithm for computing shortest paths in general graph structures. It provides a landmark-based heuristic function to estimate distances in A* search. Before computing shortest paths, the ALT algorithm computes the distances between all nodes and the landmarks and stores them in memory or storage space prepared in advance. However, as the number of landmarks increases, the required space increases linearly. To solve this problem, in this paper, we propose a novel heuristic function for computing shortest paths in general graph structures...
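
    The baseline ALT idea (A* search, landmarks, triangle inequality) can be sketched as follows for an undirected, connected graph with non-negative weights. It illustrates only the standard landmark lower bound, not the novel heuristic proposed in the paper; the graph, the landmark choice, and the function names are assumptions.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from `source`; graph maps node -> [(neighbor, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def alt_shortest_path(graph, landmark_dists, s, t):
    """A* search using the landmark lower bound max_L |d(L, t) - d(L, v)|."""
    def h(v):
        return max((abs(d[t] - d[v]) for d in landmark_dists), default=0)

    best = {s: 0}
    heap = [(h(s), s)]
    while heap:
        f, u = heapq.heappop(heap)
        if u == t:
            return best[u]
        if f > best[u] + h(u):
            continue  # stale heap entry
        for v, w in graph[u]:
            ng = best[u] + w
            if ng < best.get(v, float("inf")):
                best[v] = ng
                heapq.heappush(heap, (ng + h(v), v))
    return None

# Hypothetical toy graph; nodes "a" and "d" are used as landmarks.
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 4), ("b", 2), ("d", 1)],
    "d": [("b", 5), ("c", 1)],
}
landmark_dists = [dijkstra(graph, "a"), dijkstra(graph, "d")]
```

    Each landmark adds one precomputed distance table covering every node, which is the linear growth in storage that the abstract describes.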

  • The Challenge of Eliminating Storage Bottlenecks in Distributed Systems

    Takanori Ueda, Yu Hirate, Hayato Yamana

    FIRST INTERNATIONAL WORKSHOP ON SOFTWARE TECHNOLOGIES FOR FUTURE DEPENDABLE DISTRIBUTED SYSTEMS, PROCEEDINGS  IEEE COMPUTER SOC

    Presentation date: 2009

    Event date:
    2009
     
     

     View Summary

    One of the most difficult problems in distributed systems is load-balancing. Even if we take care of load-balancing, heavily-loaded nodes often occur while there are still lightly-loaded nodes that have idle memory and idle CPU power. Our idea is to exploit this idle memory and idle CPU power to improve the storage performance of heavily-loaded nodes. Idle memory can be used for caching file data and idle CPU power can be used for extracting file access patterns from file access logs. File access patterns are valuable sources for optimizing a cache strategy. Our project goal is to improve the overall performance of distributed systems by improving storage access performance. This paper gives an overview of this project idea and reports the current status of the project. In addition, we show benchmark results from our prototype cache extension system, which is implemented in Linux Kernel 2.6. The DBT-3 (TPC-H) benchmark results show that our system can increase computer speed by a factor of 6.68.

  • Profiling Node Conditions of Distributed System with Sequential Pattern Mining

    Yu Hirate, Hayato Yamana

    FIRST INTERNATIONAL WORKSHOP ON SOFTWARE TECHNOLOGIES FOR FUTURE DEPENDABLE DISTRIBUTED SYSTEMS, PROCEEDINGS  IEEE COMPUTER SOC

    Presentation date: 2009

    Event date:
    2009
     
     

     View Summary

    With the recent spread of distributed systems, distributed monitoring systems are needed to manage them. However, since the monitoring architecture of a distributed system faces a huge amount of log data coming from local computing nodes, information aggregation is a fundamental scheme for monitoring distributed systems. In this paper, we present a novel approach for extracting computing-node condition profiles by using sequential pattern mining, a data mining technique. The extracted profiles represent node condition patterns that occur frequently across many computing nodes. Thus, the extracted profiles allow the conditions of a distributed system to be summarized as compact, easily understandable information.

  • A Scalable Monitoring System for Distributed Environments

    Sayaka Akioka, Junichi Ikeda, Takanori Ueda, Yuki Ohno, Midori Sugaya, Yu Hirate, Jiro Katto, Shigeki Goto, Yoichi Muraoka, Hayato Yamana, Tatsuo Nakajima

    FIRST INTERNATIONAL WORKSHOP ON SOFTWARE TECHNOLOGIES FOR FUTURE DEPENDABLE DISTRIBUTED SYSTEMS, PROCEEDINGS  IEEE COMPUTER SOC

    Presentation date: 2009

    Event date:
    2009
     
     

    The total amount of information to process or analyze is growing sharply with the rapid spread of computers and networks. Our project, «Highly scalable monitoring architecture for information explosion», develops a monitoring system that allows observing systems, merging the system logs, and discovering intelligence to share. More concretely, the project builds a complete system that maintains, optimizes, and protects itself autonomically. This paper reports the outcomes of the project after the first half of the development period. The rest of the paper is organized as follows. Section 2 describes the concept and details of the monitoring system on a single node, and Section 3 addresses the aggregation of the collected information in distributed environments. Section 4 and Section 5 introduce applications of the monitoring system. Section 6 summarizes the project and mentions future plans. © 2009 IEEE.

  • Resizable-LSH: An Approximate Similarity Search Algorithm for Resizable Range-Search

    山崎邦弘, 中村智浩, 舟橋卓也, 山名早人

    情報処理学会研究報告(CD-ROM) 

    Presentation date: 2009

    Event date:
    2009
     
     
  • QueueLinker: Distributed Producer/Consumer Queue Framework

    上田 高徳, 片瀬 弘晶, 森本 浩介, 打田 研二, 山名早人

    WebDB Forum2009 

    Presentation date: 2009

    Event date:
    2009
     
     
  • A Lock-free GCLOCK Page Replacement Algorithm

    油井誠, 宮崎純, 植村俊亮, 加藤博一, 山名早人

    情報処理学会論文誌トランザクション(CD-ROM) 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Feature Analysis of Wikipedia Article View Counts

    曽根広哲, 山名早人

    Wikimedia Conference Japan 2009 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Prediction of GPCR ligands by 2-way prediction method

    百石弘澄, 杉原稔, 諏訪牧子, 加藤毅, 山名早人, 藤渕航

    情報処理学会研究報告(CD-ROM) 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Proposal of Word Salad Spam Detection Method using N-gram and Interrupted Collocations

    森本浩介, 片瀬弘晶, 山名早人

    情報処理学会研究報告(CD-ROM) 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Extracting the Reasons Topic Words Appear in Blogs and Recommending Detailed Articles on Those Topics

    中島泰, 黒木さやか, 櫻井宏樹, 山名早人

    第15回Webインテリジェンスとインタラクション研究会 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Reliability Verification of Search Hit Counts for Multi-Keyword Queries

    舟橋卓也, 曽根広哲, 山名早人

    信学技報 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Evaluation of a Web Community Extraction Method Using Relevance Propagation Between Web Pages

    飯村卓也, 平手勇宇, 山名早人

    DEIM2009 

    Presentation date: 2009

    Event date:
    2009
     
     
  • A System for Estimating Concepts from Impression Words

    永井洋平, 黒木さやか, 山名 早人

    信学技報(Webインテリジェンスとインタラクション研究会) 

    Presentation date: 2009

    Event date:
    2009
     
     
  • A Method for Reducing the Number of Extracted Frequent Itemsets Using Core Itemsets

    松崎勝彦, 平手勇宇, 山名早人

    DEIM2009 

    Presentation date: 2009

    Event date:
    2009
     
     
  • A Study of a Clustering-Based Correction Method for Search Hit Counts

    舟橋 卓也, 平手 勇宇, 山名 早人

    DEIM2009 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Analysis of Rank Fluctuation Patterns of Sites Ranked by Commercial Search Engines

    吉田泰明, 平手勇宇, 山名早人

    DEIM2009 

    Presentation date: 2009

    Event date:
    2009
     
     
  • Exploiting idle CPU cores to improve file access performance

    Takanori Ueda, Yu Hirate, Hayato Yamana

    Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, ICUIMC'09 

    Presentation date: 2009

    Event date:
    2009
     
     

    Many-core CPUs require many parallel computation tasks to reach their full potential because CPU cores become idle if they do not have enough computation tasks. How best to utilize a number of cores in many-core CPUs should be examined. In this paper, we propose exploitation of idle cores for improving file access performance. Idle cores are used to extract file access patterns from access logs and the extracted patterns are used to improve file cache efficiency by reordering the LRU (Least Recently Used) list based on the extracted patterns. Data mining techniques are used to extract access patterns to reduce computation overhead. Our method was evaluated by simulation and also implemented on Linux kernel 2.6.26 as a prototype system. In the simulation experiment, our method improved the cache-hit ratio up to 1.09% on DBT-2 (TPC-C) trace logs. Our prototype implementation on Linux improves DBT-2 performance up to 5.24% on a real machine. Copyright 2009 ACM.

  • Implementing and Evaluating Graph Engine for Large Scale Graphs

    MATSUNAGA Taku, KATASE Hiroaki, UEDA Takanori, KUBOTA Nobuyuki, MORIMOTO Kosuke, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2008.11

    Event date:
    2008.11
     
     
  • Improvement in speed and accuracy of multiple sequence alignment program PRIME

    Shinsuke Yamada, Osamu Gotoh, Hayato Yamana

    IPSJ Transactions on Bioinformatics  一般社団法人情報処理学会

    Presentation date: 2008.11

    Event date:
    2008.11
     
     

    Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. We have developed an MSA program PRIME, whose crucial feature is the use of a group-to-group sequence alignment algorithm with a piecewise linear gap cost. We have shown that PRIME is one of the most accurate MSA programs currently available. However, PRIME is slower than other leading MSA programs. To improve computational performance, we newly incorporate anchoring and grouping heuristics into PRIME. An anchoring method is to locate well-conserved regions in a given MSA as anchor points to reduce the region of DP matrix to be examined, while a grouping method detects conserved subfamily alignments specified by phylogenetic tree in a given MSA to reduce the number of iterative refinement steps. The results of BAliBASE 3.0 and PREFAB 4 benchmark tests indicated that these heuristics contributed to reduction in the computational time of PRIME by more than 60% while the average alignment accuracy measures decreased by at most 2%. Additionally, we evaluated the effectiveness of iterative refinement algorithm based on maximal expected accuracy (MEA). Our experiments revealed that when many sequences are aligned, the MEA-based algorithm significantly improves alignment accuracy compared with the standard version of PRIME at the expense of a considerable increase in computation time. © 2008 Information Processing Society of Japan.

  • Search Engines' Trustworthiness (<Special Issue> Trust Assessment of Web Information)

    Yamana Hayato

    Journal of Japanese Society for Artificial Intelligence  社団法人人工知能学会

    Presentation date: 2008.11

    Event date:
    2008.11
     
     
  • Reliability Verification of Search Engines' Hit Count

    FUNAHASHI Takuya, UEDA Takanori, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes  一般社団法人情報処理学会

    Presentation date: 2008.09

    Event date:
    2008.09
     
     

    A number of studies have used search engines' hit counts. The goal of these studies is to build applications for translation support or natural language processing support. These studies assume that the hit count is reliable. However, none of them has verified the reliability of search engines' hit counts. If the hit count is unreliable, studies that use it also become unreliable. The purpose of this paper is to verify the reliability of search engines' hit counts. In this experiment, we used the search APIs provided by Google, Yahoo! Japan and Live Search. Furthermore, we r...

  • Web Community Extraction Method with Web Pages' Relevance Fowarding

    IIMURA Takuya, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes  一般社団法人情報処理学会

    Presentation date: 2008.09

    Event date:
    2008.09
     
     

    To find information in a large collection of Web pages, several methods for extracting Web communities have been proposed. Past studies succeeded in improving precision by applying a strict rule for deciding whether to include a given Web page in a Web community. However, recall may worsen because Web pages that should be included in the Web community are left out. In this paper, we propose a Web community extraction method that can improve recall without decreasing precision. The method adds Web pages that have many links from/to the Web pages in a sam...

  • Dynamic I/O Optimization with Access Pattern Mining at OS Level

    UEDA Takanori, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes  一般社団法人情報処理学会

    Presentation date: 2008.09

    Event date:
    2008.09
     
     

    Many-core CPUs improve parallel performance but also raise the problem of a storage performance bottleneck. I/O optimization should be performed at the operating system level because various applications run in parallel in a many-core CPU environment and I/O optimization requires cross-cutting knowledge about applications. We propose a new method that uses disk access patterns to improve the efficiency of the disk cache replacement algorithm. Our method is currently implemented on Linux 2.6.26 and extracts access patterns from the file access logs of applications. The experimental results show our method impro...

  • Message from the MAW 2008 co-chairs

    Takahiro Hara, Yanchun Zhang, William K. Cheung, Shengrui Wang, Hayato Yamana, Kin Fun Li, Laurence T. Yang

    Proceedings - International Conference on Advanced Information Networking and Applications, AINA 

    Presentation date: 2008.09

    Event date:
    2008.09
     
     
  • Analyzing geographical location and number of back-links of web servers all over the world

    平手 勇宇, 片瀬 弘晶, 山名 早人

    Journal of the DBSJ  日本データベース学会

    Presentation date: 2008.09

    Event date:
    2008.09
     
     
  • Measuring Editor's Trustworthiness in Wikipedia by Using Edit History without Depending on the Edit Frequency

    SAKURAI Hiroki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    Technical report of IEICE. PRMU  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    The free Internet encyclopedia Wikipedia contains 490,000 Japanese articles, and its volume of information is huge and useful. However, the trustworthiness of the articles' contents is uncertain because anyone can edit them easily. In other words, because we cannot know exactly who edits the contents, it is difficult to measure the trustworthiness of the articles' contents. To address this problem, a method that uses the edit history has been proposed for measuring the trustworthiness of articles. However, it is unsuitable for articles and editors with a low edit freque...

  • Gathering Over 10 Billion of Web Pages and its Applications

    YAMANA Hayato

    Technical report of IEICE. PRMU  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     
  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    Technical report of IEICE. PRMU  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Search engines are necessary for finding information on the Internet. One survey reports that users consider information from search engine results to be as believable as information from television. That is, by examining search engine rankings we can understand part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with encyclopedias on the Internet. We collected ranking positions of the articles in the Japanese version of Wikipedia by using sea...

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    Technical report of IEICE. PRMU  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Clustering Internet news articles makes it possible to detect various kinds of useful information, such as related articles and the latest topic words. This area has been widely researched since the TDT project. Conventional clustering methods have difficulty detecting a single article as a single cluster, even though many single articles exist. In this paper, we propose a method for clustering news articles that excludes single articles in advance by using proper noun information, topographic information and other characteristics that distinguish single from non-single articles. In evaluation, we use half a y...

  • Measuring Editor's Trustworthiness in Wikipedia by Using Edit History without Depending on the Edit Frequency

    SAKURAI Hiroki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    The free Internet encyclopedia Wikipedia contains 490,000 Japanese articles, and its volume of information is huge and useful. However, the trustworthiness of the articles' contents is uncertain because anyone can edit them easily. In other words, because we cannot know exactly who edits the contents, it is difficult to measure the trustworthiness of the articles' contents. To address this problem, a method that uses the edit history has been proposed for measuring the trustworthiness of articles. However, it is unsuitable for articles and editors with a low edit freque...

  • Gathering Over 10 Billion of Web Pages and its Applications

    YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    The number of Web pages distributed from Web servers was estimated at about 53.7 billion as of Oct. 2005. We gathered 14,456,201,906 Web pages from 5,548 Web servers between Jan. 2004 and July 2006. This was conducted as part of the e-Society project, one of the leading projects of MEXT (Ministry of Education, Culture, Sports, Science and Technology). Speeding up Web crawling conflicts with Web-site-friendly crawling; however, both are indispensable for gathering Web pages. In the project, we have studied and proposed a dynamic delay adjustment scheme for accessing Web servers to prevent...

  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Search engines are necessary for finding information on the Internet. One survey reports that users consider information from search engine results to be as believable as information from television. That is, by examining search engine rankings we can understand part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with encyclopedias on the Internet. We collected ranking positions of the articles in the Japanese version of Wikipedia by using sea...

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Clustering Internet news articles makes it possible to detect various kinds of useful information, such as related articles and the latest topic words. This area has been widely researched since the TDT project. Conventional clustering methods have difficulty detecting a single article as a single cluster, even though many single articles exist. In this paper, we propose a method for clustering news articles that excludes single articles in advance by using proper noun information, topographic information and other characteristics that distinguish single from non-single articles. In evaluation, we use half a y...

  • OS Level I/O Optimization in the Many-Core Era

    UEDA Takanori, HIRATE Yu, YAMANA Hayato

    IPSJ SIG Notes  一般社団法人情報処理学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Many-core CPUs with a dozen or more cores in one package will certainly appear in the near future. In many-core environments, storage will increasingly become a bottleneck: because many applications run in parallel, random access, which HDDs handle particularly poorly, will become more frequent. To address this problem, we aim to ease the storage bottleneck in software. Specifically, one of our goals is to implement on Linux a disk cache mechanism that exploits the file access patterns of applications and to evaluate it on a real system. In this talk, we give an overview of our research so far and of related work, present our latest results, and outline future research directions.

  • Geographical Location and Number of Back-Links of Web Servers All Over the World

    HIRATE Yu, KATASE Hiroaki, YAMANA Hayato

    IPSJ SIG Notes  一般社団法人情報処理学会

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    According to our investigation in Oct. 2005, the number of Web pages worldwide is estimated at 53.7 billion. We have investigated the TLD distribution and language distribution of Web pages based on a dataset of 10.7 billion Web pages. In this paper, as part of our series of Web statistics investigations, we analyzed three kinds of distribution based on the same 10.7 billion Web page dataset: the geographical location of Web servers, the number of virtual hosts per Web server, and the number of back-links (i.e., the in-degree) per Web server. Our results show (1) about 95.5% of...

  • Measuring Editor's Trustworthiness in Wikipedia by Using Edit History without Depending on the Edit Frequency

    SAKURAI Hiroki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. DE, データ工学  The Institute of Electronics, Information and Communication Engineers

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    The free Internet encyclopedia Wikipedia contains 490,000 Japanese articles, and its volume of information is huge and useful. However, the trustworthiness of the articles' contents is uncertain because anyone can edit them easily. In other words, because we cannot know exactly who edits the contents, it is difficult to measure the trustworthiness of the articles' contents. To address this problem, a method that uses the edit history has been proposed for measuring the trustworthiness of articles. However, that method is unsuitable for articles and editors with a low edit frequency. Therefore, in this paper, we propose a method for measuring editors' trustworthiness without depending on the edit frequency. The proposed method is based on the ratio of an editor's edits that remain in the latest version of the contents. Our evaluation shows that the proposed method rates highly reliable editors high and unreliable editors low, regardless of their edit frequency.

  • Geographical Location and Number of Back-Links of Web Servers All Over the World

    HIRATE Yu, KATASE Hiroaki, YAMANA Hayato

    IPSJ SIG Notes  Information Processing Society of Japan (IPSJ)

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    According to our investigation in Oct. 2005, the number of Web pages worldwide is estimated at 53.7 billion. We have investigated the TLD distribution and language distribution of Web pages based on a dataset of 10.7 billion Web pages. In this paper, as part of our series of Web statistics investigations, we analyzed three kinds of distribution based on the same 10.7 billion Web page dataset: the geographical location of Web servers, the number of virtual hosts per Web server, and the number of back-links (i.e., the in-degree) per Web server. Our results show that (1) about 95.5% of Web servers are located in North America, Europe, and Asia, (2) hosts located in Latin America and Eastern Europe have a large number of virtual hosts, and (3) the relationship between in-degree and the number of Web servers follows a power law.

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解  The Institute of Electronics, Information and Communication Engineers

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Clustering Internet news articles makes it possible to detect various kinds of useful information, such as related articles and the latest topic words. This area has been widely researched since the TDT project. Conventional clustering methods have difficulty detecting a single article as a single cluster, even though many single articles exist. In this paper, we propose a method for clustering news articles that excludes single articles in advance by using proper noun information, topographic information and other characteristics that distinguish single from non-single articles. In the evaluation, we use half a year of Japanese news articles. Compared with the Single-Link Method, which on its own has difficulty identifying single articles, our proposed method improves precision by 10.2% and reduces the computation time to approximately a third.

  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解  The Institute of Electronics, Information and Communication Engineers

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Search engines are necessary for finding information on the Internet. One survey reports that users consider information from search engine results to be as believable as information from television. That is, by examining search engine rankings we can understand part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with encyclopedias on the Internet. We collected ranking positions of the articles in the Japanese version of Wikipedia by using search engines' APIs. Among all articles in the Japanese version of Wikipedia, about 90% were ranked in the top 10 by Yahoo! JAPAN and Google, and about 70% were ranked in the top 10 by MSN. In the case of Yahoo! JAPAN and MSN, newly created Wikipedia articles tend to appear in the top 10 from the start and keep their high rankings. Our experiments confirmed the influence of Wikipedia on search engine rankings.

  • Temporal Clustering of Internet News Articles with Excluding Single Articles

    NAKAMURA Tomohiro, HIRANO Takayoshi, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. DE, データ工学 

    Presentation date: 2008.06

    Event date:
    2008.06
     
     
  • Influence of Wikipedia on Search Engine Rankings

    SONE Hiroaki, YOSHIDA Yasuaki, HIRATE Yu, YAMANA Hayato

    電子情報通信学会技術研究報告. DE, データ工学  The Institute of Electronics, Information and Communication Engineers

    Presentation date: 2008.06

    Event date:
    2008.06
     
     

    Search engines are necessary for finding information on the Internet. One survey reports that users consider information from search engine results to be as believable as information from television. That is, by examining search engine rankings we can understand part of the influence that a given site has on society. In this paper, we conducted experiments to examine the influence of Wikipedia, which has become synonymous with encyclopedias on the Internet. We collected ranking positions of the articles in the Japanese version of Wikipedia by using search engines' APIs. Among all articles in the Japanese version of Wikipedia, about 90% were ranked in the top 10 by Yahoo! JAPAN and Google, and about 70% were ranked in the top 10 by MSN. In the case of Yahoo! JAPAN and MSN, newly created Wikipedia articles tend to appear in the top 10 from the start and keep their high rankings. Our experiments confirmed the influence of Wikipedia on search engine rankings.

  • 5L-1 Analysis of TLD and Language Distribution of Web Pages Worldwide (Leading Project e-Society: Web Archive and Web Data Analysis Technology, General Session, Leading Project e-Society)

    平手 勇宇, 山名 早人

    全国大会講演論文集  一般社団法人情報処理学会

    Presentation date: 2008.03

    Event date:
    2008.03
     
     
  • 3ZK-10 A System for Finding Shortest Paths Between Web Pages

    Matsunaga Taku, Hirate Yu, Yamana Hayato

    全国大会講演論文集  Information Processing Society of Japan (IPSJ)

    Presentation date: 2008.03

    Event date:
    2008.03
     
     

    According to our investigation in Oct. 2005, the number of Web pages worldwide is estimated at 53.7 billion. We have investigated the TLD distribution and language distribution of Web pages based on a dataset of 10.7 billion Web pages. In this paper, as part of our series of Web statistics investigations, we analyzed three kinds of distribution based on the same 10.7 billion Web page dataset: the geographical location of Web servers, the number of virtual hosts per Web server, and the number of back-links (i.e., the in-degree) per Web server. Our results show (1) about 95.5% of...

  • A Similar Source Code Search System Using Program Code Abstraction

    黒木さやか, 上田高徳, 平手勇宇, 山名早人

    DEWS2008 

    Presentation date: 2008

    Event date:
    2008
     
     
  • Constructing a Reduced Web Link Structure to Speed Up Link Structure Analysis Algorithms

    片瀬弘晶, 松永拓, 上田高徳, 田代崇, 平手勇宇, 山名早人

    DEWS2008 

    Presentation date: 2008

    Event date:
    2008
     
     
  • Evaluation of EPCI, a Similar Text Search System Using Search Engines

    田代崇, 上田高徳, 平手勇宇, 山名早人

    DEWS2008 

    Presentation date: 2008

    Event date:
    2008
     
     
  • Collecting and Analyzing Lower-Ranked Results That Cannot Be Obtained from Commercial Search Engine Results

    舟橋卓也, 上田高徳, 平手勇宇, 山名早人

    DEWS2008 

    Presentation date: 2008

    Event date:
    2008
     
     
  • Analysis of the Language Distribution of Web Sites Worldwide and the Links and Geographical Locations of Web Sites Containing Japanese

    童 芳, 平手勇宇, 山名早人

    DEWS2008 

    Presentation date: 2008

    Event date:
    2008
     
     
  • Analysis of TLD and Language Distribution of Web Pages Worldwide

    平手勇宇, 山名早人

    第70回情処全大 

    Presentation date: 2008

    Event date:
    2008
     
     
  • A High-Precision Method for Extracting Expressions That Denote Attributes or Parts of Evaluation Targets in Reputation Information

    臼渕護, 平手勇宇, 山名早人

    言語処理学会第14回年次大会(NLP2008) 

    Presentation date: 2008

    Event date:
    2008
     
     
  • Implementing a Content Delivery System with the Distributed Meta P2P Storage "DiMPS"

    岡本雄太, 山名早人

    DEWS2008 

    Presentation date: 2008

    Event date:
    2008
     
     
  • EReM-DiCE: Exploiting Remote Memory for Disk Cache Extension

    Takanori UEDA, Yu HIRATE, Hayato YAMANA

    Proc. of 1st International Workshop on Storage and I/O Virtualization, Performance, Energy, Evaluation and Dependability (SPEED2008) 

    Presentation date: 2008

    Event date:
    2008
     
     
  • Sequential pattern mining with time intervals

    Yu Hirate, Hayato Yamana

    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS  SPRINGER-VERLAG BERLIN

    Presentation date: 2006

    Event date:
    2006
     
     

    Sequential pattern mining can be used to extract frequent sequences maintaining their transaction order. As conventional sequential pattern mining methods do not consider transaction occurrence time intervals, it is impossible to predict the time intervals of any two transactions extracted as frequent sequences. Thus, from extracted sequential patterns, although users are able to predict what events will occur, they are not able to predict when the events will occur. Here, we propose a new sequential pattern mining method that considers time intervals. Using Japanese earthquake data, we confirmed that our method is able to extract new types of frequent sequences that are not extracted by conventional sequential pattern mining methods.

  • Text Mining using PrefixSpan constrained by Item Interval and Item Attribute

    Issei Sato, Yu Hirate, Hayato Yamana

    ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops  Institute of Electrical and Electronics Engineers Inc.

    Presentation date: 2006

    Event date:
    2006
     
     

    Applying conventional sequential pattern mining methods to text data extracts many uninteresting patterns, which increases the time needed to interpret the extracted patterns. To solve this problem, we propose a new sequential pattern mining algorithm that adopts the following two constraints. One is to select sequences with regard to item intervals (the number of items between any two adjacent items in a sequence), and the other is to select sequences with regard to item attributes. Using Amazon customer reviews in the book category, we have confirmed that our method is able to extract patterns faster than the conventional method, and is better able to exclude uninteresting patterns while retaining the patterns of interest.

  • Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

    Shinsuke Yamada, Osamu Gotoh, Hayato Yamana

    BMC Bioinformatics 

    Presentation date: 2006

    Event date:
    2006
     
     

    Background: Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels. Results: We propose a novel group-to-group sequence alignment algorithm that uses a piecewise linear gap cost. We developed a program called PRIME, which employs our proposed algorithm to optimize the well-defined sum-of-pairs score. PRIME stands for Profile-based Randomized Iteration MEthod. We evaluated PRIME and some recent MSA programs using BAliBASE version 3.0 and PREFAB version 4.0 benchmarks. The results of benchmark tests showed that PRIME can construct accurate alignments comparable to the most accurate programs currently available, including L-INS-i of MAFFT, ProbCons, and T-Coffee. Conclusion: PRIME enables users to construct accurate alignments without having to employ pairwise alignment information. PRIME is available at http://prime.cbrc.jp/. © 2006 Yamada et al licensee BioMed Central Ltd.

  • Prediction of domain and disordered regions in proteins by fold recognition and secondary structure prediction

    Masatoshi Takizawa, Naoko Inoue, Kentaro Tomii, Hayato Yamana, Tamotsu Noguchi

    Critical Assessment of Techniques for Protein Structure Prediction Seventh Meeting 

    Presentation date: 2006

    Event date:
    2006
     
     
  • Automatic extraction of conserved region from alignment based on protein structure

    Shinsuke Yamada, Kouratou Yamada, Hayato Yamana, Tamotsu Noguchi

    EABS & BSJ 2006 

    Presentation date: 2006

    Event date:
    2006
     
     
  • Web Structure in 2005

    Yu Hirate, Hayato Yamana

    WAW2006, Banff 

    Presentation date: 2006

    Event date:
    2006
     
     
  • Contour Extraction using Texture and Non-Texture Distinction

    IGUCHI Shigeru, YAMANA Hayato

    Technical report of IEICE. PRMU  The Institute of Electronics, Information and Communication Engineers (IEICE)

    Presentation date: 2005.11

    Event date:
    2005.11
     
     

     View Summary

    This paper proposes a technique that divides an image into a texture region and a non-texture region and then applies a suitable contour extraction method to each, improving the accuracy of contour extraction. The most basic approach to contour extraction is edge detection with derivative filters; however, edges do not necessarily coincide with region boundaries. Texture analysis is therefore essential for an accurate result. Most conventional studies apply either edge detection or texture analysis to the whole image. In contrast, in this paper we first extract a texture region a...

  • Sample Collection System for Online Handwritten Mathematical Expressions written by Digital Pen and Preliminary Recognition Experiments

    KASUYA Yuji, YAMANA Hayato

    Technical report of IEICE. PRMU  The Institute of Electronics, Information and Communication Engineers (IEICE)

    Presentation date: 2005.10

    Event date:
    2005.10
     
     

     View Summary

    This paper proposes a sample collection system for online handwritten mathematical expressions based on digital pens. Prior online handwritten character recognition systems have used samples collected with pen tablets. However, pen tablet data are (1) difficult to collect because users aren't familiar with pen tablets, and (2) different from real handwriting because users have to look at their monitors while writing. In contrast, digital pens are easy to use even for first-time users; using them, we collected samples written by 74 examinees. By recognition experiments following facts...

  • C-013 A Consideration on Thread-Level Speculative Execution

    SAITO Fumiko, YAMANA Hayato

    Proceedings of the Forum on Information Technology (FIT), General Papers  FIT (IEICE / IPSJ) Steering Committee

    Presentation date: 2005.08

    Event date:
    2005.08
     
     
  • Sequential Pattern Mining based on Event Intervals

    HIRATE YU, KOMATSU SHUNSUKE, YAMANA HAYATO

    IPSJ SIG Notes  Information Processing Society of Japan (IPSJ)

    Presentation date: 2005.07

    Event date:
    2005.07
     
     

     View Summary

    In data mining research, sequential pattern mining extracts frequent sequences while preserving the order in which their events occur. Because conventional sequential pattern mining methods do not consider the time intervals between event occurrences, the time interval between any two events in a resulting sequence cannot be determined. In this paper, we propose a new sequential pattern mining method that considers event occurrence time intervals. In an evaluation on earthquake data, we confirmed that our method can extract a new kind of frequent sequence that could not be extracted by conventional sequential pattern mining methods.
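
    To make the idea of interval-aware patterns concrete, the following sketch counts ordered event pairs together with a discretized time interval between them. It is a simplification under stated assumptions, not the method proposed in the paper: the function name `interval_pair_supports`, the fixed-width bucketing, the restriction to pairs, and the toy earthquake-like data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

Event = Tuple[int, str]  # (timestamp, event label)


def interval_pair_supports(sequences: Iterable[List[Event]], bucket_width: int) -> Counter:
    """For every ordered pair of events (a, then b) count how many sequences
    contain it, keyed by (a, interval_bucket, b), where interval_bucket is the
    elapsed time between the two events divided into fixed-width buckets."""
    supports: Counter = Counter()
    for events in sequences:
        ordered = sorted(events, key=lambda e: e[0])
        seen = set()  # count each key at most once per sequence
        for (t1, a), (t2, b) in combinations(ordered, 2):
            seen.add((a, int((t2 - t1) // bucket_width), b))
        supports.update(seen)
    return supports


# Invented data: timestamps in days, labels are magnitude classes.
sequences = [
    [(0, "M5"), (3, "M4"), (30, "M4")],
    [(10, "M5"), (12, "M4")],
]
supports = interval_pair_supports(sequences, bucket_width=7)
# supports[("M5", 0, "M4")] is 2: in both sequences an M4 follows an M5 within a week.
# supports[("M5", 4, "M4")] is 1: only the first sequence has an M4 about a month later.
```

    An order-only method would report a single pattern "M5 then M4"; keeping the interval bucket separates the within-a-week case from the month-later case.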

  • Sequential Pattern Mining based on Event Intervals

    HIRATE Yu, KOMATSU Shunsuke, YAMANA Hayato

    IEICE technical report. Data engineering  社団法人電子情報通信学会

    Presentation date: 2005.07

    Event date:
    2005.07
     
     

     View Summary

    In data mining research, sequential pattern mining extracts frequent sequences while preserving the order in which their events occur. Because conventional sequential pattern mining methods do not consider the time intervals between event occurrences, the time interval between any two events in a resulting sequence cannot be determined. In this paper, we propose a new sequential pattern mining method that considers event occurrence time intervals. In an evaluation on earthquake data, we confirmed that our method can extract a new kind of frequent sequence which couldn't extrac...

  • From the Search Engine to the Analysis Engine

    Yamana Hayato

    Journal of Japanese Society for Artificial Intelligence  The Japanese Society for Artificial Intelligence (JSAI)

    Presentation date: 2005.07

    Event date:
    2005.07
     
     
  • TF2P-growth: Frequent Itemset Mining Algorithm without Any Thresholds

    HIRATE YU, IWAHASHI EIGO, YAMANA HAYATO

    IPSJ Transactions on Databases (TOD)  Information Processing Society of Japan (IPSJ)

    Presentation date: 2005.06

    Event date:
    2005.06
     
     

     View Summary

    Conventional frequent itemset mining algorithms require a user-specified minimum support and mine the frequent itemsets whose support values are higher than that minimum. Because it is difficult to predict how many frequent itemsets a given minimum support will yield, the Top-κ mining concept has been proposed. Top-κ mining requires no minimum support; instead it mines the κ most frequent itemsets, ordered by their support values. However, Top-κ mining still requires the threshold κ, so users must decide the value of κ before initiating mining. In this paper, we propose a new mining algorithm, called "TF^2P-growth," which does not require any thresholds. This algorithm mines itemsets in descending order of their support values and returns frequent itemsets to users sequentially with a short response time.
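
    The threshold-free interface described above can be sketched as a generator that yields itemsets in descending order of support, so the caller simply stops consuming when enough have been seen. This is a brute-force illustration, not TF^2P-growth itself: it enumerates every itemset before yielding the first one, so it shows only the output order and not the short response time; the function name and toy transactions are assumptions.

```python
from collections import Counter
from itertools import combinations, islice
from typing import Iterable, Iterator, Sequence, Tuple


def itemsets_by_descending_support(
    transactions: Iterable[Sequence[str]],
) -> Iterator[Tuple[Tuple[str, ...], int]]:
    """Yield (itemset, support) pairs from most to least frequent.
    Brute force: exponential in transaction length, for illustration only."""
    counts: Counter = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Descending support; ties broken by itemset size, then lexicographically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))
    for itemset, support in ranked:
        yield itemset, support


# Invented toy transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"]]
# No minimum support and no k fixed in advance: take as many as needed.
first_three = list(islice(itemsets_by_descending_support(transactions), 3))
# first_three == [(("a",), 4), (("b",), 2), (("c",), 2)]
```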

  • The Current Status of the Art of the 21st COE Programs in the Information Sciences Field (1) Productive ICT Academia Project

    UEDA Kazunori, OISHI Shinichi, KATTO Jiro, NAKAJIMA Tatsuo, MURAOKA Yoichi, YAMANA Hayato

    Joho Shori (IPSJ Magazine)  Information Processing Society of Japan (IPSJ)

    Presentation date: 2005.04

    Event date:
    2005.04
     
     
  • 10. Productive ICT Academia Project

    UEDA Kazunori, OISHI Shinichi, KATTO Jiro, NAKAJIMA Tatsuo, MURAOKA Yoichi, YAMANA Hayato

    Journal of Information Processing Society of Japan  Information Processing Society of Japan (IPSJ)

    Presentation date: 2005.04

    Event date:
    2005.04
     
     
  • Defense against Buffer Overflow by Segmenting Stack Frame

    HIRUTA Tomonori, YAMANA Hayato

    Information Processing Society of Japan (IPSJ)

    Presentation date: 2005.03

    Event date:
    2005.03
     
     

     View Summary

    In recent years, buffer overflow attacks have been increasing. A buffer overflow is caused by writing more data than fits in the data space prepared for variables. The most dangerous buffer overflow is the stack overflow. When a stack overflow occurs, the return address is overwritten and malicious code can be executed. This paper proposes a defense technique against buffer overflow that segments the stack frame. We implemented this technique on the SimpleScalar tool set ver. 3.0d and evaluated it with SPEC CINT95. The evaluation results show that this technique causes a 2.3% performance degradation.

  • Defense against Buffer Overflow by Segmenting Stack Frame

    Hiruta Tomonori, Yamana Hayato

    IPSJ SIG Technical Reports. SLDM [System LSI Design Methodology]  Information Processing Society of Japan (IPSJ)

    Presentation date: 2005.03

    Event date:
    2005.03
     
     

     View Summary

    In recent years, buffer overflow attacks have been increasing. A buffer overflow is caused by writing more data than fits in the data space prepared for variables. The most dangerous buffer overflow is the stack overflow. When a stack overflow occurs, the return address is overwritten and