Updated on 2025/03/13


Faculty of Science and Engineering, School of Creative Science and Engineering
Job title
Research Associate

Research Experience

  • 2024.04

    Waseda University

Professional Memberships





Research Areas

  • Intelligent informatics / Statistical science / Social systems engineering

Research Interests

  • Data analysis

  • Industrial engineering

  • Machine learning

  • Data science


Awards

  • Best Paper Award, The 19th ANQ Congress 2021


    Winners: Yuki Tsuboi, Yuta Sakai, Satoshi Suzuki, Masayuki Goto



Papers

  • Bayesian Optimization Method for Business Policy Decisions Considering Input-dependent Variance

    Taiga Yoshikawa, Yuta Sakai, Tianxiang Yang, Masayuki Goto

    Journal of Japan Industrial Management Association   75 ( 2 ) 49 - 59  2024  [Refereed]


    In recent years, online shopping sites have implemented various business measures to improve profitability, including coupon issuance and point redemption. To optimize these measures and maximize profits, managers must set coupon discount amounts and point redemption amounts. One approach to this problem is to use machine learning to estimate a function whose inputs are the business measures and whose output is the outcome variable. However, the relationship between the input and the output is not known in advance, and no training data are available for estimating the function before a measure is implemented. Moreover, since the purpose of a business is to make a profit, it is often difficult to conduct large-scale experiments on a real business solely to acquire such data. Bayesian optimization is attracting attention as a method for sequentially optimizing the input of an unknown function while adding training data step by step. It estimates the posterior distribution of the output from the training data and uses an evaluation index called the acquisition function to choose the next input to evaluate. However, ordinary Bayesian optimization may not produce appropriate results when applied to practical business problems because it does not consider the characteristics of business effects, such as variance that differs depending on the input. Therefore, this study proposes a new acquisition function for the Heteroskedastic Gaussian Process (hetGP), a function estimation method that allows the noise variance to differ across inputs, so that the unique circumstances of business measures can be considered. Using artificial data, this paper demonstrates that the proposed method can effectively optimize business policies even for functions with input-dependent error variance, a case that has not been handled by Bayesian optimization before. This method can enable regular optimization of business measures.



  • Multiple treatment effect estimation for business analytics using observational data

    Yuki Tsuboi, Yuta Sakai, Ryotaro Shimizu, Masayuki Goto

    Cogent Engineering   11 ( 1 )  2024  [Refereed]


    To correctly evaluate the effects of treatments, conducting randomized controlled trials (RCTs) is a reasonable approach. However, because it is generally difficult to implement RCTs for all treatments, methods to estimate the treatment effects using observational data have been actively studied and used in various decision-making processes. Observational data accumulated in business activities and elsewhere contains the results of various previously implemented treatments, and correctly estimating the effects of any given treatment without separating the impacts of other treatments is challenging. Against this background, this paper proposes a method to estimate the effects of multiple treatments of various types while considering various causal relationships. Specifically, the proposal is a variational inference method that estimates the effect of multiple treatments using four latent factors estimated from observations, making assumptions that are independent of the type and number of treatments. The proposed method makes it possible to appropriately estimate the effects of measures even in situations with complex causal relationships. In addition, in situations where measures with continuous parameters are being implemented, it is possible to estimate the effects of measures that have not been implemented in the past by treating the content of the measures as a continuous variable.



  • A Semi-Supervised Learning Model for Predicting User Attributes Based on Ladder Network

    Mizuki TAKEUCHI, Taichi IMAFUKU, Yuta SAKAI, Masayuki GOTO

    Total Quality Science   9 ( 2 ) 109 - 120  2023.12  [Refereed]  [Invited]


  • Purchasing Behavior Analysis Model that Considers the Relationship Between Topic Hierarchy and Item Categories

    Yuta Sakai, Yui Matsuoka, Masayuki Goto

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   13316 LNCS   344 - 358  2022  [Refereed]


    With the spread of e-commerce (EC) sites, analyzing the user preferences contained in accumulated purchase history data and using them in marketing measures has become an important task for companies. The topic model is a well-known method for analyzing user preferences from purchase history data, and models that assume a hierarchy of topics have been proposed as extensions. The previously proposed PAM (Pachinko Allocation Model) is a highly expressive model in which all upper and lower topics are connected by a network, so that the relationships between multiple topics can be analyzed. However, PAM is easily affected by the initial values of its learning parameters, and it is difficult to obtain stable topics, so the interpretation of the estimated topics becomes unstable. It is dangerous to make business decisions based on the interpretation of such unstable results. Therefore, in this research, instead of using a hierarchy of topics estimated from users' purchasing behavior, we use the hierarchically structured “product categories” that the EC site provides for managing items. On this basis, we propose a method that enables hierarchical topic analysis and is useful for studying measures. Finally, the proposed method is applied to the evaluation history data of an actual EC site to analyze user preferences and show its usefulness.



  • An Extension of Semi-supervised Boosting to Multi-valued Classification Problems

    Yuta Sakai, Kazuki Yasui, Kenta Mikawa, Masayuki Goto

    Total Quality Science   6 ( 2 ) 60 - 69  2021.02  [Refereed]  [Invited]




  • CATE Estimation in Flash Sales of Used Fashion Items with Imbalanced Outcome Variables

    木村 恵悟, 阪井 優太, 後藤 正幸, 佐々木 北都

    日本計算機統計学会シンポジウム論文集   38   94 - 97  2024.10

  • A Clustering Model Considering Customers' Browsing Frequency and Preference Diversity

    松岡 龍汰, 阪井 優太, 後藤 正幸, 山下 遥

    日本計算機統計学会シンポジウム論文集   38   128 - 131  2024.10

  • An Email Open Factor Analysis Model Considering the Effects of Continuous Email Delivery

    阪井 優太, 後藤 正幸, 清水 良太郎

    日本計算機統計学会シンポジウム論文集   38   54 - 57  2024.10

  • A Proposal of a Price Prediction Model for Used Smartphones Using Training Data Selection

    森川 卓哉, 阪井 優太, 後藤 正幸

    日本計算機統計学会シンポジウム論文集   38   78 - 81  2024.10

  • Hierarchical Multi-label Classification Model Adapted to Training Data with Different Layers of Correct Labels

    MIYAJIMA Kengo, NUNOME Yuto, SAKAI Yuta, GOTO Masayuki

    Proceedings of the Annual Conference of JSAI   JSAI2024   2B1GS202 - 2B1GS202  2024


    Multi-label classification of document data is the task of correctly assigning multiple class labels to each document. There is often a semantic hierarchical structure among the assigned labels, and considering this structure can improve the accuracy of label prediction. The Multi-label Box Model (MBM) has been proposed as a multi-label classification model that takes into account the semantic hierarchical structure among labels, and its effectiveness has been demonstrated when class labels of all layers are assigned to the training data. However, real-world document data posted on user-contributed websites often lack class labels for some layers of the hierarchy. If such data are used to train MBM, the accuracy of label prediction is reduced. In this study, we propose a framework that trains MBM after complementing the labels of the missing layers by introducing Bidirectional Encoder Representations from Transformers (BERT). The effectiveness of the proposed method is demonstrated through evaluation experiments that compare the accuracy of the conventional method and the proposed method on data in which the labels of some layers are missing.


  • Analysis of Customer Purchasing Behavior and Reviews Using an Embedding Space

    布目 悠人, 阪井 優太, 後藤 正幸

    日本計算機統計学会シンポジウム論文集   37   46 - 49  2023.11

  • Price Factor Analysis Model for Used Smartphone Products Based on Machine Learning

    森川卓哉, 竹内瑞生, 阪井優太, 後藤正幸

    人工知能学会全国大会論文集(Web)   37th   2L6GS304 - 2L6GS304  2023


    In recent years, more and more used smartphones have been bought and sold through online sales services, and it is desirable to utilize the large amount of transaction data accumulated through these transactions when listing and purchasing used smartphones. Used equipment buyers can use this data to analyze market price trends and the factors that determine those prices, which can lead to optimal purchase strategies and pricing. However, used smartphones are handled by various sales services. For such a problem, if a prediction model could be constructed to explain the selling price based on various characteristics, it would be possible to understand which factors contribute to the selling price. In this study, we analyze price determinants using a model that incorporates the gradient boosting method, which achieves high accuracy and, with the help of explainable AI, interpretability. In this analysis, it is undesirable to apply a single price factor analysis model that cannot consider differences among sales services, as previous analyses of the used equipment market have done. Therefore, we propose an analytical model that employs explainable AI techniques to capture the price determinants that differ among sales services. The proposed model is applied to actual sales data of used smartphones, and the results are discussed.


  • Adversarial Training with Data Selection which Improves the Accuracy and Reduces the Computational Complexity of Domain Adaptation

    木村恵悟, 中村太祐, 阪井優太, 後藤正幸

    人工知能学会全国大会論文集(Web)   37th   2A6GS203 - 2A6GS203  2023


    In general, machine learning does not ensure proper prediction if the statistical structure of the features differs between the training data and the prediction data, but it is sometimes necessary to apply a prediction model to a target whose structure may differ from that of its training data. In recent years, studies addressing this challenge have been actively conducted. One such method is Adversarial Discriminative Domain Adaptation (ADDA), a classification model that uses the adversarial training of Generative Adversarial Networks (GAN). ADDA has the drawback that it uses all the data in a mini-batch, including data that are harmful for training. In this study, we propose an improved version of ADDA that applies Top-k training, a technique from related GAN research. This enables ADDA to select useful data based on internal outputs, and the prediction accuracy is expected to increase. The experimental results show that the proposed method yields significant improvements in both accuracy and training time.


  • A Study on Review Analysis Based on Using Query-focused Summarization Model

    中村太祐, 阪井優太, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   46th  2023


  • Adversarial Counter Factual Regression based on Importance Weighted Sampling

    今福太一, 阪井優太, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   46th  2023


  • A Study on Analysis Model for Action History Data Based on Self- and Semi-supervised Learning

    竹内瑞生, 阪井優太, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   46th  2023


  • A Study on High-dimensional Data Visualization Methods Based on Angles

    阪井優太, 三川健太, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   46th  2023


  • A Study on Domain Adaptation by Adversarial Training with Selected Data

    木村 恵悟, 坪井 優樹, 阪井 優太, 後藤 正幸

    日本計算機統計学会シンポジウム論文集   36   44 - 47  2022.11

  • A Study on a Treatment Effect Estimation Method for Multiple Measures

    坪井 優樹, 阪井 優太, 清水 良太郎, 後藤 正幸

    第84回全国大会講演論文集   2022 ( 1 ) 689 - 690  2022.02



  • A Study on Active Learning Research Trends and Issues in Regression / Classification Problems

    阪井優太, 小林学, 後藤正幸

    情報処理学会全国大会講演論文集   84th ( 2 ) 23 - 24  2022


  • A Relationship Analysis Model between Products based on Store Sales Data by Machine Learning Approach with SHAP Values

    石倉滉大, 阪井優太, 吉開朋弘, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   45th  2022


  • Multiple Treatment Effect Estimation for E-commerce Marketing Using Observational Data

    坪井優樹, 阪井優太, 清水良太郎, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   45th  2022


  • A Proposal of Business Decision-Making Model by Bayesian Optimization Considering Input-Dependent Dispersion

    良川太河, 阪井優太, YANG Tianxiang, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   45th  2022


  • An Analytical Model of Users’ Switching Possibilities by Using Predicted Time to Credit Card Users

    大久保亮吾, 阪井優太, 立花徹也, 長坂典香, 後藤正幸

    情報理論とその応用シンポジウム予稿集(CD-ROM)   45th  2022


  • A Study on a Knowledge Graph-Based Recommender System Considering Differences in Latent Relationships

    中村太祐, 阪井優太, 後藤正幸

    日本経営工学会秋季大会予稿集(Web)   2022  2022


  • A Study of Model Predicting User Attributes Based on Semi-supervised Learning by Ladder Network

    竹内瑞生, 今福太一, 阪井優太, 後藤正幸

    人工知能学会全国大会論文集(Web)   36th   1A4GS201 - 1A4GS201  2022


    In recent years, marketing that uses attribute information associated with member accounts of online services has become widespread. However, the majority of users are non-members who use services without registering an account, and it is difficult to implement measures that use attribute information for these users. To deal with this situation, semi-supervised learning is an effective way to increase the number of users with attribute information: attributes are predicted from the history data of member users, who have attribute information, while the history data of non-member users are used as well. One such semi-supervised learning method is the Ladder Network, a neural-network-based model that adds and removes noise. This model provides highly accurate predictions for image data and is also considered useful for predicting user attributes from history data, where the feature vector is high-dimensional. However, this method cannot be applied when the label takes ordered values, such as the user's age category. In this study, we propose an extended model based on the Ladder Network that incorporates a mechanism for appropriately predicting the user's attribute information. We also conduct an evaluation experiment using actual browsing history data to show the effectiveness of the proposed method.


  • A Method for Estimating Conditional Average Treatment Effects with Causal Tree Considering Selection Bias

    坪井優樹, 阪井優太, 鈴木佐俊, 後藤正幸

    日本経営工学会春季大会予稿集(Web)   2021  2021


  • A Study on a Purchasing Behavior Analysis Model Considering Topic Hierarchy

    松岡佑以, 平野洋介, 阪井優太, 後藤正幸

    日本経営工学会春季大会予稿集(Web)   2021  2021


  • A Method for Assigning Keywords to Academic Papers Considering the Probability Distribution of Topic Membership

    阪井優太, 浅見怜, 後藤正幸

    日本経営工学会春季大会予稿集(Web)   2021  2021


  • A Study on a Method for Estimating Conditional Average Treatment Effects Taking Account of Selection Bias Based on Causal Tree

    坪井優樹, 阪井優太, 鈴木佐俊, 後藤正幸

    人工知能学会全国大会論文集(Web)   35th   3G2GS2h04 - 3G2GS2h04  2021


    It is important for companies to verify the effects of their marketing measures and to make the right decisions. To verify these effects correctly from observational data, causal inference is used. In recent causal inference research, after subjects are allocated to two groups and treated differently, the Conditional Average Treatment Effect (CATE) is often estimated to better understand causal mechanisms. CATE makes it possible to identify the group of users for whom a measure is effective. Causal Tree, which is highly interpretable and useful for analyzing the factors that affect measures, has been proposed as a CATE estimation method. However, owing to selection bias, this method cannot be used when subjects are allocated to the two groups intentionally. Therefore, we propose a method based on Causal Tree for estimating CATE that takes selection bias into account. We evaluate the precision of the CATE estimates through simulation experiments. In addition, we apply the proposed method to actual data and show its usefulness.


  • Zero-shot Generative Model Considering Attribute Uncertainty

    阪井優太, 三川健太, 後藤正幸

    電子情報通信学会技術研究報告(Web)   120 ( 300(PRMU2020 38-68) )  2020


  • An Analytical Model of the Customer Purchase Factor Based on Conditional VAE Learned of Web Browsing Data

    川上達也, 阪井優太, 山下遥, 後藤正幸

    人工知能学会全国大会論文集(Web)   34th   1I3GS204 - 1I3GS204  2020


    Due to the accumulation of browsing history data on EC sites, Web marketing techniques are of growing significance. Most previous studies analyzed differences in overall browsing pages between purchasing and non-purchasing sessions by constructing a discriminative model and proposed measures for all users. However, it is difficult to utilize this model when considering personalized measures for each user. In this situation, a generative model, which infers browsing-behavior conditioned by whether a user purchases or not, is effective. Conditional VAE infers the data from the label and features of input data. In this paper, we apply Conditional VAE to browsing history data and identify important pages by generating a pseudo session assuming that a user in a non-purchasing session purchases. We propose a method to analyze important browsing pages that contribute to each user's purchase. We clarify the effectiveness of our proposed method by using real browsing history data.


  • Construction of Demand Forecast Model of Tokyo Taxi Based on Probe Data Analysis

    飯塚玲夫, 小野雄生, 野中賢也, 阪井優太, 後藤正幸

    人工知能学会全国大会論文集(Web)   34th   2I5GS204 - 2I5GS204  2020


    We construct a machine-learning-based decision support model that helps taxi drivers dispatch their vehicles appropriately by utilizing probe data from taxis in Tokyo. Traditionally, taxi dispatch has relied on the driver's experience and intuition. The number of customers a driver obtains depends on knowledge gained through many years of experience. For example, sometimes many taxis wait for customers, and at other times many customers wait in a long queue for a taxi. In addition, the transport distance differs depending on the location. However, in a given situation, not all drivers know the places with high demand. Therefore, it is desirable to build a model that is easy for drivers to understand and that enables them to acquire customers efficiently regardless of their experience. To this end, we propose an analysis model that supports drivers' decisions based on taxi probe data.


  • An extension method of semi-supervised boosting to multiclass classification

    阪井優太, 安井一貴, 三川健太, 後藤正幸

    人工知能学会全国大会論文集(Web)   33rd   4A3J105 - 4A3J105  2019


    In recent years, semi-supervised learning, which classifies test data into the correct category using not only training (labeled) data but also a large amount of unlabeled data, has attracted attention. However, in the semi-supervised learning setting, classification accuracy deteriorates when the distribution of the labeled data is biased. SemiBoost is a semi-supervised learning method that addresses this problem. SemiBoost is a binary classification method, and it cannot be extended directly to multi-class classification. In this research, we propose a way to extend SemiBoost to multi-class classification using the concept of the Error Correcting Output Code (ECOC) method. To verify the effectiveness of the proposed method, we conduct simulation experiments using data from the UCI Machine Learning Repository.




Internal Special Research Projects

  • A Study on Verifying the Effects of EC Marketing Measures Based on Machine Learning Models

    2024   後藤正幸, 三川健太


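    In recent years, companies operating EC sites with many users have implemented a variety of business measures. Statistical causal inference, which verifies the effects of measures using past observational data collected when those measures were implemented, is being used increasingly. However, while effect estimation in typical causal inference focuses on the effect of a single measure, in practical applications it is preferable to take the effects of multiple measures into account and to estimate them for groups of users with similar characteristics or for individual users. This study therefore proposed a model that considers multiple measures and can estimate the effects of measures for each user. Effect estimation and analysis were performed using observational data from email campaigns on an actual EC site. These email campaigns have distinctive characteristics: because the cost of implementing them is small, emails are sent to users at high frequency, and a wide variety of emails are mixed together, originating from the different departments that wish to run promotions. For this setting, we proposed a model that analyzes how a large volume of emails affects users' email opening. The study found that, in addition to a user's most recent opening behavior, the number of emails sent one week or one month earlier affects the email open rate. This demonstrated the importance of controlling the volume of emails, a factor that does not depend on user behavior. These results were presented at the 38th Symposium of the Japanese Society of Computational Statistics. We also proposed an analysis method for data with diverse categories. This method reduces data from various categories to a low-dimensional space and visualizes them, allowing the similarity between categories to be evaluated visually. By treating attribute information, such as email campaigns or user attributes, as categories, past behavioral features can be visualized category by category, which is expected to help in understanding the diversity of preferences. These results were presented at ANQ Congress 2024.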