Hiroyuki Kasai - Research interests


Optimization theory, machine learning, and signal processing (since 2013)

  • non-linear optimization, convex & non-convex optimization, and geometric (manifold) optimization

  • gradient descent and its variants

  • stochastic gradient descent and its variants

  • online learning and robust learning

  • matrix & tensor decomposition & completion & low-rank approximation

  • sparse modeling, sparse coding & dictionary learning

  • applications (communication network & its design, image & video processing, data analysis, control system, recommendation system, etc)

  • big-data analysis

Multimedia coding & delivery protocol & system (since 1996)

  • video coding & decoding, video transcoding, and its coding control

  • multimedia multiplexing

  • signaling & session protocol

  • multimedia terminal and system

  • transmission control

Recent works

Here are some recent works.

Riemannian optimization: overview

Let f:mathcal{M} rightarrow {R} be a smooth real-valued function on a Riemannian manifold mathcal{M}. We consider

min_{w in mathcal{M}} f(w),

where w in mathcal{M} is a given model variable. This problem has many applications; for example, in principal component analysis (PCA) and the subspace tracking problem on the Grassmann manifold. The low-rank matrix & tensor completion problem is a promising application concerning the manifold of fixed-rank matrices/tensors. The linear regression problem is also defined on the manifold of fixed-rank matrices. The independent component analysis (ICA) is an example defined on the oblique manifold.

This problem can be solved by Riemannian deterministic algorithms (the Riemannian steepest descent, conjugate gradient descent, quasi-Newton method, Newton's method, end trust-region (TR) algorithm etc.) or Riemannian stochastic algorithms mentioned below.

Riemannian stochastic gradient algorithms and theoretical convergence analysis  

We specifically consider

min_{w in mathcal{M}}  left{ f(w) := frac{1}{n} sum_{i=1}^n f_i(w) right},

where n is the total number of the elements, which is generally extremely large. A popular choice is the Riemannian stochastic gradient descent algorithm (R-SGD). As R-SGD calculates only {rm gradf}f_i(w) for the i-th sample, the complexity per iteration is independent of the size of n. However, R-SGD is hindered by a slow convergence rate. To this end, we propose the Riemannian stochastic variance reduced gradient algorithm (R-SVRG), which reduces the variance of the noisy stochastic gradient. R-SQN-VR has also recently been proposed, which achieves practical improvements for ill-conditioned problems. We also propose the Riemannian stochastic recursive gradient algorithm (R-SRG), which does not require the inverse of retraction between two distant iterates on the manifold. Convergence analyses of them are performed on both retraction-convex and non-convex functions under computationally efficient retraction and vector transport operations.

  • HK, H.Sato and B.Mishra, “Riemannian stochastic recursive gradient algorithm,” ICML2018, poster, (Short version: ICML workshop GiMLi2018).

  • HK, H.Sato and B.Mishra, “Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis,” AISTATS2018, arXiv2017.

  • H.Sato, HK and B.Mishra, “Riemannian stochastic variance reduced gradient algorithm,” arXiv2017 (Largely extended version of below).

  • HK, H.Sato and B.Mishra, “Riemannian stochastic variance reduced gradient on Grassmann manifold,” arXiv2016 and NeurIPS workshop OPT2016 paper.

Riemannian inexact trust-region algorithm and theoretical analysis  

The Riemannian trust-region algorithm (RTR) comes with a global convergence property, and a superlinear local convergence rate to a second-order optimal point. However, it is computationally prohibitive in a large-scale setting to handle big Hessian matrices. The proposed algorithm approximates the gradient and the Hessian in addition to the solution of a TR sub-problem. Addressing large-scale finite-sum problems, we specifically propose sub-sampled algorithms (Sub-RTR) with a fixed bound on sub-sampled Hessian and gradient sizes, where the gradient and Hessian are computed by a random sampling technique.

  • HK and B.Mishra, “Inexact trust-region algorithms on Riemannian manifolds,” NeurIPS2018 (formerly NIPS), poser.

Riemannian manifold geometry and its application to optimization and machine learning problems

  • Tucker tensor manifold with Riemmanian metric tuning and large-scale tensor learning (including MC)

    • HK and B.Mishra, "Low-rank tensor completion: a Riemannian manifold preconditioning approach,” ICML2016, arXiv2015 (Short version: NIPS workshop OPT2015).

Riemannian manifold distributed optimization

  • Distributed multi-task learning

    • B.Mishra, HK, P.Jawanpuria and A.Saroop, “A Riemannian gossip approach to decentralized subspace learning on Grassmann manifold,” Machine Learning2019, arXiv2017 (Extended version of arXiv2016 and NIPS workshop OPT2016).

Riemannian manifold optimization utilizations

  • Geometric low-rank learning

    • M.Bhutani, P.Jawanpuria, HK and B.Mishra, “Low-rank geometric mean metric learning,” ICML workshop GiMLi2018, arXiv2018.

  • Dictionary learning and sparse coding on Rimeannian manifold, and its applications

    • HK and B.Mishra, “Riemannian joint framework of discriminative dictionary learning and dimensionality reduction on symmetric positive definite manifolds,” EUSIPCO2018.

    • HK and K.Yoshikawa, “Sparse representation based classification on symmetric positive definite manifolds,” IEEE ISSPIT2017.

    • HK, “スパースコーディングの研究動向,” 2014.

  • Application to wireless system

    • HK, “Fast optimization algorithm for hybrid precoding in millimeter wave MIMO systems,” IEEE GlobalSIP2018, 2018.

Low-rank matrix and tensor approximation

  • Low-rank tensor approximation, online learning and its applications (video and image analysis, network traffic analysis, environmental data analysis, time-series data analysis, etc. )

  • Nonnegative (low-rank) matrix factorization (NMF) and its stochastic algorithms, and its applications (clustering, image noise reduction, etc.)

    • HK, “Accelerated stochastic multiplicative update with gradient averaging for nonnegative matrix factorizations,” EUSIPCO2018.

    • HK, “Stochastic variance reduced multiplicative update for nonnegative matrix factorization,” IEEE ICASSP2018, arXiv2017.

Stochastic optimization

alt text 

Convergence behaviors of some stochastic gradient algorithms.
(clickable to enlarge the animation.)