JCAHPC Seminar

Date: June 25, 2025 (Wed) 15:00-18:00
Venue: Room 315, 3rd Floor, Kashiwa Research Complex 2, Information Technology Center, The University of Tokyo / online

Registration

Schedule:

15:00-16:00	Designing High-Performance and Scalable Middleware for the Modern HPC and AI Era [Abstract]	DK Panda (Ohio State University)
16:00-16:30	Performance Evaluation of System-Allocated Memory on NVIDIA GH200 [Abstract]	Norihisa FUJITA (CCS, University of Tsukuba/JCAHPC)
16:30-16:40	Break
16:40-17:10	High Performance Center-Wide Heterogeneous Coupling Computing with WaitIO [Abstract]	Shinji SUMIMOTO (ITC, the University of Tokyo/JCAHPC)
17:10-17:40	Open discussion
17:40-18:00	Miyabi tour

Abstract

Designing High-Performance and Scalable Middleware for the Modern HPC and AI Era

This talk focuses on challenges and opportunities in designing middleware for HPC and AI (Deep/Machine Learning) workloads on modern high-end computing systems. The talk initially presents the challenges in co-designing HPC software by considering support for dense multi-core CPUs, high-performance interconnects, GPUs, and DPUs. Advanced designs and solutions (such as RDMA, in-network computing, GPUDirect RDMA, on-the-fly compression) to exploit novel features of these emerging technologies and their benefits in the context of MVAPICH libraries (http://mvapich.cse.ohio-state.edu) are presented. Next, the talk focuses on MPI-driven solutions for the AI (Deep/Machine Learning) domains to extract performance and scalability for popular Deep Learning frameworks, large out-of-core models, and GPUs. MPI-driven solutions to accelerate data science applications like Dask are also highlighted. The talk concludes with an overview of the activities in the NSF-AI Institute ICICLE (https://icicle.osu.edu/) to address challenges in designing future high-performance edge-to-HPC/cloud software for AI-driven data-intensive applications over the computing continuum.

Performance Evaluation of System-Allocated Memory on NVIDIA GH200

NVIDIA GH200 is a tightly coupled module that combines a Grace CPU and a Hopper GPU connected by NVLink-C2C, which links the two processors while maintaining cache coherence. GH200 provides a new form of unified memory, System-Allocated Memory (SAM), which features memory migration between the two processors over this proprietary bus. In this talk, I will present a preliminary performance evaluation of the GH200 memory system on the Miyabi-G system, including memory page migration and inter-node performance using InfiniBand and MPI.

High Performance Center-Wide Heterogeneous Coupling Computing with WaitIO

In this presentation, we will introduce h3-Open-SYS/WaitIO (abbreviated as WaitIO), a high-performance communication library that connects multiple MPI programs for heterogeneous coupled computing. WaitIO provides an inter-program communication environment between MPI programs and supports different MPI libraries corresponding to various interconnects and processor types. We will cover the history of WaitIO development and examples of its application in related projects, including the JSC Collaboration Project and the JHPC-quantum Project. For the JHPC-quantum Project, we will also discuss the current development status of WaitIO-Router, which realizes center-wide communication among multiple systems, including a quantum computer, on a wide-area network connected by SINET6.

Biography

Prof. DK Panda

DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He is serving as the Director of the ICICLE NSF-AI Institute (https://icicle.ai). He has published over 500 papers. The MVAPICH MPI libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3,450 organizations worldwide (in 92 countries). More than 1.9 million downloads of this software have taken place from the project's site. This software powers many clusters on the TOP500 list. High-performance and scalable solutions for Deep Learning frameworks and Machine Learning applications from his group are available from https://hidl.cse.ohio-state.edu. Similarly, scalable and high-performance solutions for Big Data and Data Science frameworks are available from https://hibd.cse.ohio-state.edu. Prof. Panda is a Fellow of ACM and IEEE. He is a recipient of the 2022 IEEE Charles Babbage Award and the 2024 IEEE TCPP Outstanding Service and Contributions Award. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

Assistant Prof. Norihisa FUJITA

Norihisa Fujita received his PhD degree from the University of Tsukuba in 2016. From April 2016 to October 2019, he was a postdoctoral researcher at the Center for Computational Science, University of Tsukuba, Japan. Since November 2019, he has been an assistant professor at the Center for Computational Science, University of Tsukuba, Japan. His research interests include high-performance computing, accelerators, system architecture, high-speed networks, reconfigurable architecture, and system software. He is a member of IEEE and IPSJ.

Prof. Shinji SUMIMOTO

Shinji Sumimoto became a project professor at the University of Tokyo's Supercomputing Research Division in 2022 after 36 years in industry. He developed PMv2 for SCore in the RWCP project (1997-2001), then moved to Fujitsu Laboratories, where he commercialized PC clusters such as the PC Riken Super Combined Cluster and the University of Tsukuba PACS-CS Cluster (2001-2006), and contributed to the K project to develop interconnects and system architectures. At Fujitsu, he led the development of Fujitsu MPI and FEFS (2007-2011), and led Arm HPC open source software activities as a senior architect for the successor to Fugaku (2012-2022). His research focuses on HPC system software and high-performance communications, covering processor architecture, communications hardware, the Linux kernel, and MPI communications libraries. He has been involved in MPI standardization (3.0-4.1) since 2009. He is participating in the heterogeneous computing "h3-Open-BDEC" project (2019-2023) and the JHPC-quantum project (2023-2028), and is developing h3-Open-SYS/WaitIO and its Router variant. He received his Bachelor of Engineering from Doshisha University (1986) and his PhD in Engineering from Keio University (2000, PhD thesis).