SnowWhite: High Level Reasoning In Compilers

(DARPA HR0011-20-9-0018)

F. Franchetti (PI), J. C. Hoe (Co-PI), T.-M. Low (Co-PI), M. Franusich (Co-PI)

Overview

Since the inception of compiler research, the Holy Grail has been a system that offers high-level abstraction (programmers express their intent as concisely as in an algorithms textbook) and that automatically translates these programs or specifications into executables for an ever-evolving landscape of platforms, extracting close-to-optimal performance on all of them. The original FORTRAN compiler came close to this goal (a necessity for its adoption) on the machines of its day and for relatively simple programs. Unfortunately, ever-increasing hardware complexity has swept this achievement away, and today we are farther from the vision than ever. The SnowWhite effort addresses this problem, aiming to sketch a potential path to a long-term solution. SnowWhite shows that program understanding beyond classical compiler analysis is key, and that achieving it requires a novel AI approach.

The prototype SnowWhite system was developed in the PAPPA program. The system is available on GitHub under a BSD-style permissive license and is documented at https://github.com/spiral-software/python-package-snowwhite. At its core, SnowWhite adds a new AI approach to compilers: it introduces high-level reasoning to orchestrate the complex components and enables the system to “understand” the computation much as a human expert would. Furthermore, SnowWhite uses a number of technologies that have proven essential: 1) domain-specific languages (DSLs), 2) the idea of telescoping languages (libraries as language components with known semantics), 3) just-in-time compilation (JIT), 4) automatic performance tuning (autotuning), and 5) program synthesis or program generation. The result is a feedback system that finds a close-to-optimal mapping of an entire application, built from components drawn from multiple domains, onto a range of challenging target platforms.
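The autotuning ingredient can be illustrated with a minimal empirical search: time several candidate implementations of the same operation on representative inputs and keep the fastest. This is a generic sketch under our own assumptions, not SnowWhite's actual search procedure; `autotune`, `conv_direct`, and `conv_fft` are hypothetical names, and the two candidates are simply two textbook ways to compute a circular convolution.

```python
import time
from typing import Callable, Dict, Sequence

import numpy as np


def autotune(candidates: Dict[str, Callable], args: Sequence, repeats: int = 5) -> str:
    """Return the name of the candidate with the lowest best-of-`repeats` runtime."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get)


def conv_direct(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular convolution from its definition: O(n^2) work."""
    n = len(a)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n  # idx[i, j] = (i - j) mod n
    return b[idx] @ a


def conv_fft(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular convolution via the convolution theorem: O(n log n) work."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))


# Both candidates compute the same function; the tuner only decides which is faster here.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(256), rng.standard_normal(256)
choice = autotune({"direct": conv_direct, "fft": conv_fft}, (a, b))
print(choice)  # machine-dependent
```

Which candidate wins depends on the machine and the problem size, which is why the choice is made by measurement rather than fixed in advance.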

System Description

The prototype system consists of the following components.


Input Language

SnowWhite’s input programs are single-threaded, single-address-space Python/NumPy programs that follow an object-oriented paradigm and are written against a SnowWhite class library. They concisely and cleanly implement the user’s algorithm as a high-level program that acts as a specification. In fact, such a program is an executable specification whenever the mathematical semantics of the NumPy objects and functions it uses are known. SnowWhite defines the mathematical semantics of a subset of NumPy sufficient to cover the target application domains (real-time processing and physical simulation) in machine-readable form, as semantics definition modules.
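As an illustration of the intended style, the following plain NumPy program solves the periodic Poisson equation with the spectral method. It is a hypothetical example: the class `PeriodicPoissonSolver` is not part of the SnowWhite class library, whose actual API may differ. What matters is that the code reads like the textbook algorithm (forward FFT, scale by the inverse of the Laplacian's Fourier symbol, inverse FFT), so its meaning follows directly from the known semantics of `np.fft.fftn` and `np.fft.ifftn`.

```python
import numpy as np


class PeriodicPoissonSolver:
    """Hypothetical example (not the SnowWhite class library): solve
    laplacian(u) = f on the periodic square [0, 2*pi)^2 with an n x n grid."""

    def __init__(self, n: int):
        self.n = n
        k = np.fft.fftfreq(n, d=1.0 / n)        # integer wavenumbers in FFT order
        kx, ky = np.meshgrid(k, k, indexing="ij")
        k2 = kx**2 + ky**2
        k2[0, 0] = 1.0                          # placeholder to avoid dividing by zero
        self.inv_symbol = -1.0 / k2             # inverse of the Laplacian's symbol -|k|^2
        self.inv_symbol[0, 0] = 0.0             # choose the zero-mean solution

    def solve(self, f: np.ndarray) -> np.ndarray:
        """Transform, scale by the inverse symbol, transform back."""
        f_hat = np.fft.fftn(f)
        return np.real(np.fft.ifftn(f_hat * self.inv_symbol))


# Usage: u = sin(x) cos(2y) has laplacian -(1 + 4) u, so f = -5u should give u back.
n = 32
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
u_exact = np.sin(X) * np.cos(2 * Y)
f = -5.0 * u_exact
u = PeriodicPoissonSolver(n).solve(f)
print(np.max(np.abs(u - u_exact)))  # round-off level
```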

Frontend

SnowWhite’s frontend is a Python-to-SPIRAL parser and analysis stage that converts Python program fragments built on the SnowWhite object library and supported NumPy components into SPIRAL’s high-level input IR. The result is a SPIRAL script and expression that represents the Python program fragment and serves as input to the formal reasoning system. SnowWhite then implements this expression as a native Python library that leverages the target’s high-performance features, such as GPUs and multiple nodes. The code that was originally implemented as a sequence of NumPy calls is replaced by a call to this generated native library. The SnowWhite analysis stage includes a sophisticated data flow analysis that enables inter-procedural semantic analysis as well as cross-call and cross-library optimization via an algorithm detection and promotion framework.
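The parsing half of such a frontend can be sketched with Python's standard `ast` module: walk the syntax tree of a NumPy expression and translate each supported call into a node of a small IR, rejecting anything without defined semantics. This is a toy under stated assumptions: it emits nested tuples rather than a SPIRAL script, recognizes only `np.fft.fftn`, `np.fft.ifftn`, and elementwise `*`, and the function names and operator labels (`FFT`, `IFFT`, `PointwiseMul`) are hypothetical.

```python
import ast

# Hypothetical table mapping NumPy callees to IR operators with known semantics.
_SUPPORTED = {
    "np.fft.fftn": "FFT",
    "np.fft.ifftn": "IFFT",
}


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'np.fft.fftn' from a chain of ast.Attribute nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return dotted_name(node.value) + "." + node.attr
    raise ValueError(f"unsupported callee: {ast.dump(node)}")


def to_ir(node: ast.AST):
    """Translate a supported Python expression node into a nested-tuple IR."""
    if isinstance(node, ast.Call):
        name = dotted_name(node.func)
        if name not in _SUPPORTED:
            raise ValueError(f"no semantics defined for {name}")
        return (_SUPPORTED[name],) + tuple(to_ir(arg) for arg in node.args)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
        return ("PointwiseMul", to_ir(node.left), to_ir(node.right))
    if isinstance(node, ast.Name):
        return ("Var", node.id)
    raise ValueError(f"unsupported construct: {ast.dump(node)}")


def parse_fragment(src: str):
    """Parse a single-expression program fragment into the toy IR."""
    tree = ast.parse(src, mode="eval")  # mode="eval" yields an ast.Expression
    return to_ir(tree.body)


print(parse_fragment("np.fft.ifftn(np.fft.fftn(x) * np.fft.fftn(h))"))
# ('IFFT', ('PointwiseMul', ('FFT', ('Var', 'x')), ('FFT', ('Var', 'h'))))
```

Failing loudly on unknown calls mirrors the idea that only code whose semantics are defined can be handed to the reasoning system.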


High-Level Reasoning System

SnowWhite introduces a new rule system for the core SPIRAL system. This component bootstraps the SPIRAL base for the target application domains from the semantics definition of the mathematical library (NumPy): it adds the NumPy data structure and operation abstractions the target applications need, together with a general framework for adding further functionality as needed. In addition, a promotion rule system detects well-known patterns that require multiple NumPy library calls but are logically a single mathematical operation. The prime example is promoting the sequence FFT, pointwise multiplication, inverse FFT into a single convolution operation, which can then be re-expanded in whatever way suits the target platform. This in turn allows SnowWhite, via the SPIRAL rule system, to reason about NumPy-based programs across the full range of SPIRAL-supported hardware platforms. Defining the semantics of a set of NumPy components, and building the Python object-oriented library that streamlines user programs, were key efforts in our PAPPA project.
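The FFT-to-convolution promotion can be illustrated as a rewrite rule over a toy tuple IR of the kind a frontend might produce. The sketch below is hypothetical and does not use SPIRAL's rule syntax: `promote` rewrites IFFT(PointwiseMul(FFT(a), FFT(b))) into CircConv(a, b), and `evaluate` then computes the promoted node either from the definition of circular convolution or by re-expanding it through FFTs. By the convolution theorem, all evaluations agree for NumPy's unnormalized `fft` and 1/n-normalized `ifft`.

```python
import numpy as np


def promote(expr):
    """Bottom-up rewrite of IFFT(PointwiseMul(FFT(a), FFT(b))) into CircConv(a, b)."""
    if expr[0] == "Var":
        return expr
    expr = (expr[0],) + tuple(promote(e) for e in expr[1:])  # rewrite children first
    if (expr[0] == "IFFT"
            and expr[1][0] == "PointwiseMul"
            and expr[1][1][0] == "FFT"
            and expr[1][2][0] == "FFT"):
        return ("CircConv", expr[1][1][1], expr[1][2][1])
    return expr


def evaluate(expr, env, conv_impl="fft"):
    """Interpret the toy IR; `conv_impl` picks how CircConv is re-expanded."""
    op = expr[0]
    if op == "Var":
        return env[expr[1]]
    args = [evaluate(e, env, conv_impl) for e in expr[1:]]
    if op == "FFT":
        return np.fft.fft(args[0])
    if op == "IFFT":
        return np.fft.ifft(args[0])
    if op == "PointwiseMul":
        return args[0] * args[1]
    if op == "CircConv":
        a, b = args
        if conv_impl == "direct":  # O(n^2): sum_j a[j] * b[(i - j) mod n]
            n = len(a)
            idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
            return b[idx] @ a
        return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))  # O(n log n) re-expansion
    raise ValueError(f"unknown operator {op}")


program = ("IFFT", ("PointwiseMul", ("FFT", ("Var", "x")), ("FFT", ("Var", "h"))))
promoted = promote(program)
print(promoted)  # ('CircConv', ('Var', 'x'), ('Var', 'h'))

rng = np.random.default_rng(1)
env = {"x": rng.standard_normal(16), "h": rng.standard_normal(16)}
reference = evaluate(program, env)
```

Once the pattern is named as a convolution, the choice between the direct and the FFT-based expansion becomes a separate, platform-dependent decision rather than something fixed by how the user wrote the NumPy code.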

Backend

The SnowWhite prototype system generates high-performance native code for the target platform. The main targets were CPUs and GPUs in massively parallel (distributed-memory/MPI) systems. SnowWhite leverages and extends the SPIRAL multi-target backend to support Intel x86 (with SSE/AVX/AVX512) and IBM’s POWER9 (with VSX) CPUs as well as Nvidia (CUDA) and AMD (HIP/ROCm) GPUs. This requires managing multi-node, multi-address-space execution while presenting a single-address-space, single-threaded abstraction to the Python user, without paying excessive overhead. While Python/NumPy was the input language in PAPPA, it is one instance of the larger SnowWhite/SPIRAL infrastructure, which supports a range of input languages. In particular, the DOE ExaScale effort FFTX, the DPRIVE effort NTTX, and the GBTLX effort all use a C++ frontend similar to SnowWhite’s Python/NumPy frontend.
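What presenting a single-address-space abstraction over multiple nodes involves can be sketched with the standard slab decomposition of a 2-D FFT: each node transforms its local rows, an all-to-all exchange transposes the data, and each node transforms again. In the hypothetical sketch below, p nodes are only simulated as a list of NumPy slabs in one process, with no MPI, and the caller passes in and receives back a single global array. It illustrates the technique and is not SnowWhite's generated code.

```python
from __future__ import annotations

import numpy as np


def scatter_rows(a: np.ndarray, p: int) -> list[np.ndarray]:
    """Split a global n x n array into p row slabs, one per simulated node."""
    n = a.shape[0]
    assert a.shape == (n, n) and n % p == 0, "toy sketch: square array, n divisible by p"
    return np.split(a, p, axis=0)


def all_to_all_transpose(slabs: list[np.ndarray]) -> list[np.ndarray]:
    """Simulated all-to-all: afterwards node i holds row slab i of the global transpose."""
    p = len(slabs)
    blocks = [np.split(s, p, axis=1) for s in slabs]  # blocks[j][i]: node j's i-th column block
    return [np.concatenate([blocks[j][i].T for j in range(p)], axis=1) for i in range(p)]


def distributed_fft2(a: np.ndarray, p: int) -> np.ndarray:
    """2-D FFT on p simulated nodes; the caller sees one global array in and out."""
    slabs = scatter_rows(a, p)
    slabs = [np.fft.fft(s, axis=1) for s in slabs]  # local 1-D FFTs along axis 1
    slabs = all_to_all_transpose(slabs)             # redistribute: columns become local rows
    slabs = [np.fft.fft(s, axis=1) for s in slabs]  # local 1-D FFTs along original axis 0
    slabs = all_to_all_transpose(slabs)             # restore the original layout
    return np.concatenate(slabs, axis=0)            # gather into one global array


rng = np.random.default_rng(2)
a = rng.standard_normal((8, 8))
print(np.allclose(distributed_fft2(a, p=4), np.fft.fft2(a)))  # True
```

In a real distributed run the all-to-all step becomes network communication (for example an MPI all-to-all), which is where most of the overhead of hiding the multi-node structure comes from.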


Example Library

As part of PAPPA, and in collaboration with the FFTX DOE ExaScale Project, we developed a set of examples implemented as small Python programs. They are based on SnowWhite’s object library and demonstrate how NumPy should be used to convey the semantics of an algorithm cleanly to the system. What matters is the simplicity and clarity of the algorithm and its implementation. The goal is not a performance-optimized, hard-to-understand implementation, but the shortest and most concise implementation that captures the full complexity of the problem and algorithm without introducing performance-optimization artifacts that pollute the code. The SnowWhite package provides a range of examples.

Documentation

The full SnowWhite system and all documentation are made available via spiral.net. This includes the core system, examples, documentation, and scientific papers.