Affiliated Projects

NumFOCUS Affiliated Projects benefit from their association with NumFOCUS through access to community, certain funding opportunities, and promotion of the project through our network. NumFOCUS Affiliated Projects are scientifically oriented, open, and kind. (What does that mean?) Affiliated Projects are not fiscally sponsored by NumFOCUS.

Affiliated Projects enjoy a number of benefits. If your project is interested in becoming a NumFOCUS Affiliated Project, click here to learn more.

How to Apply

NumFOCUS is currently in the process of updating its financial and administrative systems. New project applications will not be considered until this work is complete. This page will be updated when a timeline for reopening applications is available.

In an effort to include more community input and involvement in our work, NumFOCUS has formed a committee around the selection process for our Affiliated Projects. This committee will be responsible for evaluating applications from open source projects for Affiliated Project status with NumFOCUS and working with applicant projects throughout the review process.

Affiliated Project Selection Committee Members

Florian Roscheck

Vice President

Paul Anzel

Secretary

Filipe Fernandes

Rocco Meli

Vyas Ramasubramani

Andre Leon Sampaio Gradvohl

Richard Gowers

Christopher Siefert

Mert Bozkir

Florian Rathgeber

Sathvik Bhagavan

Venkateshprasad Bhat

Steven Kell

Stefan Krastanov

Saransh Chopra

Aesara

Aesara is a Python library that allows one to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

The project includes an extensible graph framework suitable for rapid development of custom operators and symbolic optimizations. Additionally, it implements an extensible graph transpilation framework that currently provides compilation via C, JAX, and Numba.

AiiDA

AiiDA is a workflow manager for computational science with a strong focus on provenance, performance and extensibility.

When executing a workflow, AiiDA records the provenance (calculations performed, codes used, and data generated) in a directed acyclic graph tailored to provide full reproducibility of any given result.
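The idea of recording provenance in a directed acyclic graph can be sketched in a few lines of plain Python. This is only an illustration of the concept, not AiiDA's API: the `ProvenanceGraph` class, its methods, and the node names below are all hypothetical.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical sketch: a DAG linking data items to the calculations that made them."""

    def __init__(self):
        # Maps each node to the set of nodes it directly depends on.
        self.parents = defaultdict(set)

    def record(self, calculation, inputs, outputs):
        """Record that `calculation` consumed `inputs` and produced `outputs`."""
        for item in inputs:
            self.parents[calculation].add(item)
        for item in outputs:
            self.parents[item].add(calculation)

    def ancestors(self, node):
        """Return everything that contributed to `node`, directly or indirectly."""
        seen = set()
        stack = [node]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = ProvenanceGraph()
graph.record("relax_structure", inputs=["structure_raw", "code_v1"], outputs=["structure_relaxed"])
graph.record("compute_bands", inputs=["structure_relaxed", "code_v1"], outputs=["band_structure"])

# Every calculation, code, and data item that the final result depends on.
lineage = graph.ancestors("band_structure")
```

Walking the graph backwards from a result recovers the full chain of calculations, codes, and data that produced it, which is the kind of information needed to reproduce the result.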

Asteroid

Asteroid is a PyTorch-based audio source separation toolkit that enables fast experimentation on common datasets. Its source code supports a large range of datasets and architectures, and it comes with a set of recipes to reproduce some important papers.

Awkward Array

Awkward Array is a Python library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms.

Arrays are dynamically typed, but operations on them are compiled and fast. Their behavior coincides with NumPy when array dimensions are regular and generalizes when they’re not.
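One common way to store nested, variable-length lists compactly is a flat content buffer plus an array of offsets. The plain-Python sketch below illustrates that layout only; it is not Awkward Array's API, and the `RaggedArray` class and its methods are hypothetical names chosen for this example.

```python
from itertools import accumulate


class RaggedArray:
    """Hypothetical sketch: variable-length lists stored as flat content plus offsets."""

    def __init__(self, lists):
        lists = [list(sub) for sub in lists]
        # All values, concatenated into one flat buffer.
        self.content = [x for sub in lists for x in sub]
        # offsets[i]:offsets[i + 1] delimits the i-th list inside `content`.
        self.offsets = [0, *accumulate(len(sub) for sub in lists)]

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("list index out of range")
        return self.content[self.offsets[i]:self.offsets[i + 1]]

    def counts(self):
        """Length of each inner list, computed from the offsets alone."""
        return [stop - start for start, stop in zip(self.offsets, self.offsets[1:])]


ragged = RaggedArray([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
```

Because the values live in one contiguous buffer, per-list operations such as `counts()` only need the offsets, and a regular array is simply the special case where all offsets are evenly spaced.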

bqplot

bqplot is a 2-D plotting library for Jupyter. Built upon the Jupyter widgets framework, it implements the grammar of graphics constructs.

Beyond plotting, bqplot is focused on using plots to take user inputs in a rich fashion, and using it in combination with other Jupyter interactive widgets to build applications.

CB-Geo MPM

CB-Geo MPM is an HPC-enabled Material Point Method solver for large-deformation modeling. It supports isoparametric elements to model complex geometries and creates photo-realistic renderings.

Catalyst

Catalyst is a PyTorch framework for Deep Learning Research and Development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop.

Clawpack

Clawpack (“Conservation Laws Package”) is a collection of finite volume methods for linear and nonlinear hyperbolic systems of conservation laws.

Colour

Colour is an open-source Python package providing a comprehensive number of algorithms and datasets for colour science. It is freely available under the New BSD License terms.

Crystal

Crystal is a statically-typed programming language that is super performant, yet friendly to humans.

Crystal boasts an expressive and intuitive syntax, drawing inspiration from Ruby while incorporating strong static typing and C-like performance. This combination allows developers to write clean and readable code while keeping the benefits of compile-time type checking and improved performance.

CVXPY

CVXPY is an open source Python-embedded modeling language for convex optimization problems.

It lets you express your problem in a natural way that follows the math, rather than in the restrictive standard form required by solvers.

Cython

Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.

Dash

Dash is a Python framework for building analytical web applications. No JavaScript required. Built on top of Plotly.js, React, and Flask, Dash ties modern UI elements like dropdowns, sliders, and graphs to your analytical Python code.

Data Retriever

The Data Retriever is a package manager for data. It downloads, cleans, and stores publicly available data, so that analysts spend less time cleaning and managing data, and more time analyzing it.

Devito

Devito is a Python package to implement optimized stencil computation (e.g., finite differences, image processing, machine learning) from high-level symbolic problem definitions.

Devito builds on SymPy and employs automated code generation and just-in-time compilation to execute optimized computational kernels on several computer platforms, including CPUs, GPUs, and clusters thereof.

DyND

DyND is a C++ library for dynamic, multidimensional arrays.

It is inspired by NumPy, the Python array programming library at the core of the scientific Python stack, but tries to address a number of obstacles encountered by some of its users. Examples of this are support for variable-sized strings, ragged array types, and convenient usage from C++. The library is in a preview development state, and can be thought of as a sandbox where features are being tried and tweaked to gain experience with them.

Effective Quadratures

Effective Quadratures is an open-source library for uncertainty quantification, machine learning, optimisation, numerical integration and dimension reduction – all using orthogonal polynomials.

It is particularly useful for models / problems where output quantities of interest are smooth and continuous; to this extent it has found widespread applications in computational engineering models (finite elements, computational fluid dynamics, etc). It is built on the latest research within these areas and has both deterministic and randomized algorithms.

optimagic

optimagic (formerly estimagic) is a Python package for nonlinear optimization. It is particularly well suited to solving difficult problems with or without constraints. Additional core functionality includes statistical inference on estimated parameters.

estimagic provides a unified interface to optimization algorithms from scipy, NLopt, Pygmo, TAO, Cyipopt, and other Python packages. Adding new optimizers is easy and the long-run goal is to support almost all optimizers with Python bindings. estimagic’s interface is familiar to anyone who has used scipy’s minimize function. At the same time, it is more powerful. Compared to using the underlying libraries directly, estimagic provides a lot of additional functionality. It adds statistical inference, sensitivity analyses, logging, error handling, multistart, and diagnostic tools, such as a realtime dashboard. A wide variety of data types are supported for the parameters being optimized, including numpy arrays, pandas objects, and nested dictionaries.

FluxML

Flux is a 100% pure-Julia stack and provides lightweight abstractions on top of Julia’s native GPU and AD support. It makes easy things easy while remaining fully hackable and fast.

Flux is written to be very generic, so that users can easily add in custom code to perform specific tasks, and interplay with machine learning models easily, be that custom types, custom numbers, arrays, recursion, control flow etc. We aim to support the full gamut of tools that the Julia language has to offer.

FreeMoCap Project

The Free Motion Capture Project (FreeMoCap) aims to provide research-grade markerless motion capture software to everyone for free.

We’re building a user-friendly framework that connects an array of “bleeding edge” open-source tools from the computer vision and machine learning communities to accurately record full-body 3D movement of humans, animals, robots, and other objects. We want to make the newly emerging mind-boggling, future-shaping technologies that drive FreeMoCap’s core functionality accessible to communities of people who stand to benefit from them. We follow a “Universal Design” development philosophy, with the goal of creating a system that serves the needs of a professional research scientist while remaining intuitive to a 13-year-old with no technical training and no outside assistance. A high-quality, minimal-cost motion capture system would be a transformative tool for a wide range of communities – including 3D animators, game designers, athletes, coaches, performers, scientists, engineers, clinicians, and doctors. We hope to create a system that brings new technological capacity to these groups while also building bridges between them.

Gensim

Gensim is a Python library providing scalable statistical semantics, analysis of plain-text documents for semantic structure, and retrieval of semantically similar documents.

GeomScale

GeomScale is an open-source project that lies at the intersection of data science, optimization, geometric and statistical computing.

It combines cutting-edge research efforts and results with state-of-the-art open source software tools for scientific computing, with the ambition to solve both research-oriented and real-life problems.

Geomstats

Geomstats is an open-source Python package for computations and statistics on manifolds. The package is organized into two main modules: geometry and learning.

The module geometry implements concepts in differential geometry, and the module learning implements statistics and learning algorithms for data on manifolds.

GNU Radio

GNU Radio is a free & open-source software development toolkit that provides signal processing blocks to implement software radios.

It can be used with readily-available low-cost external RF hardware to create software-defined radios, or without hardware in a simulation-like environment. It is widely used in research, industry, academia, government, and hobbyist environments to support both wireless communications research and real-world radio systems.

Gonum

Gonum is a set of numeric and scientific libraries written for the Go programming language. Our primary aim was to build functionality similar to that of numpy + scipy and today we are close to achieving this goal.

Gridap

Gridap provides a rich set of tools for the grid-based approximation of partial differential equations (PDEs) written 100% in the Julia programming language.

HPX

HPX is a general-purpose C++ runtime system for parallel and distributed applications of any scale.

It offers comprehensive APIs for concurrency and parallelism, adhering to the standards defined by the C++ Standard, while also extending support for distributed computing. In addition, HPX actively contributes to standardization efforts by implementing functionalities proposed as part of the ongoing C++ standardization process.

igraph

igraph is a collection of network analysis tools with the emphasis on efficiency, portability and ease of use.

igraph is open source and free. igraph can be programmed in R, Python, Mathematica and C/C++.

ipyvizzu

ipyvizzu is a data visualization tool that empowers data scientists and analysts to employ animation as a means of storytelling with data using Python.

It allows users to create animated charts in Jupyter, Google Colab, Databricks, Kaggle, and Deepnote notebooks, among other platforms. Built on the open-source JavaScript/C++ charting library Vizzu, ipyvizzu leverages a unique morphing engine developed from scratch, which uses a single set of rules to describe all charts. This feature enables seamless transitions between different chart types. There is also a presentation extension to ipyvizzu, called ipyvizzu-story, that allows live presentation of animated data stories directly from notebooks, making it easier to share findings and engage audiences. Additionally, ipyvizzu and ipyvizzu-story are now embeddable in Streamlit and available in Panel, expanding their accessibility and integration options. By harnessing the power of animation, we aim to assist data scientists in effectively sharing their insights. To explore our tools’ capabilities, please visit our documentation sites (https://ipyvizzu.vizzuhq.com/latest/showcases/ & https://ipyvizzu-story.vizzuhq.com/latest/examples/) for examples and showcases.

Magpylib

Magpylib is a Python package for calculating 3D static magnetic fields of permanent magnets, currents and other sources.

The computation is based on analytical expressions and therefore extremely fast. A user friendly API combined with graphic output enables convenient positioning of sources and observers.

Manim

Manim is a community-maintained Python library for creating (mathematical) animations. With its simple, yet versatile interface, everyone is able to produce insightful visualizations.

For example, it allows linking mathematical formulas to colours and shapes, thus bringing them to life and making them easy to grasp. While the core focus is on animations in mathematics, physics and computer science, Manim has also been used to create animations in the context of biology, chemistry, and even music theory.

Micro-Manager

Micro-Manager is an open-source software for control and automation of microscope hardware.

It provides a generic API for hardware control of common microscope components (e.g. a camera) that can be configured to work with a large array of specific devices (e.g. a camera made by Thorlabs). There are several different sub-projects in different repos: the hardware control layer (MMCoreAndDevices), the GUI (Micro-Manager), a Python package for scripting (Pycro-Manager), etc.

Mesa: Agent-Based Modeling In Python

Mesa is an Apache 2.0-licensed agent-based modeling (or ABM) framework in Python.

MFEM

MFEM is a free, lightweight, scalable C++ library for finite element methods. Its goal is to enable high-performance scalable finite element discretization research and application development on a wide variety of platforms, ranging from laptops to supercomputers.

Neo

Neo is a Python package for working with electrophysiology data. It implements a hierarchical data model well adapted to intracellular and extracellular electrophysiology and EEG data.

NetKet

NetKet is a toolbox for applying machine-learning techniques to quantum physics problems.

Numba

Numba gives you the power to speed up your applications with high performance functions written directly in Python.

With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.

ObsPy

ObsPy is an open-source project dedicated to providing a Python framework for processing seismological data.

The goal of the ObsPy project is to facilitate rapid application development for seismology.

It provides parsers for common file formats, clients to access data centers, and seismological signal processing routines which allow the manipulation of seismological time series (see Beyreuther et al. 2010, Megies et al. 2011, Krischer et al. 2015).

Orange

Open source data visualization and data analysis for novice and expert. Interactive workflows with a large toolbox.

Polars

Fast multi-threaded, hybrid-out-of-core query engine focussing on DataFrame front-ends. Among the host languages are Python, Rust, NodeJS, R and SQL.

poliastro

poliastro is an open source (MIT) collection of Python functions useful in Astrodynamics and Orbital Mechanics, focusing on interplanetary applications. It provides a simple and intuitive API and handles physical quantities with units.

pomegranate

pomegranate is a Python module for fast and flexible probabilistic modeling inspired by the design of scikit-learn.

A primary focus of pomegranate is to abstract away the intricacies of a model from its definition, allowing users to easily prototype with complex models and training strategies. Its modular implementation allows for probability distributions to be swapped in or out for each other with ease and for models to be stacked within each other, yielding such delights as a mixture of Bayesian networks or a Gaussian mixture model Bayes classifier.

Project Optuna

Project Optuna develops tools for optimizing deep learning and other tasks that use hyperparameters. Project Optuna is comprised of Optuna and Chainer.

Optuna is an open source hyperparameter optimization framework to automate hyperparameter search. Optuna provides eager search spaces for automated search for optimal hyperparameters using Python conditionals, loops, and syntax, state-of-the-art algorithms to efficiently search large spaces and prune unpromising trials for faster results, and easy parallelization for hyperparameter searches over multiple threads or processes without modifying code.

Chainer is a powerful, flexible, and intuitive deep learning framework. Other tools to automate machine learning are in development as well, as part of the project’s mission to simplify machine learning.

PSL

The Policy Simulation Library (PSL) is a collection of models and other software for public-policy decision-making. PSL is developed by independent projects that meet standards for transparency and accessibility.

The PSL community encourages collaborative contribution and makes the tools it develops accessible to a diverse group of users.

pvlib

pvlib python provides a set of functions and classes for simulating the performance of photovoltaic energy systems.

pyhf

pyhf is a pure-Python library for the building and serialization of statistical models used commonly in high energy particle physics.

It also supports statistical inference powered by n-dimensional array library computational backends, including machine learning libraries that allow for exploitation of automatic differentiation and hardware acceleration for speeding up model fitting.

pyiron

pyiron is an integrated development environment (IDE) for computational materials science. It enables scientists to upscale their workflows from rapid prototyping to high-performance computing.

PyLops

PyLops is a Python library which facilitates solving large-scale inverse problems.

It provides many commonly used linear operators (e.g. convolution, wavelet transform, etc.) as matrix-free objects, and leverages them within iterative algorithms to solve ill-conditioned problems. Its high-level, expressive interface resembles the underlying mathematical formulation. Finally, it supports fully interchangeable CPU and GPU backends that are compatible with other native Python libraries such as NumPy and CuPy.

pyMOR

pyMOR is a software library for building model order reduction applications with the Python programming language.

Implemented algorithms include reduced basis methods for parametric linear and non-linear problems, as well as system-theoretic methods such as balanced truncation or IRKA (Iterative Rational Krylov Algorithm). All algorithms in pyMOR are formulated in terms of abstract interfaces for seamless integration with external PDE (Partial Differential Equation) solver packages. Moreover, pure Python implementations of FEM (Finite Element Method) and FVM (Finite Volume Method) discretizations using the NumPy/SciPy scientific computing stack are provided for getting started quickly.

PySAL

PySAL is an open source cross-platform library for geospatial data science with an emphasis on vector data written in Python. It supports the development of high level applications for spatial analysis.

Python-graphblas

Python-graphblas is a foundation-layer library for sparse linear algebra.

It provides a Pythonic interface to compiled implementations of the GraphBLAS standard. It also provides I/O connectors to efficiently convert to and from common PyData primitives (NumPy array, scipy.sparse array, NetworkX graph).

Python(X,Y)

Free scientific and engineering development software used for numerical computations, and for analysis and visualization of data, using the Python programming language.

Python Satellite Data Analysis Toolkit (pysat)

pysat implements the general process of space science data analysis, from beginning to end, in an instrument-independent manner.

This toolkit uses an Instrument object that enables systematic but versatile analysis of science data from a variety of platforms within a single easy-to-use interface, abstracting away all of the tedious file, data handling, and processing issues. Basic functions such as downloading, loading, and cleaning are included for all supported instruments/data sets. While incubated in a space science environment, pysat is capable of processing the world’s data.

PyTorch-Ignite

PyTorch-Ignite is a high-level library to help with training neural networks in PyTorch.

PyVista

PyVista is a helper module for the Visualization Toolkit (VTK) that takes a different approach to interfacing with VTK through NumPy and direct array access.

This package provides a Pythonic, well-documented interface exposing VTK’s powerful visualization backend to facilitate rapid prototyping, analysis, and visual integration of spatially referenced datasets.

QuTiP

QuTiP is software for simulating quantum systems. QuTiP aims to provide tools for user-friendly and efficient numerical simulations of open quantum systems.

It can be used to simulate a wide range of physical phenomena in areas such as quantum optics, trapped ions, superconducting circuits and quantum nanomechanical resonators. In addition, it contains a number of other modules to simplify the numerical simulation and study of many topics in quantum physics such as quantum optimal control, quantum information, and computing.

Radis

Radis is an open-source library to compute molecular spectra. It is used for in-the-lab emission and absorption spectroscopy diagnostics, and exoplanet research.

Radis is specifically designed to resolve millions of lines within seconds, and is compatible with the main spectroscopic databases (HITRAN, HITEMP, ExoMol). It also has some radiative-transfer capabilities, and non-LTE calculations.

scikit-bio

scikit-bio is an open-source, BSD-licensed Python package providing data structures, algorithms, and educational resources for bioinformatics.

signac

The signac framework is a complete solution for managing workflows operating on file-based data designed to scale to HPC systems.

By using a well-defined, indexable storage layout for data and metadata, signac streamlines generation of, access to, and analysis of data through a straightforward interface that naturally scales from laptops and workstations to leadership-class supercomputers. Additionally, operations on this data can be managed, parallelized, and easily submitted on supercomputing clusters.

The project has been published in the Journal of Computational Materials Science (DOI:10.1016/j.commatsci.2018.01.035) and the Proceedings of the SciPy 2018 conference (DOI:10.25080/Majora-4af1f417-016). It has also been presented at PyData Ann Arbor as well as eight scientific conferences in chemical engineering, materials science, and applied physics.

SkyPy

SkyPy is an open-source Python package for simulating the astrophysical sky. It comprises a library of physical and empirical models across a range of observables and a command-line script to run end-to-end simulations.

The library provides functions to sample realisations of sources and their associated properties from probability distributions. Simulation pipelines are constructed from these models using a YAML-based configuration syntax, while task scheduling and data dependencies are handled internally and the modular design allows users to interface with external software. SkyPy is developed and maintained by a diverse community of domain experts with a focus on software sustainability and interoperability. By fostering co-development, it provides a framework for correlated simulations of a range of cosmological probes including galaxy populations, large-scale structure, the cosmic microwave background, supernovae and gravitational waves.

Solcore

Solcore is a complete semiconductor solver capable of modelling the optical and electrical properties of a wide range of solar cells, from quantum well devices to multi-junction solar cells.

Spack

Spack is a flexible package manager that builds multiple versions of packages for different configurations, platforms, and compilers. It was created to deploy large-scale scientific simulations on HPC systems, but it can deploy software on Linux and macOS machines, as well.

Statsmodels

Statsmodels is a Python package that provides a complement to Scipy for statistical computations including descriptive statistics and estimation of statistical models.

Spyder

Spyder features a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package. Furthermore, Spyder offers built-in integration with many popular scientific packages, including NumPy, SciPy, Pandas, IPython, QtConsole, Matplotlib, SymPy and more.

Taskflow

Parallel and heterogeneous programming with high performance and simultaneous high productivity

TNL - Template Numerical Library

TNL is an efficient C++ library providing many parallel algorithms and data structures for high-performance computing on GPUs, multicore CPUs and distributed clusters.

The goal is to create a unified interface that allows users to write single code that can be executed on different parallel architectures.

The Ibis Project

Ibis provides expressive analytics at any scale. It’s a library designed to help users be more productive when interacting with analytics databases and engines.

Trixi.jl

Trixi.jl is a numerical simulation framework for conservation laws written in the Julia programming language.

Example applications include high-speed flows with complex shock interactions, astrophysical simulations with self-gravity, shallow water problems for flood prediction, or computational aeroacoustics. A key objective for the framework is to be useful to both scientists and students. Therefore, next to having an extensible design with a fast implementation, Trixi.jl is focused on being easy to use for new or inexperienced users, including the installation and postprocessing procedures. We thus try to utilize the advantages of Julia for rapid prototyping and efficiency to make high-performance computing more accessible for a broader scientific audience.

WESTPA

WESTPA (The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis) is a high-performance Python framework

aeon

aeon is an open-source scikit-learn compatible toolkit for time series tasks such as forecasting, classification, regression, clustering, anomaly detection and segmentation. It provides a broad library of time series algorithms, including efficient implementations of the latest advances in research.

Yellowbrick

Yellowbrick is a Python package that visualizes the data science workflow, allowing users to visually steer the feature, algorithm, and hyperparameter selection process by directly extending the Scikit-Learn API.

XGI

The CompleX Group Interactions (XGI) library provides data structures and algorithms for modeling and analyzing complex systems with group (higher-order) interactions, i.e. hypergraphs and simplicial complexes.

BigBang

BigBang is a toolkit for studying processes of open collaboration and deliberation, especially with respect to the production of digital infrastructures, to make them more transparent and accountable. This is achieved by utilising public communication channels and documents to reveal which actors are leading, following, or left out. It enables the analysis and visualisation of relationships, discourses, time series and knowledge networks.

PyDMD

PyDMD is a Python package designed for Dynamic Mode Decomposition (DMD), a data-driven method used for analyzing and extracting spatiotemporal coherent structures from time-varying datasets. It provides a comprehensive and user-friendly interface for performing DMD analysis, making it a valuable tool for researchers, engineers, and data scientists working in various fields.

STUMPY

STUMPY is a powerful and scalable Python library for modern time series analysis.

skforecast

Skforecast is a Python library that eases using scikit-learn regressors as single and multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (LightGBM, XGBoost, CatBoost, …)

Parallel Ice Sheet Model (PISM)

The Parallel Ice Sheet Model (PISM) is an open-source modelling framework for ice sheets and glaciers. It is parallel, thermodynamically-coupled and capable of high resolution. PISM has been widely adopted as a tool for doing science for about twenty years now.

FESTIM

FESTIM, a leading open-source hydrogen transport simulation tool, relies on FEniCS. Globally adopted, it serves diverse sectors like nuclear design and hydrogen aviation advancements.

Open OnDemand

Developed by the Ohio Supercomputer Center (OSC) and funded by the National Science Foundation, Open OnDemand is an open-source portal that enables web-based access to HPC services. Clients manage files and jobs, create and share apps, run GUI applications and connect via SSH, all from any device with a web browser.

Visual Python

Visual Python is a GUI-based Python code generator, developed as an extension for JupyterLab, Jupyter Notebook, and Google Colab. Visual Python is an open source project started for students who struggle with coding during Python classes for data science.

GPJax

GPJax is a Python library that enables Bayesian inference with Gaussian processes using JAX on both CPUs and GPUs. The abstractions provided by GPJax are designed to be tightly coupled with the underlying math, providing a framework that is intuitive to researchers and practitioners alike. Support for classification, regression, and decision making/Bayesian optimization is available thanks to work from numerous contributors.

toqito

The toqito package is an open-source library for studying various objects in quantum information, namely, states, channels, and measurements. toqito provides numerical tools to study problems about entanglement theory, nonlocal games, and other aspects of quantum information often associated with computer science.

Folium

Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in a Leaflet map via Folium.

Snakemake

The Snakemake workflow management system is a framework for reproducible and scalable data analyses. Workflows are described via a human readable, Python based language.

They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment. Finally, Snakemake can automatically generate server-free, graphical, interactive reports that connect the results of a data analysis with the code, parameters, and software used for each step, ensuring data provenance and transparency. With on average more than 11 new citations per week in 2023, and almost 950,000 downloads on anaconda.org, Snakemake is extremely popular.

SunPeek

SunPeek is a Python package and web application for assessing the operational performance of large solar-thermal arrays. It implements the ISO 24194 Performance Check standard, and is intended to assist researchers and plant operators in understanding the long-term performance of these systems.

Bambi

Bambi is a high-level interface to build, fit, and explore Bayesian statistical models.

Open2C

Open2C is a community that develops and maintains open-source tools for 3D chromosome biology and genomic data science, primarily in Python.

We are particularly interested in 3D genomics, and most of the tools are focused on the analysis of data obtained using Hi-C and related high-throughput technologies. We like our tools to be easy to use, flexible, and scalable in order to facilitate active development of novel analytical approaches and to make use of the latest and largest datasets.

Poly

The aim of Poly is to provide a comprehensive software framework for engineering biology. Poly already ships a suite of parsers, optimizers, and various tools to engineer DNA and other biological sequences, which we want to expand further to support more complex engineering workflows.

PRQL

PRQL is a modern language for transforming data: a simple, powerful, pipelined SQL replacement.

PUDL

PUDL is a data processing pipeline created by Catalyst Cooperative that cleans, integrates, and standardizes some of the most widely used public energy datasets in the US.

The data serve researchers, activists, journalists, and policymakers who might not have the technical expertise to access it in its raw form, the time to clean and prepare the data for bulk analysis, or the means to purchase it from existing commercial providers.

PyData Sparse

N-dimensional sparse arrays for the PyData Ecosystem

SPHinXsys

SPHinXsys provides C++ APIs for physically accurate simulation and aims to model coupled industrial dynamic systems including fluid, solid, multi-body dynamics and beyond.

The multi-physics library is based on a unique, unified computational framework by which strong couplings have been achieved for all involved physics.

SageMath

SageMath is a comprehensive mathematical software system, developed since 2005.

Its scope ranges from general untyped symbolic computation to research-level computational tools in numerous areas of mathematics. Sage makes use of hundreds of third-party, separately maintained packages written either in Python/Cython or in other languages (C, C++, Common Lisp). The Sage library consists of about 3000 first-party Python and Cython modules.

BayesFlow

BayesFlow implements amortized Bayesian workflows with deep learning. This means users first train a neural network on simulated data. Then they obtain posterior inference on any real data almost instantly.

The Python library is based on Keras 3, which allows users to choose between a PyTorch, TensorFlow, or JAX backend. BayesFlow follows a modular software architecture that is built for machine learning scientists and applied domain users alike.

HiGHS

HiGHS offers open-source high-performance linear optimization software. In the industry-standard independent benchmarks, it is seen to be the best such software in the world. It can be used through many language and application interfaces, including NumFOCUS projects SciPy and JuMP.

RxInfer

RxInfer is a Julia package for fast and scalable Bayesian inference in probabilistic models. This toolbox is well suited for fast online inference in freely definable non-linear state-space models.

sbi

Many areas of science and engineering make extensive use of complex, stochastic, numerical simulations to describe the structure and dynamics of the processes being investigated. A key challenge in simulation-based science is constraining these simulation models’ parameters with observational data.

Bayesian inference provides a general and powerful framework to invert the simulators, i.e. describe the parameters which are consistent both with empirical data and prior knowledge. In the case of simulators, a key quantity required for statistical inference, the likelihood of observed data given parameters, is typically intractable, rendering conventional statistical approaches inapplicable. `sbi` implements machine-learning methods that address this problem.

Narwhals

Lightweight compatibility layer between dataframe libraries

marimo

marimo is a reactive Python notebook: run a cell or interact with a UI element, and marimo automatically runs dependent cells (or marks them as stale), keeping code and outputs consistent.

marimo notebooks are stored as pure Python, executable as scripts, and deployable as apps. marimo is built from scratch, designed specifically for working with data.

Albumentations

Albumentations is a fast and flexible image augmentation library that supports over 70 different transforms for object detection, segmentation and image classification tasks.

It accelerates deep learning development by providing an efficient, hardware-optimized toolkit for data augmentation that’s readily compatible with popular ML frameworks like PyTorch and TensorFlow.

Materials Project Software Foundation

The mission of the Materials Project Software Foundation (MPSF) is to provide community-driven, inclusive, coordinated, transparent, and accountable governance of select public-facing and open-source Materials Project software packages.

These software packages constitute an ecosystem of complementary codes that together form the foundation of the Materials Project Database, while also enabling numerous capabilities in materials science, high-throughput computations, and analysis of molecular simulation results. MPSF-supported codes are now used by a wide community of global researchers, as evidenced by nearly 2,000 GitHub stars and 1,000 forks. Our work ensures that these codes remain up-to-date and interoperable, and that they continue to serve the ever-evolving needs of this research community.

UXarray

UXarray aims to address the geoscience community’s need for tools that enable standard data analysis techniques to operate directly on unstructured grid data.

UXarray provides Xarray-styled functionality to better read in and use unstructured grid datasets that follow standard conventions, including UGRID, MPAS, ICON, SCRIP, ESMF, and Exodus grid formats. This effort is a result of the collaboration between Project Raijin (NSF NCAR and Pennsylvania State University) and the SEATS Project (Argonne National Laboratory, UC Davis, and Lawrence Livermore National Laboratory). The UXarray team welcomes community members to become part of this collaboration at any level of contribution.

BrainGlobe

The BrainGlobe Initiative exists to facilitate the development of interoperable Python-based tools for computational neuroanatomy.

We have three main aims. These are to develop core tools to help others build software, to develop specialist software ourselves, and to build a community of users and developers.

Project Pythia

Project Pythia is an education and training hub for the geoscientific Python community. We are also the education arm for the Pangeo initiative.

Project Pythia is a home for Python-centered learning resources that are open-source, community-owned, geoscience-focused, and high-quality. Our educational goals include helping geoscientists make sense of huge volumes of numerical scientific data using tools that facilitate open, reproducible science, and building an inclusive community of practice. Pythia works as an organization of several repositories. For comprehensive information, see the https://github.com/ProjectPythia organization; the main portal repository is https://github.com/ProjectPythia/projectpythia.github.io
