NumFOCUS is pleased to announce PyTables as our newest fiscally sponsored project. PyTables is a package for managing hierarchical datasets designed to efficiently cope with extremely large amounts of data.
PyTables has been part of the scientific Python ecosystem for well over a decade and has served as the data storage front end for many other projects over the years. PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing, and searching very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data takes up much less space (especially if on-the-fly compression is used) than with other solutions such as relational or object-oriented databases. Use of PyTables has grown significantly in recent years, since pandas adopted it to implement its HDF5 storage capabilities.
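As a rough illustration of the object-oriented interface, on-the-fly compression, and querying described above, here is a minimal sketch. The `Reading` schema, the `station` group, the file name, and the generated values are all hypothetical and chosen only for demonstration; the compression settings are one reasonable choice, not a recommendation from the PyTables team.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    """Hypothetical row layout for a table of sensor readings."""
    sensor_id = tb.Int32Col()
    temperature = tb.Float64Col()


# Write to a temporary location so the sketch leaves nothing behind in the CWD.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Compress data transparently as it is written (zlib, level 5: an assumption).
filters = tb.Filters(complevel=5, complib="zlib")

with tb.open_file(path, mode="w", title="Demo file") as h5:
    # Groups form the hierarchy, much like directories in a file system.
    group = h5.create_group("/", "station", "Hypothetical weather station")
    table = h5.create_table(group, "readings", Reading,
                            "Sensor readings", filters=filters)
    row = table.row
    for i in range(1000):
        row["sensor_id"] = i % 10
        row["temperature"] = 15.0 + (i % 20)
        row.append()
    table.flush()

with tb.open_file(path, mode="r") as h5:
    # Nodes are reachable by attribute access ("natural naming").
    table = h5.root.station.readings
    # The condition string is evaluated by PyTables while scanning the table,
    # so only matching rows are handed back to Python.
    condition = "(sensor_id == 3) & (temperature > 20)"
    hot = [r["temperature"] for r in table.where(condition)]

print(len(hot), "matching rows")
```

The same pattern scales to tables much larger than memory, because rows are appended and queried in chunks rather than loaded all at once.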
PyTables can be applied in any scenario where one needs to deal with large datasets, including:
- Industrial applications
  - Data acquisition in real time
  - Quality control
  - Fast data processing
- Scientific applications
  - Meteorology, oceanography
  - Numerical simulations
  - Medicine (biological sensors, general data gathering & processing)
- Information systems
  - System log monitoring & consolidation
  - Tracing of routing data
  - Alert systems in security
The PyTables development team members—Anthony Scopatz, Andrea Bedini, Francesc Alted, and Antonio Valentino—are thought leaders in the data science community and have pushed forward new and successful ideas about data parallelism, querying and indexing data, and compression.