NumFOCUS intends to distribute $60,000 in small development grants to our sponsored and affiliated projects in 2018. Thanks to a successful fundraising year in 2017, NumFOCUS is able to provide funding to help the projects improve usability, grow their communities, and shorten the time to major releases.
Our first recipients are…
Paid Developer Time for Major Code Improvements
Shogun: Fully integrate new parameter framework, unify API/interfaces, and release v 7.0 — $1,500
(NumFOCUS Sponsored Project)
“The goal of this project is to give Shogun’s new parameter framework, a central mechanism to store algorithm parameters, the final push towards full integration and replacement of the old version. Its completion will mark the end-point of a major two-year effort, replacing some of Shogun’s over-a-decade-old code. The new framework will be a cornerstone of Shogun’s recent modernization efforts in terms of API unification, maintenance and compile-complexity, and a modular/plugin architecture.
An initial design prototype of the new parameter framework was built in GSoC (Google Summer of Code) 2016, and a more polished version saw initial integration into Shogun’s code-base in early 2018. The grant will support a core developer dev-sprint to integrate modern C++ data-structures, such as std::vector, into the framework; finish existing work on a generic clone implementation; port old model-selection code to use the new framework; support a generic get/put for parameters and port all examples; reduce the size of the SWIG interfaces by exposing only a simple base-API; and build a first working prototype of a plugin-based build architecture.
These goals will be achieved by 4 core developers doing a full-time code-sprint on the issue for an extended weekend, with 2 developers staying for the subsequent week to prepare the mentioned release. The output of the project will be merged into the main branch of Shogun, and the final goal is to release Shogun 7.0.0 with the mentioned features.
The result of the dev-sprint is improved usability for both users and developers/scientists.”
SunPy: Improving the Usability of Our Data Downloader — $3,000
(NumFOCUS Sponsored Project)
“A major feature of SunPy is its Python interface to a wide variety of online sources of solar physics data. These services allow users to search for and download data. Over the last few years a new unified framework has been developed to provide access to a heterogeneous collection of solar data providers. The purpose of this new framework is to provide users a standardised method for searching and downloading solar physics data irrespective of its source. The first iteration of this functionality, named “Fido,” was released in SunPy 0.8, the most recent release.
Fido provides a unified search interface for users. It also provides a new and straightforward interface for developers to add new data sources, which has already led to an increase in the number of sources and types of solar physics data for which users can search. Despite this, the user experience of using Fido is far from perfect: there are multiple places where the search and download interface behaves differently for the different sources of data. This means that the user has to be aware of where Fido is searching for different data and know how to handle the subtle differences in the results. Rectifying this is one of the core goals for the SunPy 1.0 release. To achieve this we plan to fix or implement the following things:
– Support concatenation of result objects.
– Replace the old threaded downloader with one using asyncio.
– Make sure all clients correctly implement `response_block_properties` and can use it in filename substitution.
– Ensure that the behaviour of the fetch methods is identical over all clients, especially with respect to file overwriting.
The next major improvement to SunPy’s data downloading features would be to enable the discovery of available types of data. Currently, the search interface requires users to know the names and types of data they wish to search for before doing a search. For instance, there is no way to programmatically get a list of all the instruments with data available, which can change if services add new data. The user is required to first read the documentation of the services SunPy provides interfaces for to discover these values. This missing functionality is now much more noticeable as there are more sources and more variety in the types of data. Enabling the user to discover the available terms for which they can search would help to advertise the lesser-known data sources. The first phase of this feature is implementing support for discovery of the available values for each attribute for each client. The second phase is then designing and implementing the best API for accessing the values on the attribute classes.
Finally, given these improvements to the Fido functionality, we would like to add documentation, specifically focused on developers who wish to add new sources of data to SunPy. This is currently missing completely, and by providing it we hope to encourage people to contribute code which increases Fido’s versatility.
These objectives are high priority for SunPy’s 1.0 release, which will include major upgrades to all parts of the package. Being able to fund the development of the Fido features will make a big difference to the release date for SunPy 1.0. The search and download code in SunPy is one of the most complex parts of the code base, so having the resources to fund one or two of the core developers to implement these features, rather than relying on volunteer time, will substantially accelerate this work.
We are hoping to have GSoC or SOCIS students work on the other major components of the 1.0 roadmap. With this support we hope to be able to release SunPy 1.0 early in 2019.
This project holds great benefit for both the SunPy project and its users. The data search and retrieval functionality is one of the flagship components of SunPy. It is also the first task that many users need to perform; this often makes it the first impression of the whole scientific Python ecosystem for many solar physicists switching from other languages. Improving the usability of this part of SunPy is therefore a very high priority.
We also believe that improving the usability and discoverability of the different sources of data searchable by SunPy will increase the scientific value of the data, as it will be easier to perform multiple instrument or wavelength studies. This would then become a unique selling point of SunPy over other tools used by solar physicists, and therefore drive adoption of Python within solar physics.
Being able to fund a focused development effort on this component will free up a lot of volunteer time to work on the other main goals for 1.0, including ending support for Python 2 and migrating to Astropy Time. Relying on volunteer time alone, it is likely that some of the goals will not be achieved or the schedule will slip.”
Spyder 4: Making the Scientific Python Development Environment even better — $3,000
(NumFOCUS Affiliated Project)
“While already a powerful tool and a flagship IDE in the Anaconda and WinPython distributions, the project is seeking funding to help complete the next major leap in its evolution.
Currently, the team is actively maintaining a stable release branch (Spyder 3) with continuous bug fixes and minor enhancements requested by users, while slowly working toward the release of a next-generation environment (Spyder 4) with an overhauled architecture and new functionality sought after by the community. The project is alive and well, with an average of around 250 user-submitted issues, about 50 PRs merged, and over a dozen unique contributors per month, including many newcomers. Over 1,200 unique commits have been made to the Spyder 4 development branch, implementing many major new features, and users are enthusiastic about seeing the changes as soon as possible due to the major positive impact on their work every day.
If funding is secured, the first public beta with existing functionality would be released Q2 2018, the second with the Language Server overhaul and plugin API would be ready Q3 2018, and the third and final with the new debugger and any remaining funded features during Q4 2018. Furthermore, speedy completion of this objective would enable current and future work on the promising Spyder plugin ecosystem—already offering add-ons for Jupyter notebooks, system shells, interactive rich-text reports, unit testing, and many more—that is currently on hold to focus limited resources on Spyder 4. Without funding, the final release would be unlikely to happen until 2020 or even later, putting the project’s future in jeopardy. External funding would allow the project to pay for the developer time needed to complete the remaining planned features and prepare it for public release on schedule.
Funding for the developer time necessary to complete this work will help support critical advancements to the only mature, full-featured dedicated scientific IDE for Python freely developed by and for the community, a nigh-irreplaceable tool that fills a vital role merging the worlds of laboratory research and data analysis, and sharable, production-ready code.”
Improvements to Documentation
Cantera: Modernize, Reorganize, and Update Our Documentation — $3,000
(NumFOCUS Sponsored Project)
“Cantera’s current documentation contains a thorough description of our API and a good set of examples for users to peruse. However, there are several issues with how this information is presented, which limit its usability. The landing page (http://cantera.org) is essentially a list of links to pages deeper in the documentation. There is no distinction between links to API documentation, links to examples, or links to other high-level material (physical modeling descriptions, etc.). This format makes it difficult for both new and experienced users to find required information. For new users looking for a few helpful hints to get started, the format is simply intimidating. Moreover, because information on the desired class or method may be buried behind two or three links, it is difficult for even experienced users to find the particular method or class they want. Finally, the overall theme is outdated and in need of a refresh.
In addition, the method of generating the document inhibits our ability to keep the documentation up-to-date. At present, Cantera’s user documentation (examples, physical model descriptions, compilation/installation instructions, etc.) and API documentation for Python and Matlab are generated by Sphinx, while Doxygen is used to generate the C++ API documentation. This system has several flaws, the most glaring of which is that changes to the user documentation must be committed to the main Cantera code repository. This couples the generation of the user documentation with the API documentation, and limits our ability to release updated versions of the documentation. For instance, we do not want to add new, unreleased, API changes to the documentation pages for the stable version. Thus, we have to backport all user documentation changes to the release maintenance branch, resulting in duplicated commits and builds.
Therefore, in this project, we propose to modernize and update Cantera’s documentation. We envision this as a two-part process, with both parts to be completed under this proposal. In the first part, we will restructure the documentation so that users (new and old alike) can find the information they’re looking for more quickly. The second part of the process will introduce changes to the broader website to improve its visual appeal and functionality, including a modern theme to the website, a new home/landing page for http://cantera.org with more general information about the project, and additional website pages (e.g., a blog page).
This proposal will benefit the community by providing easier access to the information that users need, covering a range of activities from installation to basic use to advanced applications. It will make it easier for new users to find tutorial material or examples, and it will make it easier for advanced users and developers to find detailed documentation. This will make Cantera easier to use for current users, and also help encourage adoption by new users, broadening Cantera’s reach. In addition, the proposed changes will make it easier for developers to maintain the documentation and provide timely updates or add new material. Finally, improving the “look-and-feel” of the website will improve the professional appearance of the project and make it more appealing to new users.”
Gensim: Modern user-friendly documentation — $3,000
(NumFOCUS Affiliated Project)
“We already have a large number of popular models, tutorials and notebooks in Gensim; therefore, we want to pay more attention to the documentation. Right now, we have some great Python models that users don’t know how (or when) to use – so they won’t use them! For this reason, we want to significantly rework our documentation structure, both in terms of layout (design) and content discoverability (UX, visitor flow, API documentation and structure).
Our ultimate goal is to make Gensim simpler to use. The new documentation will allow the community to use Gensim more efficiently, with less time Googling or fumbling around. Thanks to the new docstrings and tutorials, people will have fewer questions about typical usage, lowering our maintenance costs. The new website will make it easier to navigate the models and find the information you need, both for new users and power users.”
Advancing Diversity & Inclusion in Scientific Computing
Orange Data Mining: Girls go Data Mining — $3,000
(NumFOCUS Affiliated Project)
“While girls-only workshops sound almost clichéd at first, experience has shown that girls-only events attract a different pool of female participants. These are mostly non-tech types, and they seem to be much more confident applying for single-gender events. Our project aims to nurture girls’ interest in CS topics with a beginner-level visual programming data science workshop. Moreover, having local female journalists join the workshop means better journalism in the future, not only by covering more scientific topics, but also by having educated journalists who can leverage digital tools effectively.”