A Global Day for Open Source Contribution
In the words of one of the participants:
“Pandas is an amazing open source Python library for data manipulation and analysis which has become in the last few years the de-facto standard for data science and data engineering.”
Coordinating a Worldwide Event
Longtime NumFOCUS community member Marc Garcia played a key role in organizing the logistics of the day.
“Special thanks to Marc Garcia, because whenever I got stuck, I wrote an email to him and got an immediate response with a solution.”
~Himanshu Awasthi of pandas Sprint Kanpur, India
Marc has written an informative blog post that provides a history of how he himself got into contributing to open source and how that shaped his thinking in preparing for the pandas Sprint. We definitely recommend you check it out!
“The goal wasn’t that much about the specific project or contributions, but about letting people get into the open source world in the way many of us love it.
Becoming part of it, and not just being a user of some software we don’t need to pay for.”
~Marc Garcia, pandas Sprint organizer
Marc and the organizing team worked to ensure that each participating city had a task, documentation explaining all the steps, and mentors to help new contributors.
Pandas Sprint Welcome Video
NumFOCUS also contributed some welcoming words for the introductory video that was screened in each city at the start of the sprint. The short video opens with a message from the original creator of pandas, Wes McKinney.
A Closer Look: Highlights from 7 Cities
“The tech writers wanted to write and improve the docs while learning new things in the process.
The data scientists were very fond of pandas and wanted to help the community grow.
The engineers were driven by curiosity and desire to contribute to OSS.
Everybody brought in their particular spark of passion.”
Photo c/o Irina Peshina
“It felt like being connected to a whole world of like-minded people, fueled by the same passions. The gitter channel ensured that we stayed connected to the community from the other chapters, getting help along the way.”
Palma de Mallorca, Spain:
“There were many people involved in the organization of the event, but one person in particular, Marc Garcia, deserves a standing ovation for making it possible.”
“We absolutely love events where people learn new skills and contribute back to the community and this was a very fine example of all this.”
“Since I had to run my stuff on Windows, some community folks had created special videos for the same for us to be more inclusive.”
“Pushing changes to our respective branch was not the end of the story for most of us. The real fun started after that, when it was reviewed by the Pandas Core Maintainers. I, for instance, went through several iterations of changes, which helped me better understand the requirements and make the result more helpful for the community of users. I had also run into some Git issues and the help received was a great learning experience.”
“I can proudly conclude that I now have no fear to contribute to an Open Source Git Repository and especially Pandas, because we have such a wonderful community of developers out there to help us learn, improve and contribute in an iterative way.”
“Pandas Ankara Meetup group has more students than professionals and this sprint group is mostly newbie CS students. For them, Python is too complex to interpret the documentation requirements but they adapted very quickly and enjoyed it.
This first sprint was an important first step in bringing them into the open-source community, so I tried my best to motivate them.
At the end of the day they all thanked the PyData community for its help on the gitter channel during the sprint.
Now we have a bunch of GitHub-familiar students looking for further challenges :D”
~Meltem Atay, PyData Ankara Organizer and a hobbyist of Deep Learning Türkiye
“I learned a lot from this sprint. I hadn’t worked on pandas before this sprint, but now I continuously work on open pandas docstring PRs and review others’ code on GitHub.
People define their happiness in different ways, but now I think: Happiness is when your PR is merged.”
“As a programmer that leverages open-source software to make a living I have always wanted to make a contribution back to the open source community.
This past weekend I achieved this goal during the global pandas documentation sprint.”
Marc’s post provides a walk-through example of the type of issues each chapter was tasked with improving. Read the full post here.
“Although I frequently undergo this process at work it was really cool to do this with people that I’ve never met before across the world.”
Perspective on the sprint from a pandas core maintainer:
“Many thanks to all organizers and contributors of the sprint. A lot of people learned about contributing to open source, and it made a significant impact on the quality of the pandas API documentation!”
~Joris Van den Bossche, pandas core developer
Joris wrote up a Jupyter notebook with a quick plot showing the GitHub activity as a result of the sprint. Read the full post here.
“The documentation sprint really made a significant impact on the quality of the API docs of pandas, as a considerable part of the API docs got updated and extended.
A nice side-effect of preparing the sprint was that it forced us to actually think about the state of our documentation: how we see a good docstring, what standards we want to use, etc. We now have a detailed docstring guideline (for a big part thanks to the sprint organizer Marc Garcia!), which can be further refined based on the experience of the sprint.
This is a very valuable outcome of the sprint, apart from the actual documentation contributions itself, as this guideline will be very useful in the future as well.
The sprint of course generated a lot of work as well, to get all the PRs reviewed. I think it was crucial here that we provided, next to the guidelines, good tooling to check the docstrings (validation script that gives feedback to the contributor, html viewer). I think this helped to get good quality PRs, and reduce certain aspects of the reviewing. And at the same time it is certainly an area where we could further improve, to get even more automatic checks of the consistency and style of a docstring.”
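To give a feel for what an automatic docstring check of this kind might look for, here is a minimal sketch in Python. It is not the pandas validation script Joris mentions. The function `add_one`, the helper `missing_sections`, and the list of required sections are hypothetical names chosen for illustration, and the only assumption about style is the numpydoc convention of a section title followed by a dashed underline of the same length.

```python
import inspect

# Hypothetical list of sections a project might require in every docstring.
REQUIRED_SECTIONS = ("Parameters", "Returns", "Examples")


def add_one(values):
    """
    Add one to every element of a list.

    Parameters
    ----------
    values : list of int
        Numbers to increment.

    Returns
    -------
    list of int
        A new list with each element increased by one.

    Examples
    --------
    >>> add_one([1, 2])
    [2, 3]
    """
    return [v + 1 for v in values]


def missing_sections(func, required=REQUIRED_SECTIONS):
    """Return the required section names that the docstring of func lacks."""
    doc = inspect.getdoc(func) or ""  # getdoc strips the common indentation
    lines = doc.splitlines()
    found = set()
    # Walk over consecutive line pairs: a section header is a title line
    # followed by a line of dashes with exactly the same length.
    for title, underline in zip(lines, lines[1:]):
        title = title.strip()
        underline = underline.strip()
        if title and underline == "-" * len(title):
            found.add(title)
    return [name for name in required if name not in found]


if __name__ == "__main__":
    print(missing_sections(add_one))
```

A check like this only confirms that the expected sections exist. Judging whether the descriptions are accurate and helpful still needs a human reviewer, which is where the maintainers' review of each PR comes in.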
Over 30 cities around the world participated! Be sure to check out all the awesome photos via the #pandasSprint hashtag on Twitter.
Ready to make your own contribution to pandas?