Author: Reshama Shaikh
High Level Summary
Number of participants who:
- Registered: 76
- Attended: 38
- Submitted >= 1 pull request: 24
- Countries represented: 10
Background
The PyMC open source working sessions were organized by Data Umbrella to increase the participation of underrepresented persons in open source, Python and data science.
This report focuses on the summary, impact and lessons learned of the Data Umbrella PyMC Open Source Working Sessions.
Event Sessions
A series of 3 separate working sessions was organized, plus pre- and post-event office hours. Participants were paired with another person with whom they could work during the working sessions.
The office hours provided a casual, unstructured space for participants to introduce themselves and ask any questions.
The 3 working sessions were scheduled on different days of the week and at different times so that community members with varying schedules had options to attend.
The intention was that some participants would be able to attend multiple sessions to build experience in contributing. Some participants attended more than one session, and two participants attended all 3 sessions and both office hours.
Pre-Series Office Hours
Photo not available.
Session 1
Session 2
Session 3
Post-Series Office Hours
Event Sponsors
This event was supported by:
This is a 3-minute video by Mariatta Wijaya of Google with inspirational tips on contributing to open source.
Schedule of Sessions
- 02-Jul-2022: Pre-series Office Hours (13-14:00 UTC) (1 hr)
- 09-Jul-2022: Session #1 (13-16:00 UTC) (3 hrs)
- 22-Jul-2022: Session #2 (16-19:00 UTC) (3 hrs)
- 4/5-Aug-2022: Session #3 (23-2:00 UTC) (3 hrs)
- 18-Aug-2022: Post-series Office Hours (23-24:00 UTC) (1 hr)
We varied the schedule of working sessions to accommodate participants from different regions and time zones.
Number of Attendees
Session | Data Umbrella Organizers | PyMC Mentors | Community Contributors | Note |
---|---|---|---|---|
Pre-series Office Hours | 3 | 2 | 24 | |
Session #1 | 3 | 4 | 20 | |
Session #2 | 3 | 4 | 12 | |
Session #3 | 1 | 4 | 6 | Asia-Pacific (a) |
Post-series Office Hours | 1 | 3 | 4 | Asia-Pacific (a) |
(a) Session 3 and the post-series office hours were scheduled for Asia-Pacific time zones.
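As a rough illustration of how attendance changed across the series, the table above can be tallied in a few lines of Python. This is only a sketch: the variable names are made up for this example, and the numbers are copied directly from the table.

```python
# Attendance per event, copied from the table above:
# (organizers, mentors, community contributors)
attendance = {
    "Pre-series Office Hours": (3, 2, 24),
    "Session #1": (3, 4, 20),
    "Session #2": (3, 4, 12),
    "Session #3": (1, 4, 6),
    "Post-series Office Hours": (1, 3, 4),
}

# Total headcount per event (organizers + mentors + contributors).
totals = {event: sum(counts) for event, counts in attendance.items()}

# Community contributors per PyMC mentor at each event.
contributors_per_mentor = {
    event: contributors / mentors
    for event, (_, mentors, contributors) in attendance.items()
}

for event in attendance:
    print(
        f"{event}: {totals[event]} people, "
        f"{contributors_per_mentor[event]:.1f} contributors per mentor"
    )
```

With these figures, the ratio falls from 5 contributors per mentor in Session #1 to 1.5 in Session #3.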
Event Participants
We used a Sphinx website whose source code was publicly available. We provided instructions on how participants could add their information to the website. Participants had the option to add their name, photo and other information to the event website as contributors. For some participants, adding their information was a milestone because they were working with Git, GitHub and Sphinx, and submitting a pull request, for the first time.
Contributions Statistics
The contributions during the working sessions were tracked in this PyMC OS-WS spreadsheet. Contributions included both pull requests and, where we observed them, newly opened issues.
We worked on a few different repositories for the PyMC project:
- video-timestamps: this is a beginner-friendly list of issues where contributors watch a video from the PyMCon 2020 conference and add timestamps
- pymc-data-umbrella: this is the event website. Contributors could submit PRs to fix typos or clarify the contributing guide, as well as add their information to the list of participants
- pymc-dev/pymc: this is the main code repository for PyMC
- pymc-dev/pymc-examples: this is the repo that holds notebook examples for PyMC
As of the date of this report (28-Aug-2022), these are the PR stats:
- Open: 2
- Merged: 56
- Issues opened: 6
Timestamps
Timestamps were added for 16 videos.
Event website
A number of PRs were submitted to update contributor information.
Updating Jupyter Notebooks
This was a more intermediate issue for new contributors: updating notebooks with consistent information for Sphinx rendering.
PyMC documentation
These contributions were in the main code repository.
Demographics
Of the 74 people who registered, 38 attended. Of the 38 who attended, 24 submitted a pull request. This funnel graph shows the breakdown, by gender.
A total of 38 contributors attended at least one event of the working sessions, including office hours. 14 of 38 (37%) identified as she/her. 24 of 38 (63%) identified as he/him.
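The percentages above follow directly from the counts. The short Python sketch below reproduces them; the variable and function names are illustrative only, and the counts are taken from this report.

```python
# Counts taken from the report; names are illustrative only.
attended = 38
submitted_pr = 24
she_her = 14
he_him = 24


def share(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


pr_rate = share(submitted_pr, attended)      # attendees who submitted a PR
she_her_share = share(she_her, attended)     # attendees identifying as she/her
he_him_share = share(he_him, attended)       # attendees identifying as he/him

print(f"Attendees who submitted a PR: {pr_rate}%")
print(f"she/her: {she_her_share}%, he/him: {he_him_share}%")
```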
Contributors joined from 10 different countries. Country information was provided based on where participants were joining from.
- United States of America: 13
- India: 6
- Ghana: 4
- Kenya: 4
- Germany: 3
- United Kingdom: 2
- Canada: 2
- Brazil: 2
- Colombia: 1
- Ireland: 1
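As a quick consistency check, the country counts above can be summed and turned into shares of all attendees. This is a minimal sketch with made-up variable names; the counts are copied from the list.

```python
# Attendee counts by country, copied from the list above.
countries = {
    "United States of America": 13,
    "India": 6,
    "Ghana": 4,
    "Kenya": 4,
    "Germany": 3,
    "United Kingdom": 2,
    "Canada": 2,
    "Brazil": 2,
    "Colombia": 1,
    "Ireland": 1,
}

# The total should match the number of attendees reported earlier.
total_attendees = sum(countries.values())

# Share of attendees per country, as a rounded percentage.
country_share = {
    name: round(100 * count / total_attendees)
    for name, count in countries.items()
}

print(f"{len(countries)} countries, {total_attendees} attendees")
for name, pct in country_share.items():
    print(f"{name}: {pct}%")
```

The counts sum to 38, which matches the number of attendees, and the list contains 10 countries.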
Returning Contributors
There were 3 “returning” contributors. These contributors had participated in a previous scikit-learn sprint.
Spoken Languages
The event was run in English. Participants were asked on their registration forms to indicate if they needed a translator. No translators were requested.
We had a channel for #espanol_chat which was utilized at a session when there was a Spanish-speaking mentor and participants from Latin America.
This bar plot shows the primary spoken languages of the participants.
Impact Report for Data Umbrella PyMC Open Source Working Sessions
Non-measurable Impact
Aside from the number of PRs that were merged and issues that were opened, there is non-quantifiable impact of the open source working sessions. Some examples include:
- learning to set up a virtual environment
- using Git (fork, clone, branch, fetching another’s PR)
- introduction to tests such as: flake8 (linting, formatting), pytest, “continuous integration”
- learning about sphinx and documentation
- learning about numpydoc validation
- navigating through the codebase structure of pymc
- digging into functions, learning about errors
- interacting with contributors on GitHub
- learning, in general
- networking, meeting people from around the world
- building confidence (making a dent in “imposter syndrome”)
- having fun
Finding out About the Working Sessions
For those who attended the working sessions, this is how they learned of the event. The main avenues were by invitation from Data Umbrella, Meetup, Twitter, LinkedIn and their network (“word of mouth”).
Sessions Feedback
Feedback has been shared in a number of ways:
- Event survey
- Social media (Twitter, LinkedIn)
- Casually, in conversation during the office hours and working sessions
Survey
We received 5 responses to the survey. The primary reason the response rate was so low is that these events were spread over a 7-week period and different people attended different events.
Overall, the feedback on the surveys was positive.
In response to the question “What are your favorite parts about the sessions?”
- Interacting with Mr. Christian and getting to know more about the community and workings.
- Working with other people - a lot of time spent alone when learning usually so it’s a nice change and good to be exposed to other people’s ideas
- Meeting core PyMC team and other contributors, networking, learning to contribute to open source project
Suggestions for Improvement
In response to the question “What could have worked better at the sessions?”
- I had (and still have) difficulty finding certain pages and links - between pymc contributing section and dataumbrella/pymc website I get confused, since the websites look similar but have different URLs
- Call out need to fork both pymc and pymc-examples (or whichever one you plan to contribute to)
Pair Programming
Because there were 3 separate working sessions plus the two office hour sessions, pairing required some flexibility, since it depended on who planned to attend each session. We provided a spreadsheet where participants could add their names to be paired up with a programming partner.
Challenges
Challenge 1: Emails going to spam
We communicated with registrants via email and Discord. For a number of people, the emails went to spam and they missed them. We do have a reminder on the registration form to keep an eye on the spam folder, but emails were still missed.
Challenge 2: Preparing by reading
The event had a comprehensive website, and the events were posted on Meetup with instructions. The process (join Discord, read through the event website, submit a registration form) was described in multiple places: the event website, Discord, newsletters and emails. Despite numerous reminders, a number of people did not join Discord, and some joined only at the start of the event, which might indicate they missed the reminders. Some participants did not submit a registration form, and some did not review the website.
It is important that participants submit a registration form for these reasons:
- They have read and agreed to the code of conduct.
- They understand how the event will go and how to prepare.
- Many participants have anonymous Discord profiles, so this information is needed to track who is joining the event and who can be added to the private channel.
- We need to connect participants to their GitHub pull requests to track contributions.
- We need participants' email addresses to communicate with them about the event.
Challenge 3: Discord
Some participants had technical issues with Discord. We have a 10-minute video on how to navigate Discord, though it is not apparent that all participants watched the video.
Perk: Mentorship
Working Sessions 2 & 3 had fewer participants, which allowed each pair programming group to have a mentor who could spend almost the full session with them. This was extremely beneficial and gave participants an opportunity to get to know the PyMC maintainers and ask many questions 2-on-1.
Perk: Organizers Contributing
The Data Umbrella team members are interested in contributing to open source too. Often at busy events, the organizers' time is mostly dedicated to administrative tasks. Since the groups for Sessions 2 and 3 were smaller, the organizers had some time to contribute as well. This is important, as one of the challenges in community management work is having time to do coding work.
What’s Next: Maintaining the Momentum
We have already seen a few event participants continue to contribute after the event.
We hope to maintain the momentum by holding casual monthly “study groups” to continue contributing to PyMC.
Sessions: Social Media Shares
Carlo of Brazil
Pablo of Brazil
Igor of USA
Dustin of USA
Prince of Ghana
Rowan of Tennessee, USA
Made my first open source contributions today with @CarolBasknRobns! Watch out world 💪🤓 Thanks for the great event #DataUmbrellaPyMCSprint @DataUmbrella @pymc_devs pic.twitter.com/BKRPZcLETC
— rowan schaefer (@rowan_________) August 5, 2022
Benjamin
Really enjoyed the first working session of the #DataUmbrellaPyMCSprint! Thank you @DataUmbrella for organizing the event.
— Benjamin Datko (@BenDatko) July 9, 2022
Chris Fonnesbeck, PyMC Team Member
The @DataUmbrella PyMC sprint on Friday was fantastic. It's a great way to get involved with the project and with the open source data science community in general. https://t.co/pj3s8PNUas
— Chris Fonnesbeck (@fonnesbeck) July 24, 2022
Zoe
I’m proud to join the @pymc_devs contributors team, thanks to the leadership or @reshamas and the @DataUmbrella community. https://t.co/ns017TCvsC #DataScience #OpenSource #Statistics
— Zoe Braiterman (@zbraiterman) July 10, 2022
Social Media Promotion
We created a social media kit for the Data Umbrella PyMC Open Source working sessions to provide content for our community partners to share.
Twitter (English)
🧵
📣Join us: *online* working sessions to contribute to @pymc_devs #oss
👉🏽with a focus on underrepresented persons in #DataScience
🗓️ Jul/Aug 2022: office hrs + 3 sessions
We thank our sponsors @Google @cziscience @pymc_labs
Submit a registration form: https://t.co/WFLPuy6rts pic.twitter.com/UyptFHrPav
— Data Umbrella (@DataUmbrella) June 20, 2022
LinkedIn (English)
Acknowledgments
We thank the Data Umbrella & PyMC organizers who created the website, created event documents, conducted outreach, marketing and so much more!
- Reshama Shaikh
- Beryl Kanali
- Sandra Meneses
- Sandy Weng
- Cristina Mulas Lopez
- Christian Luhmann
- Oriol Abril Pla
- Thomas Wiecki
We thank the PyMC team who mentored at the sessions and those who were online during the weekend afterwards to promptly review the submitted pull requests, particularly:
- Christian Luhmann
- Oriol Abril Pla
- Ravin Kumar
- Dan Phan
- Chris Fonnesbeck
- Alex Andorra
- Michael Osthege
- Fernando Irarrázaval
References
- PyMC sprints organized by Data Umbrella
- Interview with Sandra Meneses: Contributing to PyMC
- Reflections on the Data Umbrella PyMC February 2022 Sprint
- Data Umbrella scikit-learn Sprint Reports
Addendum
- [no addendums or updates at the time of publication]