10/16/2017

Please do not modify this document further unless you were present at the meeting.

Present: Stéfan, Jarrod, Chris Hench, Elena Glassman, Chris Holdgraf, Dmitriy Morozov, Jonathan Dugan, Nathaniel Smith (phone), Carl B, Chris Kennedy

  • [Stéfan] Introduction & overview of previous years
  • [Jarrod] DS4DS v2
    • Related to NumPy grant
    • Reading group (patterns, architecture, view from Berkeley paper?, grammar of graphics)
    • Paper after
    • Sub-ideas to clarify:
      • API
      • Exchange formats
      • Level/scope: just arrays? API patterns for common machine learning patterns? …
  • [Nathaniel] NumPy grant: make NumPy better
    • Job ad going up in the next week or two
    • Nathaniel giving talk on Thursday
    • Contributions include topics such as:
      • Refactor internal architecture to extend data representation
      • First class support for categorical data, missing data, sparse arrays, customizable data types
      • Lots of smaller issues to triage and prioritize
  • [Chris Holdgraf & Nelle] Docathon
    • Week-long event to increase documentation
    • https://docathon.github.io/docathon/
    • Tooling work:
      • Numpydoc
      • Improvements to the website
      • Analytics pipeline for the GitHub workflow
      • Visualizations for the analytics
  • Data science Kaggle/other competition for team building
    • Intense 2-day hackathon vs. a longer-term, slow-burn project
    • For team building, the shorter format is probably better
    • Try to rope in folks who aren’t here every day
    • Jonathan++
    • Nelle mentions:
      • In their previous team, they used to do this as a retreat event
      • It may be possible to publish a paper on this, if we pick the challenge carefully
  • [Dmitriy] Vis Extravaganza
    • First event: tomorrow (Tue, 10/17/2017); then possibly monthly
    • Discussion on how to make paper figures better / more effective
    • Expand across campus?
    • Jarrod suggests contacting Deb Nolan for vis lecture
    • Organizational help welcome
  • [Dmitriy] Dionysus
    • Computes persistent homology
  • [Stéfan] Machine Shop
  • [Stéfan] SkyPortal
  • [Jarrod] GraphXD
  • In general, connect with Marsha to communicate these types of events
  • [Nelle] Peer code review of small snippets
    • Work through technical problems, find better APIs, learn how to use a library better, etc.
    • Impromptu?
    • Friday afternoon; drinks/food/happy code/happy hacks
    • Talk to Nelle about turning happy hours partly into code review
    • Connect with Stuart, Hacker Within?
  • [Maryam] Education – beyond Software Carpentry
    • Generate educational material and focused lessons: a “Software Carpentry++” that bridges SWC and more advanced lectures
    • E.g., how do I package for pip? How do I share my data at UC Berkeley?
    • Organize a workshop to teach this type of thing at BIDS
    • J: Point blog post to technical work that has been done
    • Can we use existing materials, e.g. from folks at UW or the SciPy lectures?
    • Take a look at what Data Carpentry has done already
  • [Fernando] Software exercises collection
    • Stand-alone, not building on other work
    • Easy to select & re-use in lectures
  • [Chris Hench] Introductory data science notebooks
    • https://github.com/ds-modules
    • Course-specific: mostly one per course, sometimes more; usually taught mid-semester over 1-4 class periods [DSEP + D-Lab + BIDS]
    • Social sciences, mainly
    • Blog post?
    • Watch for the GitHub Pages site
  • [Chris Holdgraf] GitHub developer dashboard
    • Use the GitHub API to build a dashboard that developers can use to measure community health
    • There’s interest from some folks at GitHub in something like this, so we may be able to get a cool dataset from them.
    • [Stéfan]: Also take a look at OSS DevKit (currently under development at BIDS): https://paper.dropbox.com/doc/Open-Source-DevKit-TB33O5Y8ZeTnTkHP1SbYN?_tk=share_copylink – we already have a functional command line client that can be used to pull/push PR changes from/to GitHub.
  • [Chris Holdgraf] Turtle JavaScript repo
    • Turtle is a great way for beginning programmers to learn about programming. It is heavily used in middle and high schools.
    • Thomas Kluyver has a small JavaScript implementation of turtle: https://github.com/takluyver/mobilechelonian
    • It needs a few small improvements to become a “full-fledged” turtle program. This would be a nice small-ish project to take on.
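
Sketch for the GitHub developer dashboard item above: one possible way to turn pull-request records into a few community-health numbers. This is only an illustration, not something agreed on at the meeting. The field names (`created_at`, `merged_at`, `user.login`) mirror GitHub's REST API pull request objects; the choice of metrics, the function names, and the sample records are all hypothetical. No network access is needed, since the records are hard-coded.

```python
from datetime import datetime
from statistics import median


def parse_ts(ts):
    """Parse a UTC timestamp of the form '2017-10-16T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def health_summary(pulls):
    """Summarize a list of pull-request dicts (hypothetical metric choice)."""
    merged = [p for p in pulls if p.get("merged_at")]
    # Hours between opening and merging, for merged PRs only.
    hours = [
        (parse_ts(p["merged_at"]) - parse_ts(p["created_at"])).total_seconds() / 3600
        for p in merged
    ]
    authors = {p["user"]["login"] for p in pulls}
    return {
        "total_prs": len(pulls),
        "merged_prs": len(merged),
        "unique_authors": len(authors),
        "median_hours_to_merge": median(hours) if hours else None,
    }


# Made-up sample records, shaped like GitHub pull request objects.
sample_pulls = [
    {"user": {"login": "alice"}, "created_at": "2017-10-01T00:00:00Z",
     "merged_at": "2017-10-01T12:00:00Z"},
    {"user": {"login": "bob"}, "created_at": "2017-10-02T00:00:00Z",
     "merged_at": "2017-10-04T00:00:00Z"},
    {"user": {"login": "alice"}, "created_at": "2017-10-03T00:00:00Z",
     "merged_at": None},
]

summary = health_summary(sample_pulls)
print(summary)
```

In a real dashboard the records would come from the GitHub API (or from a dataset shared by GitHub) rather than a hard-coded list, and the set of metrics would need discussion.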

Notes:

  • S + J: write up the above list and publish it to https://bids.github.io/swg
  • Update this website after each planning meeting
  • Q: Is datascience.tables usable for real work?
  • Does the BIDS blog have an Atom/RSS feed?
  • Can we do a technical BIDS blog sub-feed?
  • rOpenSci: can we distill lessons learned?