Introduction

In early June, I taught a Software Carpentry workshop with Kwasi Kwakwa and Chelsea Chisholm at CERN. These were an amazing few days in every respect: my co-instructors, the hosts and the audience. It was also a very enriching workshop for me as an instructor.

Software Carpentry at CERN/LHCb

For many scientists, CERN has a somewhat mystical aura, and it was a very nice surprise for me to instruct there (it is also at a reasonable distance). As the Large Hadron Rap explains, “LHCb sees where the antimatter's gone” and it's pretty cool :). The bad thing about CERN/Geneva/Switzerland is that it is tricky to find a hotel at a reasonable price… but CERN is not too far from the border, and one can also find French hotels within a 25-minute walk.

The workshop was actually the first part of a 4-day workshop for the learners. The first two days were the “traditional” Software Carpentry curriculum, while the last two days were organized by the hosts and focused on LHCb-specific skills.

We were very warmly welcomed by the hosts. We had a guided tour of the LHCb (point 8) and got invited to a barbecue on the last evening (I was very pleased to be able to stay for this event). At CERN, I could also visit “the place where the web was born” (see picture), which is amazing.

2 Rooms and 3 Instructors

The workshop was organized for 40 learners, whom we split into 2 rooms: we asked people to self-evaluate in order to create a more advanced group. Software is actually quite pervasive in LHCb, and thus some learners (one quarter) had quite an advanced level. I guess the problem exists in most Software Carpentry workshops. We got both “slightly too slow” and “a bit fast” on the sticky notes, so it seems we did well at adapting.

For these two rooms, there were only 3 of us instructors. Luckily we had enough helpers. We decided to switch between rooms, both to balance the teaching load and to allow every learner to be taught by every instructor. Once again, it was a real pleasure to meet, share and work with the other instructors.

For a few reasons, I would not recommend having (too often) workshops with 2 rooms and 3 instructors. 1) It is clearly more tiring for the instructors (and probably less good for the learners): anyone who teaches probably understands that teaching 2/3 of the day is exhausting. 2) As an instructor, you lose a lot of context (from the room you're not in) and this complicates the switching between rooms. 3) The cross-instructor learning is lowered: you spend less time watching your co-instructors teach, so you learn less from them and can give them less feedback.

This last point is very important in my opinion, and I really like workshops for that: they are a “sandboxed” and compact teaching experience that allows for quick feedback and continuous improvement.

Technical Organization and Remarks

Both rooms were equipped with whiteboards and we used them at some point in the lesson (yes, git, I'm looking at you). If we did not have them, I feel we would have missed them. It can be good to check beforehand with the organizers that you'll get boards.

When teaching, you will share your screen, and it is very convenient to have a tablet to browse the lessons on the side (by default, I used my phone). It really helps with staying on track and not forgetting important things… even though I still forgot some… fortunately the audience asked the right questions.

The rooms were quite packed, with a decent layout (tables and power for everyone). There is room for improvement on the room side. One room had no windows which turned out to be a good thing in the end: these days were very hot and sunny and the other room got warm in the afternoons. The dark room was very elongated: some learners probably got neck pain from looking at the screen (on their side).

Server-Side Jupyter (ipython) Notebook and Socks Tunnel

The learners were meant to use some remote servers set up at CERN so that no heavy installation was necessary on their laptops. The hosts had installed a JupyterHub server to allow learners to use the Jupyter notebook directly with zero installation. I was actually using the IPython notebook for the demos and this was slightly confusing (the logo differs)… I won't do it again, I promise.

At first sight the JupyterHub seemed to work but, due to CERN restrictions, it was not possible for learners to write files from the notebooks (including the notebook itself). In the advanced room (I was teaching it at the time), we fell back to local installations (most already had Anaconda installed). Thanks to the helpers for making the remaining installations go smoothly.

In the other room, they fell back to the plain Python interpreter at first. Later in the workshop, we adopted a solution where each learner logged in to the server remotely and started a notebook there. To access it, port forwarding was necessary and we used a SOCKS proxy: adding the option -D 9999 to the ssh command and then, in the browser, setting the SOCKS proxy configuration to localhost, port 9999.
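As a rough sketch, the tunnel step can be written as a tiny Python helper that only assembles the ssh command line. The -D option and port 9999 come from the setup described above; the login learner@server.example.org and the function name are placeholders, not the actual workshop server.

```python
import shlex


def socks_tunnel_command(login, port=9999):
    """Return the ssh argument list that opens a SOCKS proxy on localhost:port.

    `login` is a hypothetical user@host string; replace it with a real one.
    """
    return ["ssh", "-D", str(port), login]


# Placeholder login, for illustration only.
cmd = socks_tunnel_command("learner@server.example.org")
print(shlex.join(cmd))
```

Once the command is running, the browser has to be pointed at a SOCKS proxy on localhost with the same port, as explained above.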

Feedback on Core Lessons

The lessons went pretty well, and the feedback system based on sticky notes continues to amaze me. It allowed us to adapt what we taught from one session to the next. For example, we covered Python scripting after getting feedback that notebooks are not so easy to reproduce/rerun/automate.

Overall, we got both “too slow” and “too fast” feedback, with probably more “too slow” than usual.

Semi-Improvised Intermediate Material

We were covering only bash/python/git (no SQL), so the advanced group needed quite a few intermediate topics which are somewhat scarce in the current set of lessons. I was teaching the last two slots where we scheduled some “summarizing exercises” and some “advanced topics”.

I recycled my old Higgs Boson machine learning example that I had designed for a workshop in Pisa last year. The old (clean) version used np.genfromtxt to load the data and then convert it. This time I decided to rebuild it in an interactive, incremental manner. The new version uses np.loadtxt, and we built the solution with the learners by iteratively trying things, failing, reading the documentation and looping. This was the occasion to show this process and to reinforce what we had seen before (assert-testing, numpy filters, extractions and aggregations, ...). I still gave pointers to the old version, which went further (especially with plotting).
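To give an idea of the kind of steps involved, here is a minimal sketch, not the actual workshop notebook. The column names and the four rows are made up for illustration; only np.loadtxt and the general techniques (assert-testing, boolean filters, column extraction, aggregation) come from the session.

```python
import io

import numpy as np

# Hypothetical stand-in for the data file: a header row, then
# (event id, a measured quantity, label) with 1 = signal, 0 = background.
SAMPLE = io.StringIO(
    "event_id,mass,label\n"
    "1,120.5,1\n"
    "2,95.0,0\n"
    "3,130.5,1\n"
    "4,88.0,0\n"
)

# Load everything as floats, skipping the header line.
data = np.loadtxt(SAMPLE, delimiter=",", skiprows=1)

# Assert-testing: check the shape before going any further.
assert data.shape == (4, 3), data.shape

mass = data[:, 1]            # extraction: a single column
is_signal = data[:, 2] == 1  # filter: a boolean mask over the rows

# Aggregation on the filtered rows.
signal_mean = mass[is_signal].mean()
background_mean = mass[~is_signal].mean()
print(signal_mean, background_mean)
```

Building this line by line, with an assert after each step, is what makes the "try, fail, read the doc, loop" cycle visible to the learners.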

Next, following some post-it feedback (see the picture), I converted the notebook to a Python script. This was the occasion to show a few things: converting a notebook to a Python file (this can be done with ipython nbconvert or by exporting from the IPython notebook in the browser), throwing away all the unnecessary code from the notebook, handling arguments from sys.argv, and saving matplotlib graphs to files. The final script can be found on the workshop website.
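The shape of such a script can be sketched as follows. This is not the script from the workshop website: the function names, the usage message and the input format (a text file with one number per line) are assumptions made for the example.

```python
import sys


def parse_args(argv):
    """Return (input_path, output_path) from a sys.argv-style list."""
    if len(argv) != 3:
        raise SystemExit("usage: {} INPUT.txt OUTPUT.png".format(argv[0]))
    return argv[1], argv[2]


def read_values(path):
    """Read one number per line (hypothetical input format), skipping blank lines."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def save_histogram(values, output_path):
    """Save a histogram to a file instead of showing it in a notebook."""
    # Imported here so that a usage error is reported without loading matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend: no window is opened
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(values, bins=20)
    ax.set_xlabel("value")
    fig.savefig(output_path)
    plt.close(fig)


def main(argv):
    input_path, output_path = parse_args(argv)
    save_histogram(read_values(input_path), output_path)
```

In the script file itself, the last line would call main(sys.argv), so that it can be run as python script.py INPUT.txt OUTPUT.png and rerun or automated from the shell.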

Here are some other topics I remember covering, some of them in reaction to feedback, questions, or open challenges (“what would you like to learn to do now?”):

  • shell variables, conditional statements and arithmetic operations using $(( ... )), in the context of the ... | head | tail example
  • shell expansion in more detail (echo *.txt is expanded by the shell, not by the command, and thus with find you might need to escape it, etc.)
  • chmod and the “shebang” for executable files
  • git svn (git-svn), as LHCb has only started its conversion to git
  • regular expressions in sed and awk (with a word about perl)
  • list comprehension
  • dictionaries and dictionary comprehension
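For readers who have not met the last two items, here is a minimal sketch of list and dictionary comprehensions. The particle names are only illustrative data, not what was used in the room.

```python
words = ["muon", "kaon", "pion", "proton"]

# List comprehension: build a new list from an existing one.
lengths = [len(word) for word in words]

# List comprehension with a condition: keep only the 4-letter words.
four_letters = [word for word in words if len(word) == 4]

# Dictionary comprehension: map each word to its length.
length_by_word = {word: len(word) for word in words}

print(lengths)
print(four_letters)
print(length_by_word["proton"])
```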

Concluding Item Lists

Notes for later:

  • try to follow the lessons better :)
  • reserve more time for exercises
  • install jupyter (just to avoid confusing students with the ipython logo)
  • take pictures for the debriefing
  • draw the same kind of continuous diagram for the git lesson (and take a picture this time...)

Ideas:

  • collect more bite-size intermediate lessons for this kind of audience
  • make lessons more tablet-friendly (easier navigation, big buttons)
  • export lesson as slides (especially 1 figure per slide and 1 challenge per slide)
  • prepare live-MCQs, as I see them work pretty well in other contexts
  • put tracking/stats on the workshop website to see if people come back to it afterwards

Any feedback or remarks? Contact me at click-me ;-p @nospam.com.