The Data Analysis Tools Series
2018-12-05T08:32:38+00:00
https://BIDS.github.io/dats

Archival Data Repositories
2018-11-19T00:00:00+00:00
https://BIDS.github.io/dats/posts/data-repos
<h1 id="welcome">Welcome!</h1>
<h1 id="speakers">Speakers</h1>
<p>Joshua Quan - Data Librarian @ UC Berkeley Library, D-Lab</p>
<h1 id="content">Content</h1>
<p>This DATS session introduces archival data repositories that researchers can use to discover datasets or to deposit their own data and code
for long-term archiving and discovery by others. We will cover Dataverse, Dash/Dryad, and Zenodo.</p>
<h3 id="objectives">Objectives:</h3>
<ul>
<li>Why sharing datasets is easier with a Repository designed for archiving and discovery</li>
<li>Learn a little about Dataverse, Zenodo, Dash/Dryad, and OSF</li>
<li>Searching for Data in Repositories</li>
<li>APIs + Tools to work with repositories</li>
</ul>
<h2 id="data-repository-defined">Data Repository Defined</h2>
<p>From <a href="https://www.re3data.org/">Registry of Research Data Repositories</a>:
“subtype of a sustainable information infrastructure which provides <strong>long-term storage</strong> and <strong>access</strong> to research data that is the basis for a scholarly publication. Research data means information objects generated by scholarly projects for example through experiments, measurements, surveys or interviews.”</p>
<p>…So it’s a place to put your data and analysis scripts that will be accessible beyond the life of a research project, grant, or individual career.</p>
<h2 id="a-minimum-rationale-for-depositingsharing">A minimum rationale for depositing/sharing…</h2>
<ol>
<li>
<p>Sharing your data gives you credit for your work that everyone can see</p>
</li>
<li>
<p>Your hard work will persist and be discoverable</p>
</li>
</ol>
<p>… fulfills the most basic components of <a href="https://www.go-fair.org/fair-principles/">F.A.I.R principles</a> for scientific data</p>
<h2 id="things-to-consider-when-choosing-a-repository">Things to Consider when choosing a Repository</h2>
<h4 id="reputation">Reputation</h4>
<ul>
<li>Is the repository endorsed by a funding agency, scholarly journal, professional society, library, etc?</li>
<li>Is it listed in the <a href="https://www.re3data.org/">Registry of Research Data Repositories</a>?</li>
</ul>
<h4 id="sustainability">Sustainability</h4>
<ul>
<li>Is there evidence that the repository will be around in five years? Ten years?</li>
</ul>
<p><img src="../images/archives/digital_resource_lifespan.png" alt="" /></p>
<ul>
<li>Is the owner/manager of the content reliable?</li>
</ul>
<p><img src="https://github.com/wrathofquan/dats/blob/master/docs/images/archives/git.gif" alt="" /></p>
<h4 id="visibility">Visibility</h4>
<ul>
<li>
<p>One of the primary reasons to deposit your data in a repository is to obtain a unique identifier that others can use to cite your data. This increases the visibility of your data within the scholarly literature and allows researchers to find it later on.</p>
</li>
<li>
<p>Ensure your data repository offers a DOI (digital object identifier), handle, or another unique identifier.</p>
</li>
</ul>
<h4 id="usability">Usability</h4>
<ul>
<li>
<p>The usability of a data repository is also important in ensuring that others will be able to access your data. If your peers are unable to find and download your data it will limit the effectiveness of sharing your data.</p>
</li>
<li>
<p>A usable data repository should allow for users to easily upload, download, and cite data sets.</p>
</li>
</ul>
<h4 id="features">Features</h4>
<ul>
<li>
<p>Some data repositories have really great features like integrations with Open Science Framework, GitHub, or other commercial storage solutions. While these features may not be the keystones to providing long-term access to your data, they can help you share your data more frequently and effectively.</p>
</li>
<li>
<p><a href="https://docs.google.com/spreadsheets/d/1KptHzDHIdB3s1v5m1mMwphcwXhOVWdkRYdjEWW1dqrE/edit#gid=355072175">Comparative Overview of Features</a></p>
</li>
<li>
<p>You’ll want to review the upload and storage limits. Some repositories offer limited free storage before a fee is charged. Be sure to look over each data repository’s features and compare them with comparable services.</p>
</li>
</ul>
<h4 id="formats">Formats</h4>
<ul>
<li>
<p>Be sure to take a look at the repository’s documentation to ensure it can store the data you’ve generated.</p>
</li>
<li>
<p>Does the repository provide a way to preview data/scripts, e.g., rendering .ipynb files as GitHub does?</p>
</li>
</ul>
<h4 id="rights">Rights</h4>
<ul>
<li>
<p>Take time out to read the terms of use and to understand what permissions you’re giving the data repository.</p>
</li>
<li>
<p>For instance, does your data repository use common licensing agreements (<a href="https://creativecommons.org/">Creative Commons</a>) that will help others understand what they can and cannot do with your data?</p>
</li>
</ul>
<h2 id="general-vs-subject-specific-repositories">General vs. Subject Specific Repositories</h2>
<ul>
<li>
<p>A “general” data repository is subject independent and will have data from many fields. General data repositories are often well-known solutions with large user communities.</p>
</li>
<li>
<p>General repositories are great places to store all your data because they tend to have robust features (like simple GitHub integration), strong institutional backing, and are indexed by search engines.</p>
</li>
<li>
<p>The downside of general repositories is that because there is a lot of everything, users might have more difficulty finding your work.</p>
</li>
</ul>
<h3 id="general-repositories">General repositories</h3>
<ul>
<li><a href="https://dataverse.harvard.edu/">Harvard Dataverse</a>: Harvard’s Dataverse is both a <a href="http://dataverse.org/">platform</a> for institutions and a data repository. Backed and developed by Harvard’s IQSS, Libraries, and Information Technology, Dataverse has 22 installations with over 48,000 datasets, and 2 million downloads.</li>
</ul>
<p><img src="https://pbs.twimg.com/media/DqGmSvLWsAAzAxn.jpg" alt="" /></p>
<p><a href="https://github.com/IQSS/dataverse/issues/4714">Some cool ideas floating around</a></p>
<ul>
<li>
<p><a href="https://dash.berkeley.edu/stash">UC Dash</a> is an open-source, self-service toolkit for managing, openly publishing, and effectively describing data for access and reuse. Dash features geolocation metadata, ORCID, DOI, and FundRef identifiers, and generates a citation for all of your datasets. Additionally, Dash allows you to set a timed-release of data while undergoing peer-review.</p>
</li>
<li><a href="https://zenodo.org/">Zenodo</a>: Funded by <a href="http://home.cern/">CERN</a>, <a href="https://www.openaire.eu/">OpenAIRE</a>, and <a href="https://ec.europa.eu/programmes/horizon2020/">Horizon 2020</a>
<ul>
<li>Zenodo accepts <a href="https://zenodo.org/faq">50GB per dataset</a> and <a href="https://guides.github.com/activities/citable-code/">integrates nicely with GitHub</a>. While Zenodo doesn’t seem to detail its download numbers like other services, it is partnered with CERN, which stores more than 100PB (petabytes) of data.</li>
<li>Starting to archive some of the lessons/modules created in the <a href="https://zenodo.org/communities/berkeley-data-sciences/">Division of Data Sciences</a></li>
</ul>
</li>
<li><a href="https://osf.io/">Open Science Framework</a> integrates with major storage workflows like GitHub, Google Drive, Box, etc.</li>
</ul>
<p><img src="https://cdn.cos.io/media/images/new-lifecycle.original.png" alt="" /></p>
<h3 id="subject-repositories">Subject repositories</h3>
<ul>
<li>
<p>Many subject-specific data repositories exist today. Unlike a general data repository, discipline-based repositories can be very specific and well-known within a particular field. This can be both a good thing and a bad thing.</p>
</li>
<li>
<p>Pro: If your field has a specific repository, your data will likely be seen by the right people, increasing its chance for reuse and further influence.</p>
</li>
<li>
<p>Con: Researchers outside of that discipline might not know where to look for your data</p>
</li>
<li>
<p><a href="http://www.re3data.org/">Re3data.org</a>: The Registry of Research Data Repositories is a service provided by DataCite (a global non-profit that provides DOIs - Digital Object Identifiers). With over 1,500 data repositories listed, re3data.org is likely to have a repository in your discipline.</p>
</li>
<li>
<p><a href="http://opendoar.org/">OpenDOAR</a>: OpenDOAR (Directory of Open Access Repositories) is a curated and authoritative list of academic open access repositories. Not only do OpenDOAR staff visit each repository listed, but they also review each one for quality (a pretty big task considering they have 2,600 listings). Included in OpenDOAR are datasets, articles, books, and software.</p>
</li>
<li>
<p>Simmons College hosts the <a href="http://oad.simmons.edu/oadwiki/Data_repositories">Open Access Directory’s list of Data Repositories</a>. The Open Access Directory is maintained by the Open Access community and an editorial board. It includes repositories ranging from archaeology to physics.</p>
</li>
</ul>
<h2 id="apis--wrappers">APIs + Wrappers</h2>
<p><a href="https://github.com/karthik/zenodo">Zenodo(R)</a> <br />
<a href="https://github.com/Tommos0/pyzenodo">PyZenodo(Python)</a></p>
<p><a href="https://cran.r-project.org/web/packages/dataverse/index.html">Dataverse(R)</a> <br />
<a href="https://github.com/IQSS/dataverse-client-python">Dataverse-client(Python)</a></p>
<p><a href="https://developer.github.com/v3/search/">Github Search API</a></p>
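<p>The wrappers listed above sit on top of plain HTTP APIs, so a repository can also be searched straight from a shell. The following is a minimal sketch, assuming the Dataverse Search API endpoint <code class="highlighter-rouge">/api/search</code> with its <code class="highlighter-rouge">q</code>, <code class="highlighter-rouge">type</code>, and <code class="highlighter-rouge">per_page</code> parameters. The helper name <code class="highlighter-rouge">dataverse_search_url</code> and the search term are made up for illustration, and the demo server is used so nothing touches the production repository.</p>

```shell
# Hypothetical helper: build a Dataverse Search API URL.
# Arguments: base URL, search term, optional number of results (default 5).
# The search term is used as-is, so terms with spaces would need URL-encoding first.
dataverse_search_url() {
  base="$1"
  query="$2"
  per_page="${3:-5}"
  printf '%s/api/search?q=%s&type=dataset&per_page=%s\n' "$base" "$query" "$per_page"
}

# Illustrative search term against the demo installation
url=$(dataverse_search_url "https://demo.dataverse.org" "climate")
echo "$url"

# Fetching the results needs network access, so it is left as a comment:
#   curl -s "$url"
```

<p>The response is JSON, so it can be piped into whatever tool you already use for parsing. The same pattern should carry over to other repositories that expose a REST search endpoint, although parameter names differ between services.</p>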
<h2 id="dataverse-walk-through">Dataverse Walk-through</h2>
<ul>
<li>Searching and using the <a href="https://dataverse.harvard.edu/">website/GUI</a>
<ul>
<li><a href="https://demo.dataverse.org/">Demo Dataverse</a> for fooling around with.</li>
</ul>
</li>
<li>
<p>A toy example of using the <code class="highlighter-rouge">dataverse</code> package in R to search for data and download it.</p>
<p><a href="https://mybinder.org/v2/gh/wrathofquan/dataverse-R/master"><img src="https://mybinder.org/badge_logo.svg" alt="Binder" /></a></p>
<ul>
<li>Check out the <a href="https://cran.r-project.org/web/packages/dataverse/vignettes/A-introduction.html">vignettes</a> for more</li>
</ul>
</li>
</ul>
<h2 id="on-your-own">On your own</h2>
<p>Using the <a href="https://docs.google.com/spreadsheets/d/1KptHzDHIdB3s1v5m1mMwphcwXhOVWdkRYdjEWW1dqrE/edit#gid=355072175">Comparative Overview of Features</a> document as a template, think about your own research and the kind of repository (general vs. specific) that makes the most sense for your archival needs.</p>
<h2 id="contacts">Contacts</h2>
<p>https://researchdata.berkeley.edu/</p>
<p>http://dlab.berkeley.edu/</p>
<p>https://www.cdlib.org/services/uc3/dash.html</p>

Intro to Machine Learning with scikit-learn -- Robert Martin-Short
2018-11-05T00:00:00+00:00
https://BIDS.github.io/dats/posts/mlsklearn
<h1 id="welcome">Welcome!</h1>
<h1 id="speakers">Speakers</h1>
<h2 id="robert-martin-short">Robert Martin-Short</h2>
<p><img src="https://BIDS.github.io/dats/bioimages/rmartinshort.png" alt="bio" />
PhD Candidate, Geophysics</p>
<p>Website: <a href="http://rmartinshort.jimdo.com">rmartinshort.jimdo.com</a></p>
<h1 id="content">Content</h1>
<h2 id="installation">Installation</h2>
<p>This workshop will be using the following languages and software:</p>
<ul>
<li>Python 3.6</li>
<li>Jupyter</li>
<li>scikit-learn</li>
</ul>
<p>All of these requirements can be satisfied with <a href="https://www.anaconda.com/download/">Anaconda</a>.</p>
<h2 id="materials">Materials</h2>
<p>The Jupyter notebooks containing the workshop material can be found in the following repo: <a href="https://github.com/rmartinshort/20171206_ML_basics_THW">basics of machine learning with scikit-learn</a></p>

DATS Round-table
2018-10-29T00:00:00+00:00
https://BIDS.github.io/dats/posts/round-table
<h1 id="welcome">Welcome!</h1>
<h1 id="sign-in">Sign-In</h1>
<p>Please sign in at <a href="https://docs.google.com/spreadsheets/d/185JvEqGhEOSJzTmUCyz5szYw0vDZC-_BylTcvHl6HHI/edit#gid=0">this google sheet</a>!</p>

DATS Meet up
2018-10-22T00:00:00+00:00
https://BIDS.github.io/dats/posts/meet-up
<h1 id="welcome">Welcome!</h1>
<h1 id="sign-in">Sign-In</h1>
<p>Please sign in at <a href="https://docs.google.com/spreadsheets/d/1JwP2hRF8Z7sN7I02akCxGUXQl13xcOaZlD7x4WJG7aY/edit#gid=0">this google sheet</a>!</p>

Matplotlib Two Ways -- Caroline Cypranowska
2018-10-01T00:00:00+00:00
https://BIDS.github.io/dats/posts/matplotlib
<h1 id="welcome">Welcome!</h1>
<h1 id="speakers">Speakers</h1>
<h2 id="caroline-cypranowska">Caroline Cypranowska</h2>
<p><img src="https://BIDS.github.io/dats/bioimages/cypranowska.png" alt="bio" />
PhD Candidate, Department of Molecular and Cell Biology</p>
<p>Website: <a href="https://cypranowska.github.io/">cypranowska.github.io</a></p>
<h1 id="content">Content</h1>
<h2 id="installation">Installation</h2>
<p>This workshop will be using the following languages and software:</p>
<ul>
<li>Python 2.7/3.6</li>
<li>Jupyter</li>
<li>Matplotlib</li>
<li>Numpy</li>
</ul>
<p>All of these requirements can be satisfied with <a href="https://www.anaconda.com/download/">Anaconda</a>.</p>
<h2 id="materials">Materials</h2>
<p>The Jupyter notebooks containing the workshop material can be found in the following repo: <a href="https://github.com/BIDS/dats/tree/master/code_examples/matplotlib">code_examples/matplotlib</a></p>

Data tidying in R & Python -- Caroline Cypranowska and Sara Stoudt
2018-09-24T00:00:00+00:00
https://BIDS.github.io/dats/posts/data-tidying-R-python
<h1 id="welcome">Welcome!</h1>
<h1 id="speakers">Speakers</h1>
<h2 id="caroline-cypranowska">Caroline Cypranowska</h2>
<p><img src="https://BIDS.github.io/dats/bioimages/cypranowska.png" alt="bio" />
PhD Candidate, Department of Molecular and Cell Biology</p>
<p>Website: <a href="https://cypranowska.github.io/">cypranowska.github.io</a></p>
<h2 id="sara-stoudt">Sara Stoudt</h2>
<p><img src="https://BIDS.github.io/dats/bioimages/stoudtsara.jpg" alt="bio" />
Graduate Student, Department of Statistics,
Moore/Sloan Fellow @ BIDS</p>
<p>Website: <a href="https://sastoudt.github.io/">sastoudt.github.io</a></p>
<h1 id="content">Content</h1>
<p>For this workshop we’ll be using materials created by Diya Das, David DeTomaso, and Andrey Indukaev. See the README.md file in Diya’s <a href="https://github.com/diyadas/tutorials">tutorial repo</a> to get started.</p>

Charles Frye -- Use You A Jupyter Notebook For Great Good!
2018-09-17T00:00:00+00:00
https://BIDS.github.io/dats/posts/jupyter
<h1 id="welcome">Welcome!</h1>
<h1 id="agenda">Agenda</h1>
<h1 id="speakers">Speakers</h1>
<p><img src="https://BIDS.github.io/dats/bioimages/frye.png" alt="bio" /></p>
<h2 id="charles-frye">Charles Frye</h2>
<p>Graduate Student, Helen Wills Neuroscience Institute</p>
<p>Website: <a href="https://charlesfrye.github.io/">charlesfrye.github.io</a></p>
<p>Bio: <a href="https://charlesfrye.github.io/about/">here</a>.</p>
<h1 id="content">Content</h1>
<p>The content for this talk is available at
<a href="https://github.com/BIDS/dats/tree/master/code_examples/JupyterNotebookForGreatGood">this link</a>.
Head to the
<a href="https://github.com/BIDS/dats">GitHub repo for DATS</a>
for instructions on access.</p>

Mark Mikofski -- Git Version Control with GitHub
2018-09-10T00:00:00+00:00
https://BIDS.github.io/dats/posts/github-oss
<h1 id="agenda">Agenda</h1>
<ol>
<li><a href="#requirements">Requirements</a></li>
<li><a href="#objectives">Objectives</a></li>
<li><a href="#git-vcs">What is Git VCS?</a></li>
<li><a href="#github">GitHub</a></li>
<li><a href="#github-pages">GitHub Pages</a></li>
<li><a href="#ssh-or-https">SSH or HTTPS</a></li>
<li><a href="#git-primer">Git Primer</a></li>
<li><a href="#winning-workflow">Winning Workflow</a></li>
</ol>
<h2 id="requirements">Requirements</h2>
<p>To prepare for this tutorial make sure you have the following:</p>
<ol>
<li>
<p>We’re going to use Git, so make sure you have Git installed on a laptop,
and of course, don’t forget to bring your laptop to the tutorial.</p>
<ul>
<li>
<p><strong>MacOS</strong>: you already have git, open a terminal and type git</p>
</li>
<li>
<p><strong>Windows</strong>: install <a href="https://gitforwindows.org/">Git-for-Windows</a>; no admin rights needed</p>
</li>
<li>
<p><strong>Linux</strong>: use your app manager, <em>eg</em> Ubuntu: <code class="highlighter-rouge">sudo apt install git</code></p>
</li>
</ul>
<p>For more info, see the <a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git">Git SCM Book on installing Git</a></p>
</li>
<li>We’re going to make a personal webpage on GitHub, so make sure your computer
has working internet access. AFAIK anyone can use
<a href="https://studenttech.berkeley.edu/get-online">CalVisitor or AirBears WiFi</a>
connection for free.</li>
<li>If you are not already registered for GitHub, please create an account. I
strongly recommend that you enable
<a href="https://help.github.com/articles/securing-your-account-with-two-factor-authentication-2fa/">two factor authentication</a>
using an app like Google Authenticator.</li>
<li>You’ll probably want a basic editor like Notepad on Windows, TextEdit on Mac,
or gedit on Linux, or you can also just edit your files directly on GitHub.
Anything will do, but <em>not</em> a word processor, and a full-blown IDE is also
probably overkill. Something like <a href="https://www.sublimetext.com/">Sublime Text</a>
or <a href="https://notepad-plus-plus.org/">Notepad++</a> is just right IMHO.</li>
<li>A willingness to participate, try new things, make mistakes, learn and have fun!</li>
</ol>
<h2 id="objectives">Objectives</h2>
<p>At the end of this tutorial you will be able to do the following:</p>
<ol>
<li>explain to a colleague what version control is, why it’s important, what it’s
important for, and when to use it</li>
<li>use Git to version control your documents between iterations</li>
<li>teach a coworker to use basic git commands, and to create a pull request on GitHub</li>
<li>collaborate with others on GitHub using a feature-branch workflow</li>
<li>make a personal webpage using GitHub Pages</li>
</ol>
<h2 id="git-vcs">Git VCS</h2>
<p>What is Git? And why is it important?</p>
<h3 id="in-case-of-fire-git-commit-git-push-and-leave-the-building">In case of fire, git commit, git push and leave the building</h3>
<p><img src="../images/in_case_of_fire.png" alt="In case of fire, git commit, git push and leave the building" /></p>
<p><em>From <a href="https://github.com/louim/in-case-of-fire">GitHub repo <code class="highlighter-rouge">in-case-of-fire</code></a>
(c) 2015 <a href="https://twitter.com/louim">Louis-Michel Couture</a></em></p>
<h3 id="git-on-git">Git on Git</h3>
<blockquote>
<p>Git is a <a href="https://git-scm.com/about/free-and-open-source">free and open source</a>
distributed version control system designed to handle everything from small to
very large projects with speed and efficiency. [1]</p>
</blockquote>
<h3 id="xkcd-on-git">XKCD on Git</h3>
<p><img src="https://imgs.xkcd.com/comics/git.png" alt="xkcd 1597: Git" /></p>
<h3 id="version-control-software-vcs-aka-source-code-management-scm">Version Control Software (VCS) <em>aka</em> Source Code Management (SCM)</h3>
<p>But what is Version Control?</p>
<blockquote>
<p>… version control, <em>aka</em> source control, is the management of changes to
documents, computer programs, large web sites, and other collections of
information. [2]</p>
</blockquote>
<p>Whether you’re writing a dissertation, developing an analysis, or writing code,
you will revise, revise, and revise. Each iteration is important. Using Git VCS
gives you the ability:</p>
<ul>
<li>to reverse your work</li>
<li>take a new direction without losing your current position</li>
<li>recover from a hard drive crash</li>
<li>continue your work from a different laptop</li>
<li>collaborate with others</li>
</ul>
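<p>The first two abilities in the list above can be tried out safely in a throwaway repository. This is only a sketch: the temporary folder, the file name <code class="highlighter-rouge">notes.txt</code>, the branch name <code class="highlighter-rouge">experiment</code>, and the demo identity are all made up for illustration.</p>

```shell
# Work in a temporary folder so no real project is touched
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # identity stored only in this throwaway repo
git config user.email "demo@example.com"

# Two iterations of the same document
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "first draft"

echo "a change I regret" > notes.txt
git commit -q -am "second draft"

# Reverse your work: bring back the file as it was one commit ago
git checkout -q HEAD~1 -- notes.txt
cat notes.txt

# Take a new direction without losing your current position
git checkout -q -b experiment
git branch --list
```

<p>Nothing is lost by the restore: the regretted version is still in the history and can be inspected with <code class="highlighter-rouge">git log</code>.</p>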
<h4 id="references">References</h4>
<ol>
<li><a href="https://git-scm.com/">Git SCM</a></li>
<li><a href="https://en.wikipedia.org/wiki/Version_control">Wikipedia: Version Control</a></li>
</ol>
<h2 id="github">GitHub</h2>
<p>Repeat the following 3 times out loud:</p>
<blockquote>
<p>Git is <em>not</em> GitHub, and GitHub is <em>not</em> Git.</p>
</blockquote>
<p>GitHub is an online hosted Git service that acts as a centralized repository
for its users. You can create and clone Git repositories on GitHub, and you can
pull from and push to Git repositories on GitHub, just as if they were on your
own laptop, another networked laptop, or another online Git hosting service
like Bitbucket or GitLab.</p>
<p>If you have not already created a GitHub account, you need to create one now to
participate in this tutorial. Also, I encourage you to enable two-factor
authentication (TFA) on your GitHub account, and store your backup codes in a
safe location that you will remember. TFA makes it more difficult to hack your
account.</p>
<h2 id="github-pages">GitHub Pages</h2>
<p>GitHub allows users to host static content on <a href="https://pages.github.com/">GitHub Pages</a>.
Content written in markdown is automatically rendered as html using
<a href="https://jekyllrb.com/">Jekyll</a>, a Ruby static content generator. GitHub offers
themes to beautify your site’s look and layout. It’s a great place to host your
personal website.</p>
<ol>
<li>
<p>To create your personal GitHub Page, you need to create a new repository called
<code class="highlighter-rouge"><your-github-username>.github.io</code>, for example <code class="highlighter-rouge">mikofski.github.io</code>.</p>
</li>
<li>
<p>After the new repository is created, open the repository settings, and select
theme chooser.</p>
</li>
<li>
<p>After choosing a theme, an online editor opens with <code class="highlighter-rouge">index.md</code>. You can make
edits to this file, such as changing the title to your name.</p>
</li>
<li>
<p>Scroll to the bottom and find where it says “commit directly to master”. In the
first field, enter “initial commit”, and then press the commit button.</p>
</li>
</ol>
<p><strong>Congratulations!</strong> You’ve just made your first Git commit on GitHub, and created
your personal website. But it’s far from done. It could use a little more work.
Let’s take it offline and iterate on it until it’s just the way you want.</p>
<h2 id="ssh-or-https">SSH or HTTPS</h2>
<p>In order to pull the repository to your laptop, you’ll have to prove to GitHub
that you are who you say you are, and that you have permission to edit the site.
There are two ways to authenticate to GitHub:</p>
<ul>
<li>
<p><strong>SSH</strong>: you create a pair of keys, keep one private, and upload the public
key to GitHub. (Recommended)</p>
<ol>
<li>
<p>if your laptop has a folder called <code class="highlighter-rouge">.ssh</code> in your user profile and it
contains two files called <code class="highlighter-rouge">id_rsa</code> and <code class="highlighter-rouge">id_rsa.pub</code> then skip to step 4.</p>
</li>
<li>
<p>if your laptop does <em>not</em> have a <code class="highlighter-rouge">.ssh</code> folder, then open a shell and type
<code class="highlighter-rouge">ssh-keygen</code></p>
</li>
<li>
<p>when prompted for a passphrase, enter something that is easy to remember</p>
</li>
<li>
<p>on your laptop in a shell, type</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ eval `ssh-agent`
$ ssh-add
</code></pre></div> </div>
</li>
<li>
<p>if prompted for your passphrase and you know it, enter it, but if you don’t
know it, then kill the shell, delete the <code class="highlighter-rouge">.ssh</code> folder, and restart from step 2</p>
</li>
<li>
<p>on your laptop, open the <code class="highlighter-rouge">id_rsa.pub</code> file in <code class="highlighter-rouge">.ssh/</code> and copy the contents</p>
</li>
<li>
<p>online in your personal GitHub profile, in settings under SSH keys, click
New SSH key, paste the contents of your public key and click Add SSH key to save</p>
</li>
</ol>
</li>
<li>
<p><strong>HTTPS</strong>: You use your GitHub username and password, but if you enabled TFA,
this becomes more complicated. What to do depends on your operating system:</p>
<ul>
<li>
<p>Windows: do nothing, Microsoft has already installed a credential manager
that works with GitHub to prompt you for your TFA code.</p>
</li>
<li>
<p>Mac/Linux Option A: create a personal access token with repo access</p>
<ol>
<li>
<p>in your personal GitHub profile under developer settings click generate
new personal access token, and check the repo full access box</p>
</li>
<li>
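<p>on your laptop enable the Git credential store by typing
<code class="highlighter-rouge">git config --global credential.helper store</code>
(note that this helper saves the token unencrypted in <code class="highlighter-rouge">~/.git-credentials</code>)</p>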
</li>
<li>
<p>then when prompted by Git, use your GitHub username, and the personal
access token as your password.</p>
</li>
</ol>
</li>
<li>
<p>Mac/Linux Option B: download and
<a href="https://github.com/Microsoft/Git-Credential-Manager-for-Mac-and-Linux/blob/master/Install.md">install the Microsoft Git Credential Manager</a> - this does
everything in Option A for you (Recommended)</p>
</li>
</ul>
</li>
</ul>
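<p>To see what the SSH steps above produce without touching a real <code class="highlighter-rouge">.ssh</code> folder, a key pair can be generated in a temporary directory. This is a sketch for inspection only: the temporary location, the comment string, and the empty passphrase (<code class="highlighter-rouge">-N ""</code>) are assumptions made so the example runs unattended. A key you actually upload to GitHub should have a passphrase, as described in step 3.</p>

```shell
# Generate a throwaway RSA key pair in a temporary folder (not in ~/.ssh)
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N "" -C "demo key" -f "$keydir/id_rsa"

# Two files are created: id_rsa (private, never share) and id_rsa.pub (public)
ls "$keydir"

# The public half is the text you would paste into GitHub under SSH keys
cat "$keydir/id_rsa.pub"

# Print the key's fingerprint, which GitHub also displays next to an uploaded key
ssh-keygen -l -f "$keydir/id_rsa.pub"

# With a real key uploaded, the connection can be checked (needs network access):
#   ssh -T git@github.com
```

<p>Only the <code class="highlighter-rouge">.pub</code> file ever leaves your laptop. The private key stays where it is.</p>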
<h2 id="git-primer">Git Primer</h2>
<p>The most important Git command is <code class="highlighter-rouge">git</code>. If you type it in a terminal you get a
list of the other most important Git commands such as <code class="highlighter-rouge">init</code>, <code class="highlighter-rouge">clone</code>, <code class="highlighter-rouge">status</code>,
<code class="highlighter-rouge">log</code>, <code class="highlighter-rouge">diff</code>, <code class="highlighter-rouge">add</code>, <code class="highlighter-rouge">commit</code>, <code class="highlighter-rouge">checkout</code>, <code class="highlighter-rouge">remote add</code>, <code class="highlighter-rouge">pull</code>, and <code class="highlighter-rouge">push</code>.</p>
<p>The first thing you should do, after setting up your <code class="highlighter-rouge">.ssh</code> keys is to tell Git
your full name and email address to use. Then we can get your new website and
start hacking on it. The following commands are entered in a shell in a folder
you use for projects.</p>
<ol>
<li>
<p>Add your name and email using <code class="highlighter-rouge">git config</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git config --global user.name "Your Name Comes Here"
$ git config --global user.email you@yourdomain.example.com
</code></pre></div> </div>
</li>
<li>
<p>Clone your GitHub repository to your laptop using <code class="highlighter-rouge">git clone</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># if you're using SSH
$ git clone git@github.com:<github-username>/<github-username>.github.io.git
# if you're using HTTPS
$ git clone https://github.com/<github-username>/<github-username>.github.io.git
</code></pre></div> </div>
</li>
<li>
<p>Enter the newly cloned repo, display the remotes and the log</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd <github-username>.github.io
$ git log
$ git remote
$ git remote show origin
</code></pre></div> </div>
</li>
<li>
<p>Now open your editor and make some changes to your <code class="highlighter-rouge">index.md</code> file.</p>
</li>
<li>
<p>Before you make too many changes, go back to the shell and view the status,
a diff from the previous version, and commit your changes</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git status
$ git diff
$ git commit -am "put any message here, usually under 50 characters"
</code></pre></div> </div>
</li>
</ol>
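<p>The steps above stop at the commit, which only exists on your laptop until it is pushed. The sketch below runs a full clone, edit, commit, and push cycle offline. It assumes a local bare repository as a stand-in for GitHub, and the folder names and demo identity are made up for illustration.</p>

```shell
work=$(mktemp -d)

# A bare repository plays the role of the GitHub remote (an assumption for offline use)
git init -q --bare "$work/remote.git"

# Clone it, just as you would clone your GitHub Pages repository
git clone -q "$work/remote.git" "$work/site"
cd "$work/site"
git config user.name "Demo User"          # identity stored only in this throwaway clone
git config user.email "demo@example.com"

# Create a new file; "git commit -a" only picks up files Git already tracks,
# so a new file has to be added explicitly first
echo "# My page" > index.md
git status --short
git add index.md
git commit -q -m "add index page"

# Publish the current branch to the remote
git push -q origin HEAD
git log --oneline
```

<p>With a real GitHub Pages repository, the push is what triggers the site to be rebuilt with your changes.</p>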
<h3 id="xkcd-on-git-commit">XKCD on Git Commit</h3>
<p><img src="https://imgs.xkcd.com/comics/git_commit.png" alt="xkcd 1296: Git Commit" /></p>
<h2 id="winning-workflow">Winning Workflow</h2>
<p>The secret power of using Git with GitHub is how easy it makes collaborating
with others. AFAIK the feature-branch workflow is the most frequent method of
collaboration on GitHub. I outlined its steps in a THW-Berkeley talk last year
on <a href="https://bids.github.io/dats/posts/2017-10-04-github-oss-f17.html">using GitHub in OSS</a>.</p>
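<p>As a rough offline sketch, one common form of the feature-branch workflow looks like the following. It is not necessarily the exact sequence from the linked talk. A local bare repository stands in for the shared GitHub repository, and the branch name <code class="highlighter-rouge">add-about-page</code>, the file names, and the demo identity are made up for illustration.</p>

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"     # stand-in for the shared GitHub repository
git clone -q "$work/remote.git" "$work/clone"
cd "$work/clone"
git config user.name "Demo User"          # identity stored only in this throwaway clone
git config user.email "demo@example.com"

# Starting point: one commit on the default branch
echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"
main=$(git symbolic-ref --short HEAD)     # "main" or "master", depending on Git settings
git push -q -u origin "$main"

# 1. Branch off for the new feature and commit there
git checkout -q -b add-about-page
echo "about me" > about.md
git add about.md
git commit -q -m "add about page"

# 2. Publish the branch; on GitHub you would now open a pull request from it
git push -q -u origin add-about-page

# 3. After review the branch is merged; GitHub's merge button does this step,
#    here it is done locally as a stand-in
git checkout -q "$main"
git merge -q --no-ff -m "merge add-about-page" add-about-page
git push -q origin "$main"
git log --oneline --graph
```

<p>Keeping each change on its own branch means the default branch only ever receives work that has been reviewed, and several people can work on separate features at once.</p>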
<h2 id="additional-info">Additional Info</h2>
<ul>
<li><a href="https://help.github.com/">GitHub help pages</a> are a wealth of info.</li>
<li><a href="https://ohshitgit.com/">Oh Shit Git!</a> is a funny guide to recovering from common Git mistakes.</li>
<li><a href="https://git-scm.com/doc">Git SCM Documentation</a> is the official source.</li>
</ul>

First meeting of Fall 2018 Semester -- Organization
2018-08-27T00:00:00+00:00
https://BIDS.github.io/dats/posts/organization
<h1 id="welcome-please-sign-in-at-bitdodats-082718">Welcome! Please sign in at <a href="https://bit.do/dats-082718">bit.do/dats-082718</a>.</h1>
<p>Direct link: <a href="https://docs.google.com/spreadsheets/d/1EKZKYoqAiIewM3Kpn6rAiZNwMkBMRLT_5oogHmZQTqo/edit?usp=sharing">here</a></p>
<h1 id="agenda">Agenda</h1>
<ul>
<li>4:10 - Intro to BIDS and our group // we’re on Berkeley time!</li>
<li>4:20 - Introductions (you!)</li>
<li>4:30 - What do we want to learn and what do we want to teach?</li>
<li>4:45 - Our GitHub repo and website</li>
</ul>
<h1 id="speakers">Speakers</h1>
<h3 id="caroline-cypranowska">Caroline Cypranowska</h3>
<p><img src="https://BIDS.github.io/dats/bioimages/cypranowska.png" alt="bio" />
PhD Candidate, Department of Molecular & Cell Biology and Chief Organizer, Data Analysis Tools Series</p>
<p>Website: <a href="https://cypranowska.github.io">cypranowska.github.io</a></p>
<p>Caroline Cypranowska is a PhD candidate in the Department of Molecular & Cell Biology at UC Berkeley and a National Science Foundation Graduate Fellow. She’s currently studying the genetic mechanisms of synaptic plasticity as a member of the Isacoff lab. Caroline has technical expertise in single-cell RNA-sequencing, TIRF microscopy, and single-molecule pull-down.</p>
<p>Outside of lab, Caroline volunteers as a Math instructor with the Prison University Project at San Quentin and as an organizer for The Data Analysis Tools Series, formerly known as The Hacker Within. She also enjoys backpacking, snowboarding, bouldering, and any other activity that can be done in the great outdoors.</p>
<h3 id="diya-das">Diya Das</h3>
<p><img src="https://BIDS.github.io/dats/bioimages/diyadas.png" alt="bio" /> Postdoctoral Researcher, Department of Molecular & Cell Biology and Moore-Sloan Data Science Fellow, Berkeley Institute for Data Science</p>
<p>Website: <a href="https://diyadas.github.io">diyadas.github.io</a></p>
<p>Diya is a postdoctoral researcher in the lab of John Ngai, where she studies regeneration in the olfactory epithelium, the tissue responsible for our sense of smell. She analyzes how olfactory stem cells contribute to both steady-state differentiation and injury-induced regeneration using single-cell RNA sequencing (scRNA-seq), assay for transposase-accessible chromatin sequencing (ATAC-seq) and other genomics techniques.</p>
<p>Diya also facilitates opportunities for fellow researchers to develop their data science skills. At BIDS, she coordinates Software/Data Carpentry workshops (she is a Software Carpentry instructor and lesson maintainer). Diya formerly organized The Hacker Within, which is now The Data Analysis Tools Series. She is also Fellow Lead of the Career Paths & Alternative Metrics Working Group (chaired by Henry Brady), which addresses the career paths available to data scientists within academia.</p>

Tim Howes -- File syncing tools - syncthing, dat, git-annex
2018-05-02T00:00:00+00:00
https://BIDS.github.io/dats/posts/file-syncing
<h2 id="file-syncing-tools">File syncing tools</h2>
<p>I will discuss open source tools that you can use to sync files directly between
computers, rather than relying on paid cloud services such as Dropbox. These
can be especially useful when dealing with large scientific datasets, which may
be impractical to sync to the cloud, and for which you may want more control over
versioning information. If you want something similar to a cloud service, but
with more control, you can set up these tools in your own virtual private server.</p>
<h3 id="syncthing">syncthing</h3>
<p><a href="https://syncthing.net">syncthing</a> is a cross-platform tool that can be used to
keep folders in sync between your own devices or to share with collaborators.
The settings can be customized to ignore certain files or sub-directories on
specific machines, and there are different options available for keeping copies
of old versions of files.</p>
<h3 id="dat">dat</h3>
<p><a href="https://datproject.org">Dat</a> is a protocol for peer-to-peer sharing of collections of files. This has
similar advantages to sharing files using BitTorrent, but it also includes the
ability to update the files in an archive and track the version history.</p>
<h3 id="git-annex">git-annex</h3>
<p><a href="https://git-annex.branchable.com">git-annex</a> is a tool that allows you to track large files within your git
repositories, and it gives you a high level of control over which clones of the
repository actually get the full file contents and which get only small placeholder
files. This means that you can view and organize the full directory tree on your
local machine without having to actually download all the files, and you can download
the contents of individual files when needed using <code class="highlighter-rouge">git annex get</code>. A special
git-annex branch tracks the locations of the file contents and ensures that the
correct number of copies exist on other machines before “dropping” the local file.</p>
<h2 id="usage-notes">Usage notes</h2>
<h3 id="syncthing-1">syncthing</h3>
<p>Syncthing keeps folders in sync between machines by making a secure, direct connection between the machines (or optionally by using relay servers if a direct connection is not possible). It is a simple tool that can be started at the command line, run in the background, and viewed/controlled via a web browser.</p>
<h4 id="installation">Installation</h4>
<p>https://docs.syncthing.net/intro/getting-started.html
https://docs.syncthing.net/users/autostart.html</p>
<p>Install and enable on Ubuntu:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt install syncthing
<span class="c"># Enable as automatic background service</span>
<span class="c"># replace 'myuser' with your username</span>
<span class="nb">sudo </span>systemctl <span class="nb">enable </span>syncthing@myuser.service
<span class="nb">sudo </span>systemctl start syncthing@myuser.service
<span class="c"># or run `syncthing` manually on the command line</span>
</code></pre></div></div>
<p>Check status on Ubuntu:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Check service status</span>
<span class="nb">sudo </span>systemctl status syncthing@myuser.service
<span class="c">#Check logs</span>
<span class="nb">sudo </span>journalctl <span class="nt">-e</span> <span class="nt">-u</span> syncthing@myuser.service
</code></pre></div></div>
<p>Install and enable on macOS: <br />
(First install homebrew: https://brew.sh/)</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew install syncthing
<span class="c">#Enable as automatic background service</span>
cp /usr/local/Cellar/syncthing/latest/homebrew.mxcl.syncthing.plist ~/Library/LaunchAgents/syncthing.plist
launchctl load ~/Library/LaunchAgents/syncthing.plist
<span class="c"># or run `syncthing` manually on the command line</span>
</code></pre></div></div>
<p>You may need to adjust firewall settings to allow incoming connections. On Mac, you will usually be prompted to allow this the first time you start syncthing.</p>
<p>https://docs.syncthing.net/users/firewall.html</p>
<h4 id="connect-to-a-new-machine">Connect to a new machine</h4>
<p>Visit http://localhost:8384 to view the GUI for your running syncthing.</p>
<p>Click “Add remote device” and enter the device’s long unique ID. If you’re on the same local network as the other device, it will show up as a suggestion so you don’t have to type it.</p>
<p>Give the device whatever nickname you like. Specify the IP address (if it is stable) or leave as ‘dynamic’ to find the device automatically based on the ID. Choose which folders to share with the device. Choose ‘introducer’ if you would like to receive other folders automatically from the device.</p>
<p>https://docs.syncthing.net/intro/getting-started.html#configuring</p>
<h4 id="set-up-a-new-folder">Set up a new folder</h4>
<h4 id="ignore-files">Ignore files</h4>
<p>https://docs.syncthing.net/users/ignoring.html</p>
<h4 id="keep-old-versions">Keep old versions</h4>
<p>https://docs.syncthing.net/users/versioning.html</p>
<h4 id="other-tips">Other tips</h4>
<ul>
<li>
<p>Set up a virtual private server on a cloud provider if you want to have an always-on machine that can act as the central hub.</p>
</li>
<li>
<p>If syncing files between Mac and Linux, you might need to watch out for case sensitivity (Linux filesystems are case-sensitive, Mac by default is not). You can create a new APFS volume on your Mac hard drive with case sensitivity enabled, and put your sync folders there to avoid issues.</p>
</li>
<li>
<p>If running on a server where you don’t have root access, download and run <code class="highlighter-rouge">syncthing</code> manually or enable as a user service.</p>
</li>
</ul>
<p>https://docs.syncthing.net/users/autostart.html#using-systemd</p>
<ul>
<li>See also the syncthing forum: https://forum.syncthing.net/</li>
</ul>
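<p>The case-sensitivity tip above can be checked before you sync. The following is a minimal sketch, assuming you only want to spot names that differ by letter case; the helper names <code class="highlighter-rouge">find_case_collisions</code> and <code class="highlighter-rouge">list_relative_paths</code> and the sample file names are hypothetical and are not part of syncthing.</p>

```python
from collections import defaultdict
from pathlib import Path


def find_case_collisions(names):
    """Group names that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        # casefold() maps 'Results.csv' and 'results.csv' to the same key
        groups[name.casefold()].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


def list_relative_paths(folder):
    """List every file and directory under `folder`, relative to it."""
    root = Path(folder)
    return [path.relative_to(root).as_posix() for path in root.rglob("*")]


# Made-up file names standing in for a folder listing on a Linux machine
sample = ["data/Results.csv", "data/results.csv", "README.md", "notes.txt"]
collisions = find_case_collisions(sample)
print(collisions)
```

<p>To check a real folder on the Linux side, you could pass <code class="highlighter-rouge">list_relative_paths("/path/to/sync/folder")</code> to <code class="highlighter-rouge">find_case_collisions</code>; any group it returns would likely clash on a default Mac filesystem.</p>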
<h3 id="dat-1">dat</h3>
<p>https://docs.datproject.org/tutorial</p>
<p>Resources for data sharing with dat:
https://datbase.org/
https://blog.datproject.org/tag/science/</p>
<p>Beaker, a web browser based on dat that enables peer-to-peer, editable websites:
https://beakerbrowser.com/
https://beakerbrowser.com/2017/06/14/forking-websites-on-the-p2p-web.html</p>
<h3 id="git-annex-1">git-annex</h3>
<p>http://git-annex.branchable.com/walkthrough</p>
<h4 id="example-setup">Example setup</h4>
<p>Initialize a repository:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir project
<span class="nb">cd </span>project
git init
git annex init <span class="nt">--version</span><span class="o">=</span>6 <span class="s2">"My desktop"</span>
</code></pre></div></div>
<p>Add files:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp ~/Downloads/ubuntu.iso <span class="nb">.</span>
git annex add ubuntu.iso
git commit <span class="nt">-a</span> <span class="nt">-m</span> <span class="s2">"Added a file"</span>
</code></pre></div></div>
<p>Clone into another folder on the same computer (could be a removable drive):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /media/usb
git clone ~/project
<span class="nb">cd </span>project
git annex init <span class="nt">--version</span><span class="o">=</span>6 <span class="s2">"Portable drive"</span>
</code></pre></div></div>
<p>Sync between clones (takes care of committing, pushing, and pulling):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /media/usb/project
git annex sync
<span class="c"># To get the content of large files in this step, use --content</span>
git annex sync <span class="nt">--content</span>
</code></pre></div></div>
<h4 id="get-and-drop-files">Get and drop files</h4>
<h4 id="special-remotes">Special remotes</h4>
<h4 id="git-annex-assistant">git-annex assistant</h4>
<p>Automated sync tool with a GUI</p>
<p>https://git-annex.branchable.com/assistant/</p>
Accessing public data on .gov websites -- Caroline Cypranowska2018-04-25T00:00:00+00:00https://BIDS.github.io/dats/posts/public-data<h1 id="accessing-public-data-on-gov-websites-or-how-to-deal-with-bureaucrats">Accessing public data on .gov websites (or how to deal with bureaucrats)</h1>
<h2 id="prerequisites">Prerequisites</h2>
<p>Today’s exercises will require Bash. If you have a Mac or Linux machine, you’re mostly good to go.</p>
<h3 id="windows">Windows</h3>
<p>Most Windows users in need of a Bash terminal use <a href="https://www.cygwin.com/">Cygwin</a>, a collection of Linux software tools compiled for Windows. Other options include <a href="https://git-scm.com/download/win">Git</a> and creating a Linux subsystem (for Windows 10). The steps below walk through installing Cygwin and a few other tools required for this tutorial.</p>
<ol>
<li>
<p>Download Cygwin and run <code class="highlighter-rouge">setup.exe</code>. Select ‘Install from Internet’ when prompted by the installation wizard. Choose your root directory and mirror for installation.</p>
</li>
<li>
<p>The installer will also download a list of available packages. Include the default packages, but make sure to search for and include <code class="highlighter-rouge">curl</code> and <code class="highlighter-rouge">wget</code>.</p>
</li>
<li>
<p>Add the Cygwin path to the Windows Environment Path Variable, which can be found in the ‘Advanced system settings’
menu. Append <code class="highlighter-rouge">;C:\cygwin\bin</code> to the end of the variable value option (assuming this is where you installed Cygwin).</p>
</li>
</ol>
<h3 id="macos">macOS</h3>
<p>The terminal in macOS has the majority of the tools needed to make requests to government databases, as cURL comes with Macs out of the box. The main advantage of <code class="highlighter-rouge">wget</code> over <code class="highlighter-rouge">curl</code> is that it can download recursively. While you can choose to do the exercises without <code class="highlighter-rouge">wget</code>, it can be easily installed with Homebrew.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo@bar:~<span class="nv">$ </span>brew install wget
</code></pre></div></div>
<h1 id="a-brief-explanation-of-networking-protocols">A brief explanation of networking protocols</h1>
<p>In networking, a protocol is a set of rules for communication. Peer-to-peer networks are composed of interconnected computers in which no computer has a privileged position. Client-server networks, on the other hand, are composed of servers that perform functions on behalf of other machines (clients). Both of these systems rely on protocols to send and receive data.</p>
<p>The set of protocols used on the Internet is called TCP/IP (Transmission Control Protocol/Internet Protocol). The TCP/IP model has a layered structure, and protocols like HTTP, FTP, and SSH run on the highest layer (the application layer).</p>
<p>HTTP (or hypertext transfer protocol) defines how computers exchange HTML documents, and FTP (or file transfer protocol) defines how computers move files between local and remote file systems. These are the primary tools we will use today to get our data.</p>
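<p>The protocol is the first part of every URL, so you can see which one a link will use before you request it. As a small sketch using Python's standard <code class="highlighter-rouge">urllib.parse</code> module (the two URLs are ones that appear elsewhere in this lesson), the following splits each URL into its protocol, host, and path without contacting any server:</p>

```python
from urllib.parse import urlsplit

urls = [
    "https://www.data.gov/",
    "ftp://ftp.ncdc.noaa.gov/pub/data/hourly_precip-3240/readme.txt",
]

# urlsplit only parses the text of the URL; nothing is sent over the network
parsed = [urlsplit(url) for url in urls]
for parts in parsed:
    print(parts.scheme, parts.hostname, parts.path)
```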
<p>HTTP and FTP each define how a client makes requests of a server and how the server returns a response. HTTP requests and responses usually have a header, which contains metadata about the message.</p>
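<p>To make the idea of a header concrete, here is a hedged sketch that builds an HTTP request object with Python's standard <code class="highlighter-rouge">urllib.request</code> module and inspects it without sending it. The header values, including the <code class="highlighter-rouge">dats-example/0.1</code> user agent, are made up for illustration.</p>

```python
from urllib.request import Request

# Build (but do not send) a request; the header values are illustrative only
request = Request(
    "https://www.data.gov/",
    headers={"Accept": "application/json", "User-Agent": "dats-example/0.1"},
)

print(request.get_method())    # the HTTP method used when no body is attached
print(request.header_items())  # the header names and values that would be sent
```

<p>Passing the object to <code class="highlighter-rouge">urllib.request.urlopen</code> would actually send it; the response it returns has its own headers, available through its <code class="highlighter-rouge">headers</code> attribute.</p>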
<p><img src="https://imgs.xkcd.com/comics/server_attention_span.png" alt="alt text" title="https://xkcd.com/869/" /></p>
<h1 id="apis">APIs</h1>
<p>An application programming interface (or API) is a set of rules for how pieces of software talk to each other. In this case it usually refers to the rules for accessing and posting data to a specific group of servers. Many government agency APIs for accessing data are geared towards people building web application software.</p>
<p>API documentation usually includes:</p>
<ul>
<li>how to format query strings</li>
<li>what types/formats of data can be retrieved or posted with a request</li>
<li>authentication procedures</li>
</ul>
<h1 id="what-is-datagov">What is Data.gov?</h1>
<p><a href="https://www.data.gov/">Data.gov</a> is mostly a catalog of data sets collected by the agencies of the US Federal Government. Each entry includes information about the agency that collected the data, metadata, landing pages for the project, links to the web address where the data can be retrieved, the format of the data, and so on.</p>
<h2 id="what-datagov-is-not">What Data.gov is not</h2>
<p>Data.gov doesn’t host the data directly, and doesn’t have a unified API for accessing data from all government agencies. While Data.gov does have <em>an</em> API, it returns information about the entries in the catalog rather than the data sets themselves. So you get meta meta data.</p>
<h1 id="exercises">Exercises</h1>
<h2 id="getting-noaa-precipitation-data-from-an-ftp-server">Getting NOAA precipitation data from an FTP server</h2>
<p>The <a href="https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00313">U.S. Hourly Precipitation data set</a> is hosted on an FTP server and is well documented. Here you’ll find that there is a page for downloading data from specific date ranges and location, but if you want to store them on a server then you’ll (obviously) need to use FTP.</p>
<p>The <a href="ftp://ftp.ncdc.noaa.gov/pub/data/hourly_precip-3240/dsi3240.pdf">.pdf</a> describes the naming scheme and the <a href="ftp://ftp.ncdc.noaa.gov/pub/data/hourly_precip-3240/readme.txt">readme.txt</a> instructs how to open a connection to the server and where to find files.</p>
<h3 id="exercise-get-precipitation-records-from-ca-from-2000-2009">Exercise: Get precipitation records from CA from 2000-2009</h3>
<h4 id="according-to-the-docs-dont-run-this-before-we-discuss">According to the docs (don’t run this before we discuss)</h4>
<ol>
<li>Log into the FTP server</li>
</ol>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo@bar:~<span class="nv">$ </span>ftp ftp.ncdc.noaa.gov
</code></pre></div></div>
<ol start="2">
<li>Navigate to the correct directory</li>
</ol>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ftp> <span class="nb">cd </span>pub/data/hourly_precip-3240/04
</code></pre></div></div>
<ol start="3">
<li>Use <code class="highlighter-rouge">get</code> to download one file, or <code class="highlighter-rouge">mget</code> to get multiple files</li>
</ol>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ftp> mget 3240_04_200<span class="k">*</span>.tar.Z
</code></pre></div></div>
<p>Just a note: when you log into a plain FTP server, your username and password aren’t encrypted. There are ways of transferring files over SSH or of doing FTP over a secure sockets layer (SSL).</p>
<h4 id="the-safer-way">The safer way</h4>
<p><code class="highlighter-rouge">curl</code> has an option for using FTP over SSL. We should choose this instead, because it encrypts the traffic.</p>
<ol>
<li>
<p>Navigate to your preferred directory</p>
</li>
<li>
<p>Use the <code class="highlighter-rouge">--ftp-ssl</code> flag, the <code class="highlighter-rouge">--user</code> flag, and the <code class="highlighter-rouge">-o</code> option</p>
</li>
</ol>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo@bar:~<span class="nv">$ </span>curl <span class="nt">--ftp-ssl</span> <span class="nt">--user</span> anonymous:youremail@email.com ftp://ftp.ncdc.noaa.gov/pub/data/hourly_precip-3240/04/3240_04_2000-2000.tar.Z <span class="nt">-o</span> ca_2000.tar.Z
</code></pre></div></div>
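<p>If you would rather script the download, the same steps (log in, change directory, fetch the files matching a wildcard) can be sketched with Python's standard <code class="highlighter-rouge">ftplib</code>. This is only a sketch: it assumes the server accepts FTP over TLS, the function names are hypothetical, and the sample listing is made up to follow the naming scheme used above. The download function is defined but not called here.</p>

```python
import fnmatch
from ftplib import FTP_TLS
from pathlib import Path

HOST = "ftp.ncdc.noaa.gov"
DIRECTORY = "pub/data/hourly_precip-3240/04"
PATTERN = "3240_04_200*.tar.Z"  # same wildcard as the mget example


def select_files(names, pattern=PATTERN):
    """Keep only the file names that match a shell-style wildcard pattern."""
    return sorted(fnmatch.filter(names, pattern))


def download_matching(dest=".", email="youremail@email.com"):
    """Download every matching file over FTP with TLS (needs network access)."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with FTP_TLS(HOST) as ftp:
        ftp.login(user="anonymous", passwd=email)
        ftp.prot_p()  # encrypt the data connection as well as the login
        ftp.cwd(DIRECTORY)
        for name in select_files(ftp.nlst()):
            with open(dest_dir / name, "wb") as handle:
                ftp.retrbinary("RETR " + name, handle.write)


# Made-up directory listing, used only to show what the wildcard selects
listing = [
    "3240_04_1999-1999.tar.Z",
    "3240_04_2000-2000.tar.Z",
    "3240_04_2009-2009.tar.Z",
    "3240_04_2010-2010.tar.Z",
    "readme.txt",
]
selected = select_files(listing)
print(selected)
```

<p>Calling <code class="highlighter-rouge">download_matching("ca_precip")</code> would attempt the real transfer into a folder of that (arbitrary) name.</p>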
<h4 id="the-safer-recursive-way">The safer (recursive) way</h4>
<p><code class="highlighter-rouge">curl</code> doesn’t have a built-in method for easily getting multiple files. Write a shell script that will get all the CA precipitation data from 2000-2009.</p>
<p><code class="highlighter-rouge">wget</code> has a <code class="highlighter-rouge">-m</code> option for mirroring sites, which lets you download the entire contents of a directory.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo@bar:~<span class="nv">$ </span>wget <span class="nt">-mc</span> <span class="nt">-nH</span> <span class="nt">--ftps-implicit</span> <span class="nt">--no-ftps-resume-ssl</span> <span class="nt">--user</span><span class="o">=</span>anonymous <span class="nt">--password</span><span class="o">=</span>youremail@email.com ftp://ftp.ncdc.noaa.gov/pub/data/hourly_precip-3240/04/
</code></pre></div></div>
<h4 id="bonus">Bonus</h4>
<ol>
<li>
<p>Write a script for downloading the files you want from the NOAA FTP server with <code class="highlighter-rouge">curl</code>.</p>
</li>
<li>
<p>FTP isn’t super great for transferring large files. How can you tell if the files downloaded by <code class="highlighter-rouge">curl</code> are identical to the ones you mirrored with <code class="highlighter-rouge">wget</code> from the command line?</p>
</li>
</ol>
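<p>One possible approach to the second bonus question is to compare checksums: if two files have the same SHA-256 digest, their contents are almost certainly identical. The sketch below uses Python's standard <code class="highlighter-rouge">hashlib</code>; the helper names are hypothetical, and the two files are small stand-ins written to a temporary folder rather than real NOAA archives.</p>

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the curl download and the wget mirror of the same file
    curl_copy = Path(tmp) / "ca_2000.tar.Z"
    wget_copy = Path(tmp) / "3240_04_2000-2000.tar.Z"
    curl_copy.write_bytes(b"stand-in archive bytes")
    wget_copy.write_bytes(b"stand-in archive bytes")
    identical = same_content(curl_copy, wget_copy)

    # Simulate a truncated transfer and compare again
    wget_copy.write_bytes(b"stand-in arch")
    differs = not same_content(curl_copy, wget_copy)

print(identical, differs)
```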
<h2 id="getting-usgs-earthquake-data-using-an-api">Getting USGS earthquake data using an API</h2>
<p>Skim the docs. Place a query to return GeoJSON records of earthquakes occurring 1) on your birthday, 2) in your favorite region of the world, 3) with a magnitude > 2.5. Quote the URL so that the shell doesn’t interpret the <code class="highlighter-rouge">&</code> characters.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo@bar:~<span class="nv">$ </span>curl <span class="nt">-o</span> earthquakes.geojson <span class="s2">"https://earthquake.usgs.gov/fdsnws/event/1/query.geojson?starttime=1991-09-21&endtime=1991-09-22&maxlatitude=43.373&minlatitude=25.542&maxlongitude=-101.25&minlongitude=-120.234&minmagnitude=2.5&orderby=time"</span>
</code></pre></div></div>
<p>The Python urllib and requests libraries are great for formatting query strings and headers for more sophisticated endeavors than the exercise above. (But you can also do fancy things in Bash.)</p>
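<p>As a minimal sketch of the query-string formatting mentioned above, the standard library's <code class="highlighter-rouge">urllib.parse.urlencode</code> can assemble the earthquake query from a dictionary. The parameter names and values are the ones from the exercise, except that the end date is assumed to be the day after the start date so the window covers a full day.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query.geojson"

params = {
    "starttime": "1991-09-21",
    "endtime": "1991-09-22",  # assumed: one day after starttime
    "minlatitude": 25.542,
    "maxlatitude": 43.373,
    "minlongitude": -120.234,
    "maxlongitude": -101.25,
    "minmagnitude": 2.5,
    "orderby": "time",
}

# urlencode joins the pairs with '&' and escapes any special characters
url = BASE_URL + "?" + urlencode(params)
print(url)

# Round-trip check: parse the query string back into a dictionary
decoded = parse_qs(urlsplit(url).query)
print(decoded["minmagnitude"])
```

<p>The resulting URL can be handed to <code class="highlighter-rouge">curl</code> (in quotes) or fetched from Python with <code class="highlighter-rouge">urllib.request.urlopen(url)</code>.</p>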
<h1 id="mini-challenge">Mini-challenge!</h1>
<p>(To be posted during the session)</p>
<h1 id="resources">Resources</h1>
<h2 id="project-open-data">Project Open Data</h2>
<p>Project Open Data was an initiative created by the Obama Administration to promote accessibility and visibility of data sets collected and curated by the Federal government. The <a href="https://project-open-data.cio.gov/">Project Open Data policy page</a> is mostly geared towards government officials wanting to publish agency data, but also includes some resources for harvesting metadata, converting file types, etc.</p>
<p>There’s also a <a href="https://labs.data.gov/dashboard/offices/qa">dashboard</a> to check out how well each government agency is complying with the Project Open Data policies.</p>
<h2 id="nasa">NASA</h2>
<p>Fonts aside, <a href="https://api.nasa.gov/">NASA</a> has their crap together.</p>
Nima Hejazi & Jeremy Coyle -- Machine Learning Pipelines for R with sl32018-04-18T00:00:00+00:00https://BIDS.github.io/dats/posts/ml-r<h2 id="about-nima-and-jeremy">About Nima and Jeremy</h2>
<p><a href="https://statistics.berkeley.edu/~nhejazi/">Nima</a> is a PhD student in the
<a href="https://statistics.berkeley.edu/biostat/">Group in Biostatistics</a>, where he is
jointly supervised by <a href="https://statistics.berkeley.edu/~laan">Mark van der Laan</a>
and <a href="https://hubbard.berkeley.edu">Alan Hubbard</a>. Nima is also affiliated with
the <a href="http://bbd.berkeley.edu/">UC Berkeley NIH Biomedical Big Data training
program</a> and the <a href="http://ccb.berkeley.edu/">Center for Computational
Biology</a>. Currently, his research centers around
nonparametric statistical and causal inference, machine learning, and
statistical computing – focusing on the development of robust techniques for
inference and estimation in an eclectic collection of problem settings, with
applications often arising in precision medicine, vaccine efficacy trials,
computational biology, and public policy.</p>
<p>Jeremy is a recent PhD graduate in Biostatistics who continues working with the
department to translate statistical theory to software. During his PhD studies,
Jeremy worked with <a href="https://hubbard.berkeley.edu">Alan Hubbard</a> and <a href="https://statistics.berkeley.edu/~laan">Mark van
der Laan</a> on a series of projects broadly
related to computational statistics, including more efficient cross-validation
routines for ensemble machine learning and a software framework for
cross-validation (<a href="https://origami.tlverse.org"><code class="highlighter-rouge">origami</code></a>). His current
research interests include causal inference, model selection, re-sampling
techniques, statistical software development, and statistical methods for
assessing time series data from sensor systems.</p>
<hr />
<h2 id="machine-learning-pipelines-for-r-with-sl3">Machine Learning Pipelines for R with <code class="highlighter-rouge">sl3</code></h2>
<p>We present <a href="https://github.com/tlverse/sl3"><code class="highlighter-rouge">sl3</code></a>, a recently developed
software package for the <a href="https://www.r-project.org">R language and environment for statistical
computing</a>, designed to provide utilities for
engaging in a host of common machine learning tasks. Topics to be addressed
include efficient data organization and accession, the construction of pipelines
for data munging and analysis (based on the idea popularized by Python’s
<a href="http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html"><code class="highlighter-rouge">scikit-learn</code></a>),
and methods for performing ensemble machine learning (e.g., optimal stacked
regressions). <a href="https://sl3.tlverse.org"><code class="highlighter-rouge">sl3</code></a> is a core part of the
<a href="https://tlverse.org"><code class="highlighter-rouge">tlverse</code></a>, a new ecosystem of software packages currently
being developed by a team in the <a href="https://statistics.berkeley.edu/biostat/">Group in
Biostatistics</a> here at Berkeley.</p>
<p><strong>Selected materials for this presentation are available on GitHub
<a href="https://github.com/nhejazi/sl3_lecture">here</a></strong>.</p>
<hr />
<h2 id="software-setup">Software Setup</h2>
<h3 id="r-and-rstudio-installation">R and RStudio Installation</h3>
<ul>
<li>You can download R <a href="https://www.r-project.org/">here</a> and the RStudio IDE
<a href="https://www.rstudio.com/products/rstudio/download/">here</a>.</li>
</ul>
<h3 id="jupyter-r-kernel-installation">Jupyter R Kernel Installation</h3>
<ul>
<li>Please follow the instructions
<a href="https://irkernel.github.io/installation/">here</a> to install an R kernel for
<a href="https://jupyter.org/">Jupyter</a>.</li>
</ul>
<h3 id="sl3-installation"><code class="highlighter-rouge">sl3</code> Installation</h3>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="s2">"devtools"</span><span class="p">)</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"tlverse/sl3@devel"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<h3 id="devtools-installation-if-needed"><code class="highlighter-rouge">devtools</code> installation (if needed)</h3>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">install.packages</span><span class="p">(</span><span class="s2">"devtools"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
Joint meetup with the Graduate Data Science Organization2018-04-11T00:00:00+00:00https://BIDS.github.io/dats/posts/GDSO<p>Instead of our typical THW lesson format, this week we will be having a joint event with the <a href="https://gdso.berkeley.edu/index.html">Graduate Data Science Organization</a>. As always, everyone is welcome to join, even if you’re not a graduate student or affiliated with UC-Berkeley.</p>
<p>The GDSO is a student-led organization with the purpose of providing graduate students and postdoctoral fellows with resources to explore career opportunities in data science. In order to continue building connections between members on campus, we’ll be hosting monthly meetups for all of you interested in data science. These meetups will be informal and feature a few short talks about a variety of topics relevant to our organization. They will happen on the second Wednesday of every month at BIDS. Sign up for the GDSO mailing list or contact the organizers directly at officers@gdso.berkeley.edu.</p>
TBD -- please volunteer to lead!2018-04-04T00:00:00+00:00https://BIDS.github.io/dats/posts/TBDSpring Break -- no THW2018-03-28T00:00:00+00:00https://BIDS.github.io/dats/posts/spring-breakFlask -- Mark Mikofski2018-03-21T00:00:00+00:00https://BIDS.github.io/dats/posts/flask<h1 id="agenda">Agenda</h1>
<ol>
<li><a href="#mini-lesson">Mini lesson on Flask apps with Bokeh plots</a></li>
<li><a href="#mini-sprint-contest">Mini sprint contest</a> to develop a web app from NREL
developer API</li>
<li><a href="#miscellaneous-odds-and-ends">Miscellaneous odds and ends</a></li>
</ol>
<h1 id="bokeh-plots">Bokeh Plots</h1>
<p><img src="../images/flask_2018-03-21/bokeh_plot-1.png" alt="Stock Closing Prices" />
<img src="../images/flask_2018-03-21/bokeh_plot-2.png" alt="AAPL One-Month Average" /></p>
<h1 id="intro">Intro</h1>
<p>In my opinion, an interactive web application is a fun way to share an analysis.
I believe users create deeper, more meaningful connections when they explore
data interactively. The goal of this tutorial will be to teach you how to
quickly make a simple web application that you can use to share your data
analyses online.</p>
<h1 id="requirements">Requirements</h1>
<p>You will need a laptop with Python installed for this tutorial. If you need to
install Python, please
<a href="https://www.anaconda.com/download/">download Anaconda 3.6-64bits</a> before you
attend this tutorial. During the tutorial we will use the following packages, so
please install them in a new conda or virtual environment:</p>
<ul>
<li><a href="http://flask.pocoo.org/">Flask</a></li>
<li><a href="https://bokeh.pydata.org/">Bokeh</a></li>
<li><a href="http://jinja.pocoo.org/">Jinja2</a></li>
<li><a href="http://docs.python-requests.org/">Requests</a></li>
</ul>
<p>This is easiest with Anaconda:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(root) ~/Projects/myapp $ conda create -n myvenv python==3.6.3 flask bokeh jinja2 requests
(root) ~/Projects/myapp $ activate myvenv
(myvenv) ~/Projects/myapp $
</code></pre></div></div>
<h1 id="mini-lesson">Mini lesson</h1>
<p>This mini lesson has 4 parts:</p>
<ol>
<li><a href="#flask">Flask</a></li>
<li><a href="#bokeh">Bokeh</a></li>
<li><a href="#jinja2-templates">Jinja2</a></li>
<li><a href="#bootstrap">Bootstrap</a></li>
</ol>
<p>Most of the snippets and examples from this mini-lesson are in The Hacker
Within - Berkeley GitHub repository code examples folder
<a href="https://github.com/thehackerwithin/berkeley/tree/master/code_examples/flask/">here</a>.</p>
<h2 id="flask">Flask</h2>
<p><a href="http://flask.pocoo.org/">Flask</a> is a micro framework for developing web
applications. A web app runs in a browser. The web server can be run locally on
your laptop, or it can be on a remote server. Making a Flask app is
surprisingly easy! Copy the following into a new file and save it as
<code class="highlighter-rouge">hello.py</code>.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello():
    return 'Hello World!'


if __name__ == '__main__':
    app.run()
</code></pre></div></div>
<p>This creates a new <code class="highlighter-rouge">app</code> that listens for requests to the “root” URL,
<code class="highlighter-rouge">/</code>, and responds by calling the function <code class="highlighter-rouge">hello()</code>. A decorated function like this is called
a “route”. Now open a terminal window, navigate to the folder containing your app, and run it!</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(myvenv) ~/Projects/myapp/ $ python hello.py
</code></pre></div></div>
<p>You should see the following in your terminal:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
</code></pre></div></div>
<p>Open a browser and enter the URL given. The address <code class="highlighter-rouge">127.0.0.1</code> is also
called <code class="highlighter-rouge">localhost</code>, and the number after the colon, <code class="highlighter-rouge">5000</code>, is the port. You should see</p>
<blockquote>
<p>“Hello World!”</p>
</blockquote>
<p>in your browser. Congratulations! You’ve just written your first web app! Now
hit <code class="highlighter-rouge">ctrl-c</code> in your terminal to kill the app.</p>
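<p>You can also check a route without starting the server or opening a browser. The sketch below repeats the app from above and uses Flask's built-in test client to request <code class="highlighter-rouge">/</code>; it assumes Flask is installed in your environment.</p>

```python
from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello():
    return 'Hello World!'


# The test client sends a fake request straight to the app, no server needed
client = app.test_client()
response = client.get('/')
print(response.status_code)
print(response.get_data(as_text=True))
```

<p>This is handy later on, when an app has several routes and you want to confirm each one responds before sharing it.</p>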
<h2 id="bokeh">Bokeh</h2>
<p><a href="https://bokeh.pydata.org/">Bokeh</a> is a Python library for making interactive
“<a href="https://d3js.org/">D3</a>” style plots using an imperative style like
<a href="https://matplotlib.org/">Matplotlib</a> (versus a declarative style like
<a href="https://altair-viz.github.io/">Altair</a>). Bokeh is ideally suited for embedding
plots in a web app like Flask. Let’s see how you can add a Bokeh plot to your
<code class="highlighter-rouge">hello.py</code> Flask app.</p>
<ol>
<li>Make a new folder called <code class="highlighter-rouge">myapp-0/</code></li>
<li>Copy your old <code class="highlighter-rouge">hello.py</code> into the new folder</li>
<li>
<p>Change your new <code class="highlighter-rouge">myapp-0/hello.py</code> file as follows:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"""
My App 0: Hello with Bokeh plot.
"""

from bokeh.plotting import figure
from bokeh.resources import CDN
from bokeh.embed import file_html
from flask import Flask, Markup

app = Flask(__name__)


@app.route('/')
def hello():
    plot = figure()
    xdata = range(1, 6)
    ydata = [x*x for x in xdata]
    plot.line(xdata, ydata)
    return Markup(file_html(plot, CDN, "my plot"))


if __name__ == '__main__':
    app.run(debug=True)
</code></pre></div> </div>
</li>
<li>
<p>Open a terminal and navigate to <code class="highlighter-rouge">myapp-0/</code></p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd ~/path/to/myapp-0/
</code></pre></div> </div>
</li>
<li>
<p>Activate your conda environment</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>myapp-0/ $ source ~/miniconda/bin/activate myvenv
</code></pre></div> </div>
</li>
<li>
<p>Start the web app:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(myvenv) myapp-0/ $ python hello.py
</code></pre></div> </div>
</li>
<li>Open a browser to http://localhost:5000/ or http://127.0.0.1:5000/ and you
should see a line plot that looks similar to this:</li>
</ol>
<p><img src="../images/flask_2018-03-21/hello_bokeh_line.png" alt="Hello Bokeh Line" /></p>
<p>Bokeh gives you several interactive features for free!</p>
<p><img src="../images/flask_2018-03-21/bokeh_tools.png" alt="Bokeh toolbar" /></p>
<ul>
<li>link to the Bokeh documentation</li>
<li>pan, zoom, and reset</li>
<li>save</li>
</ul>
<ol start="8">
<li>Finally hit <code class="highlighter-rouge">ctrl-c</code> in your terminal to kill the app.</li>
</ol>
<p>There are at least two ways to
<a href="https://bokeh.pydata.org/en/latest/docs/user_guide/embed.html">“embed” a Bokeh plot in an HTML document</a>:</p>
<ul>
<li><a href="https://bokeh.pydata.org/en/latest/docs/user_guide/embed.html#html-files">HTML files</a>:
create a standalone HTML file</li>
<li><a href="https://bokeh.pydata.org/en/latest/docs/user_guide/embed.html#components">components</a>:
return the individual components used to embed the plot in any HTML file.</li>
</ul>
<p>This example used the first method. In the next example we’ll use the
“components” method to embed the plot in our own custom HTML file.</p>
<h2 id="jinja2-templates">Jinja2 Templates</h2>
<p><a href="http://jinja.pocoo.org/">Jinja2</a> is a Python library for making HTML files
with dynamic content that is rendered using a Python-like expression syntax.
The Jinja2 markup is enclosed in curly braces, <code class="highlighter-rouge">{{ ... }}</code> for variables
and <code class="highlighter-rouge">{% ... %}</code> for commands:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><!-- http://jinja.pocoo.org/docs/2.10/templates/#escaping -->
<ul>
{% for user in users %}
<li><a href="{{ user.url }}">{{ user.username }}</a></li>
{% endfor %}
</ul>
</code></pre></div></div>
<p><em>attribution: snippet from the <a href="http://jinja.pocoo.org/">Jinja2</a> documentation</em></p>
<p>The HTML files with Jinja2 markup are called “templates”. Flask can use Jinja
to render content placed in a folder called <code class="highlighter-rouge">templates</code> next to your app. Use
<a href="http://flask.pocoo.org/docs/0.12/quickstart/#rendering-templates"><code class="highlighter-rouge">render_template</code></a>
to specify the name of the template file and the desired data as keyword
arguments.</p>
<p>Let’s modify our “Hello” app to use a custom template and Bokeh components.</p>
<ol>
<li>Create a new folder called <code class="highlighter-rouge">myapp-1/</code> and copy <code class="highlighter-rouge">myapp-0/hello.py</code> into it.</li>
<li>
<p>Create a <code class="highlighter-rouge">templates</code> folder inside <code class="highlighter-rouge">myapp-1/</code> and save the following file as
<code class="highlighter-rouge">myapp-1/templates/hello.html</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"><!-- http://jinja.pocoo.org/docs/2.10/templates/#escaping --></span>
<span class="cp"><!DOCTYPE html></span>
<span class="nt"><html</span> <span class="na">lang=</span><span class="s">"en"</span><span class="nt">></span>
<span class="nt"><head></span>
<span class="nt"><meta</span> <span class="na">charset=</span><span class="s">"utf-8"</span><span class="nt">></span>
<span class="nt"><title></span>{{ title }}<span class="nt"></title></span>
<span class="nt"><link</span>
<span class="na">href=</span><span class="s">"https://cdn.pydata.org/bokeh/release/bokeh-0.12.14.min.css"</span>
<span class="na">rel=</span><span class="s">"stylesheet"</span> <span class="na">type=</span><span class="s">"text/css"</span><span class="nt">></span>
<span class="nt"><link</span>
<span class="na">href=</span><span class="s">"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.12.14.min.css"</span>
<span class="na">rel=</span><span class="s">"stylesheet"</span> <span class="na">type=</span><span class="s">"text/css"</span><span class="nt">></span>
<span class="nt"><link</span>
<span class="na">href=</span><span class="s">"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.12.14.min.css"</span>
<span class="na">rel=</span><span class="s">"stylesheet"</span> <span class="na">type=</span><span class="s">"text/css"</span><span class="nt">></span>
<span class="nt"><script </span><span class="na">src=</span><span class="s">"https://cdn.pydata.org/bokeh/release/bokeh-0.12.14.min.js"</span><span class="nt">></script></span>
<span class="nt"><script </span><span class="na">src=</span><span class="s">"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.12.14.min.js"</span><span class="nt">></script></span>
<span class="nt"><script </span><span class="na">src=</span><span class="s">"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.12.14.min.js"</span><span class="nt">></script></span>
<span class="nt"></head></span>
<span class="nt"><body></span>
<span class="nt"><h1></span>Hello!<span class="nt"></h1></span>
{{ plot_div|safe }}
{{ plot_script|safe }}
<span class="nt"></body></span>
<span class="nt"></html></span>
</code></pre></div> </div>
</li>
<li>
<p>Now change <code class="highlighter-rouge">myapp-1/hello.py</code> to get the Bokeh components and render the
<code class="highlighter-rouge">hello.html</code> template:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"""
My App 1: Hello with Bokeh plot and Jinja2 template.
"""
from bokeh.plotting import figure
from bokeh.embed import components
from flask import Flask, request, render_template, abort

app = Flask(__name__)


@app.route('/')
def hello():
    plot = figure()
    plot.circle([1, 2], [3, 4])
    # split the plot into a script and a div to embed in the template
    plot_script, plot_div = components(plot)
    kwargs = {'plot_script': plot_script, 'plot_div': plot_div}
    kwargs['title'] = 'hello'
    if request.method == 'GET':
        return render_template('hello.html', **kwargs)
    # this route only accepts GET by default, so this is just a safety net
    abort(404)


if __name__ == '__main__':
    app.run(debug=True)
</code></pre></div> </div>
</li>
<li>
<p>Open a terminal, navigate to your app, activate your conda environment, and
start your app:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd ~/path/to/myapp-1/
myapp-1/ $ source ~/miniconda/bin/activate myenv
(myenv) myapp-1/ $ python hello.py
</code></pre></div> </div>
</li>
<li>
<p>Open your browser to <code class="highlighter-rouge">localhost:5000</code> and you should see your app with the
heading</p>
<blockquote>
<p>Hello!</p>
</blockquote>
<p>above the plot.</p>
</li>
</ol>
<p>In this example, we get the Bokeh components, <code class="highlighter-rouge">plot_script</code> and
<code class="highlighter-rouge">plot_div</code>, and pass them to the template, <code class="highlighter-rouge">hello.html</code>, using <code class="highlighter-rouge">render_template</code>.</p>
<p>The template must have several links and scripts to run Bokeh. These are placed
in the <code class="highlighter-rouge"><head></code> section of the HTML file. <a href="https://pydata.org/">PyData</a>
generously provides a content delivery network (CDN) to provide these files,
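but you can also download them and host them locally. There are 3 cascading
style sheets (CSS) with custom styles and 3 JavaScript files with scripts that
Bokeh uses to make interactive plots. The version in these URLs (0.12.14 here)
must match the version of Bokeh that you have installed.</p>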
<p>Finally the template must have a <code class="highlighter-rouge"><div></code> element where you want the Bokeh plot
to appear, and a <code class="highlighter-rouge"><script></code> which has your data and the Bokeh JavaScript code
to make your interactive plot.</p>
<h2 id="bootstrap">Bootstrap</h2>
<p><a href="https://getbootstrap.com/">Bootstrap</a> is an HTML framework and component
library of CSS and JavaScript files that takes the pain out of creating
attractive content for folks who are not web designers. To use it, all you have
to do is put the CSS and JavaScript links in your HTML. Follow the directions
in their <a href="https://getbootstrap.com/docs/4.0/getting-started/introduction/">getting started introduction</a>
to see where to put these links.</p>
<h1 id="mini-sprint-contest">Mini Sprint Contest!</h1>
<p>We’re going to have a mini sprint so that you can practice what you’ve learned.
The goal will be to create an interactive Bokeh plot in a Flask app using data
from the NREL Developer Network.</p>
<ol>
<li>
<p>Go to <a href="https://developer.nrel.gov/">NREL Developer Network</a> and register for
an API key.</p>
<p><img src="../images/flask_2018-03-21/nrel_developer_api_key_signup.png" alt="NREL Developer API Key Signup" /></p>
</li>
<li>
<p>Start your engines.</p>
</li>
<li>
<p>Go!</p>
</li>
</ol>
<p>See
<a href="https://github.com/thehackerwithin/berkeley/tree/master/code_examples/flask/">Flask code examples</a>
for some ideas. (wip)</p>
<h1 id="miscellaneous-odds-and-ends">Miscellaneous odds and ends</h1>
<p>There are lots of other rabbit holes to jump down. Here are a few.</p>
<h2 id="html-css-and-js">HTML, CSS, and JS</h2>
<p>Understanding HTML 101 will make building your web app or generating static
content much easier. But understanding HTML is just the tip of the iceberg. You may quickly
find yourself dabbling in CSS and JS too. Embrace it. But beware of
misinformation - <strong>avoid</strong> <a href="https://en.wikipedia.org/wiki/W3Schools">W3Schools</a>
and go straight to the horse’s mouth. Mozilla invented the internet, not Al Gore,
(<em>j/k</em>) so if you need to find out anything about HTML, CSS, or JS, always
check the <a href="https://developer.mozilla.org/en-US/">Mozilla Developer Network (MDN)</a>
first!</p>
<h2 id="other-htmlcssjs-frameworks">Other HTML/CSS/JS frameworks</h2>
<p>Writing your own CSS and JS is tiring. Making it look good, unless you’re a
pro, is nearly impossible. These frameworks make it easy to look like a pro.</p>
<ul>
<li><a href="https://getbootstrap.com/">Bootstrap</a></li>
<li><a href="https://reactjs.org/">React.js</a></li>
</ul>
<h2 id="static-content">Static content</h2>
<p>If you only need to generate your report once, or only occasionally, then a
static site is fine. Your plots can still be interactive; static just means
that the content on the site doesn’t change. A static site generator creates
HTML, CSS, and JS content from some other markup like Markdown or ReST. Some
hosts also offer static site generation and content management. And there are
some tools that can generate static content in the form of HTML even though
that’s not their primary function.</p>
<ul>
<li><a href="https://pages.github.com/">GitHub Pages</a> and <a href="https://jekyllrb.com/">Jekyll</a></li>
<li><a href="http://docs.getpelican.com/en/stable/">Pelican</a></li>
<li><a href="http://www.sphinx-doc.org/en/master/">Sphinx</a></li>
<li><a href="http://jupyter.org/">Jupyter</a></li>
<li><a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a></li>
<li><a href="https://developer.mozilla.org/en-US/">HTML/CSS/JS</a></li>
<li><a href="https://bokeh.pydata.org/en/latest/">Bokeh</a>, <a href="https://d3js.org/">d3</a>,
<a href="https://vega.github.io/vega/">Vega</a>, and
<a href="https://vega.github.io/vega-lite/">Vega-Lite</a></li>
<li><a href="https://www.blogger.com">Blogger</a>, <a href="https://wordpress.org/">WordPress</a>, and
<a href="https://wordpress.com/">WordPress.com</a> for hosting</li>
</ul>
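<p>As a rough illustration of what "generating static content" means, the sketch
below renders a small report to a plain HTML file once, using only Python’s
standard library. The template text and the names <code class="highlighter-rouge">build_page</code> and
<code class="highlighter-rouge">write_site</code> are invented for this example; they are not part of any of the
tools listed above, which do the same job with far more features.</p>

```python
import tempfile
from html import escape
from pathlib import Path
from string import Template

# Hypothetical page template; $title and $rows are placeholder names for this sketch.
PAGE = Template("""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<ul>
$rows
</ul>
</body>
</html>
""")


def build_page(title, values):
    """Render the report once; the result is plain HTML with no server-side code."""
    rows = "\n".join("<li>{}</li>".format(escape(str(v))) for v in values)
    return PAGE.substitute(title=escape(title), rows=rows)


def write_site(out_dir, title, values):
    """Write index.html into out_dir and return its path."""
    out_path = Path(out_dir) / "index.html"
    out_path.write_text(build_page(title, values), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Write to a throwaway folder and show the generated file.
    with tempfile.TemporaryDirectory() as tmp:
        page = write_site(tmp, "Squares", [x * x for x in range(1, 6)])
        print(page.read_text(encoding="utf-8"))
```

<p>The generated <code class="highlighter-rouge">index.html</code> can be opened directly in a browser or uploaded to
any static host. To update the report, you rerun the script.</p>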
<h2 id="web-frameworks-for-dynamic-content">Web frameworks for dynamic content</h2>
<p>If your site depends on user input or if you want your site to update
automatically with input from another source like an API or database, then you
will need to use a web framework and a web server. A web framework combines the
most common features from most web applications into a boilerplate design.
Additional features can usually be added with extensions and plugins. Some
frameworks are simpler than others, and some come with everything included.</p>
<ul>
<li><a href="http://flask.pocoo.org/">Flask</a></li>
<li><a href="https://www.djangoproject.com/">Django</a></li>
<li><a href="https://angularjs.org/">Angular</a></li>
</ul>
<h2 id="ajax">AJAX</h2>
<p>There is this crazy middle ground between static and dynamic content where you
get content and modify the DOM using AJAX directly from the browser. This is
way beyond the scope of this tutorial.</p>
<h2 id="web-api">Web API</h2>
<p>This is a web app that has a published interface or schema that users can use
to programmatically interact with the application without a browser. There are
several framework extensions that can be used to create a web API.</p>
<ul>
<li><a href="http://www.django-rest-framework.org/">Django REST Framework</a></li>
<li><a href="https://django-tastypie.readthedocs.io/en/latest/">TastyPie</a></li>
<li><a href="https://flask-restful.readthedocs.io/en/latest/">Flask RESTful</a></li>
<li><a href="https://www.flaskapi.org/">Flask API</a></li>
</ul>
<h2 id="embedded-ploting-libraries">Embedded plotting libraries</h2>
<ul>
<li><a href="https://bokeh.pydata.org/en/latest/">Bokeh</a></li>
<li><a href="https://plot.ly/">Plot.ly</a></li>
<li><a href="https://d3js.org/">d3</a></li>
<li><a href="https://vega.github.io/vega/">Vega</a> and
<a href="https://vega.github.io/vega-lite/">Vega-Lite</a></li>
<li><a href="http://docs.enthought.com/chaco/">Chaco</a></li>
<li><a href="https://matplotlib.org/">Matplotlib</a> - static only AFAIK</li>
<li><a href="http://mpld3.github.io/">mpld3</a></li>
</ul>
<h2 id="database-object-relational-mapper-orm">Database object relational mapper (ORM)</h2>
<p>If your web app interacts with a database, then you should use an object relational
mapper. This tool converts native objects into database records and generates
database operations like SQL queries from native methods in the background,
making it simpler to create, read, update, and destroy data.</p>
<ul>
<li><a href="http://www.sqlalchemy.org/">SQLAlchemy</a></li>
<li><a href="https://www.djangoproject.com/">Django</a></li>
</ul>
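<p>To make the idea concrete, here is a toy mapper built on Python’s built-in
<code class="highlighter-rouge">sqlite3</code> module. It is only a sketch of the concept: the <code class="highlighter-rouge">Site</code> class, the
<code class="highlighter-rouge">TinyMapper</code> class, and the sample rows are all made up for illustration, and
this is not how SQLAlchemy or Django are implemented. It does show the two
halves of the job: objects become rows, and method calls become SQL.</p>

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Site:
    """A native object that maps to one row of the hypothetical `site` table."""
    name: str
    capacity_kw: float


class TinyMapper:
    """Toy mapper for illustration only; real ORMs also handle types, relations, and more."""

    def __init__(self, connection, model):
        self.conn = connection
        self.model = model
        self.table = model.__name__.lower()
        self.columns = [f.name for f in fields(model)]

    def create_table(self):
        cols = ", ".join(self.columns)
        self.conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(self.table, cols))

    def add(self, obj):
        # Generate a parameterized INSERT from the object's fields.
        marks = ", ".join("?" for _ in self.columns)
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            self.table, ", ".join(self.columns), marks)
        self.conn.execute(sql, [getattr(obj, c) for c in self.columns])

    def filter_by(self, **conditions):
        # Generate SELECT ... WHERE from keyword arguments, then turn rows back into objects.
        # Column names are not validated in this sketch, so only pass trusted names.
        sql = "SELECT {} FROM {}".format(", ".join(self.columns), self.table)
        if conditions:
            sql += " WHERE " + " AND ".join("{} = ?".format(k) for k in conditions)
        rows = self.conn.execute(sql, list(conditions.values()))
        return [self.model(*row) for row in rows]


def demo():
    """Store two objects in an in-memory database and read one back."""
    conn = sqlite3.connect(":memory:")
    sites = TinyMapper(conn, Site)
    sites.create_table()
    sites.add(Site("Berkeley", 250.0))
    sites.add(Site("Oakland", 100.0))
    return sites.filter_by(name="Berkeley")


if __name__ == "__main__":
    print(demo())
```

<p>Notice that the calling code in <code class="highlighter-rouge">demo</code> never writes SQL itself, and that values
are always passed as query parameters instead of being pasted into the SQL string.</p>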
<h2 id="hosting">Hosting</h2>
<p>If you want to share your site, or have it visible outside of your network then
you’ll need a host. Beware, once your data is public it’s on you to keep it
secure. Web frameworks will handle the most obvious threats, but you still need
to use common sense. Robots continuously crawl the internet and automatically
attack anything new that they find, regardless of how insignificant it is.</p>
<blockquote>
<p><strong>Warning</strong>: <em>If your application will require authentication, then you must
use HTTPS!</em></p>
</blockquote>
<ul>
<li><a href="https://www.heroku.com/">Heroku</a></li>
<li><a href="https://aws.amazon.com/">AWS</a></li>
<li><a href="https://cloud.google.com/appengine/">Google App Engine</a></li>
<li><a href="https://azure.microsoft.com/en-us/">Azure</a></li>
<li>local network or intranet</li>
</ul>
<h2 id="web-servers">Web servers</h2>
<p>You probably won’t have to set up a web server yourself, since
this is usually handled by your hosting service, but it’s useful to know about
web servers at a high level. Typically you will see two pieces: a WSGI server
(WSGI is a protocol for passing requests and responses between a web server
and a Python application) and a web server that offers
the content to web browsers requesting it and accepts content from browsers
that send it. Most WSGI servers combine both of these, but a dedicated web
server can offer more features and better performance. It’s not uncommon for a
single web app to be running simultaneously on several web servers and several
WSGI servers behind a single load balancer that also presents a CA-signed
certificate and redirects HTTP (port 80) to HTTPS (port 443) to secure your
site.</p>
<ul>
<li><a href="https://httpd.apache.org/">Apache</a> + <a href="https://modwsgi.readthedocs.io/en/develop/">mod-wsgi</a></li>
<li><a href="http://gunicorn.org/">gunicorn</a> or <a href="https://uwsgi-docs.readthedocs.io/en/latest/">uwsgi</a></li>
<li><a href="https://www.nginx.com/">nginx</a> + gunicorn or uwsgi</li>
<li><a href="http://werkzeug.pocoo.org/">Werkzeug</a></li>
<li><a href="http://www.tornadoweb.org/en/stable/">Tornado</a></li>
</ul>
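<p>To see what the WSGI protocol looks like from the Python side, here is a
minimal sketch using only the standard library’s <code class="highlighter-rouge">wsgiref</code> package. The
function names <code class="highlighter-rouge">application</code>, <code class="highlighter-rouge">call_app</code>, and <code class="highlighter-rouge">serve</code>, and the port number, are
arbitrary choices for this example. Flask apps expose the same kind of callable,
which is what lets the servers listed above run them.</p>

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI app: read the request from environ, call start_response, return bytes."""
    path = environ.get("PATH_INFO", "/")
    body = "Hello from {}".format(path).encode("utf-8")
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Call the app the way a WSGI server would, without opening a network socket."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(port=8000):
    """Run the app with wsgiref's reference server (development use only)."""
    with make_server("localhost", port, application) as httpd:
        httpd.serve_forever()


if __name__ == "__main__":
    print(call_app(application, "/plot"))
```

<p>Calling <code class="highlighter-rouge">serve()</code> starts a development server that you stop with <code class="highlighter-rouge">ctrl-c</code>.
For anything public you would swap it for one of the servers listed above.</p>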
<h2 id="rest">REST</h2>
<p>In order for your application to run on several servers simultaneously, it
needs to be RESTful. REST stands for representational state transfer and
basically means that your app is stateless. In other words, all of the
information that the servers need to run your app is contained in one of three (or
maybe four) places:</p>
<ul>
<li>request header</li>
<li>query string</li>
<li>URL</li>
<li>(maybe a cookie or other client side cache that is used for client side operations only, eg: with JavaScript)</li>
</ul>
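<p>A small sketch of what "stateless" means in practice: the handler below
builds its response from nothing but the URL, the query string, and the request
headers. The <code class="highlighter-rouge">/squares</code> endpoint, its <code class="highlighter-rouge">n</code> parameter, and the function name
<code class="highlighter-rouge">handle_request</code> are hypothetical, chosen only for this example.</p>

```python
from urllib.parse import parse_qs, urlsplit


def handle_request(url, headers):
    """Return (status, body) using only what arrives with the request itself."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Hypothetical endpoint: /squares?n=3 returns the first n square numbers.
    if parts.path != "/squares":
        return 404, "not found"
    n = int(query.get("n", ["5"])[0])
    squares = [x * x for x in range(1, n + 1)]
    # The request header selects the representation of the same resource.
    if headers.get("Accept") == "text/csv":
        return 200, ",".join(str(s) for s in squares)
    return 200, str(squares)


if __name__ == "__main__":
    print(handle_request("/squares?n=3", {}))
    print(handle_request("/squares?n=3", {"Accept": "text/csv"}))
```

<p>Because the function keeps no memory between calls, it doesn’t matter which
server behind the load balancer answers a given request. The same request always
gets the same response.</p>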
<h2 id="glossary">Glossary</h2>
<ul>
<li>DOM = <a href="https://en.wikipedia.org/wiki/Document_Object_Model">document object model</a></li>
<li>HTML = <a href="https://en.wikipedia.org/wiki/HTML">hypertext markup language</a></li>
<li>CSS = <a href="https://en.wikipedia.org/wiki/Cascading_Style_Sheets">cascading style sheets</a></li>
<li>JS = <a href="https://en.wikipedia.org/wiki/JavaScript">JavaScript</a> - it has <strong>nothing</strong> to do with Java, used to manipulate the DOM from the browser.</li>
<li>REST = <a href="https://en.wikipedia.org/wiki/Representational_state_transfer">representational state transfer</a></li>
<li>URL/URI = <a href="https://en.wikipedia.org/wiki/URL">uniform resource locator</a>/<a href="https://en.wikipedia.org/wiki/Uniform_Resource_Identifier">identifier</a></li>
<li>HTTP = <a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">hypertext transfer protocol</a></li>
<li>HTTPS = <a href="https://en.wikipedia.org/wiki/HTTPS">HTTP with SSL or TLS</a></li>
<li>SSL = <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security">secure sockets layer</a>, which was replaced by TLS</li>
<li>TLS = <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security">transport layer security</a></li>
<li>WSGI = <a href="https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface">web server gateway interface</a></li>
<li>ORM = <a href="https://en.wikipedia.org/wiki/Object-relational_mapping">object relational mapping</a></li>
<li>MVC/MVW = <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">model-view-controller or model-view-whatever</a></li>
<li>SQL = <a href="https://en.wikipedia.org/wiki/SQL">structured query language</a></li>
<li>API = <a href="https://en.wikipedia.org/wiki/Application_programming_interface">application programming interface</a></li>
<li>AJAX = <a href="https://en.wikipedia.org/wiki/Ajax_(programming)">asynchronous JavaScript and XML</a> - used for client side requests</li>
<li>CORS = <a href="https://en.wikipedia.org/wiki/Cross-origin_resource_sharing">cross-origin resource sharing</a></li>
<li>CSRF = <a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">cross-site request forgery</a></li>
<li>JSON = <a href="https://en.wikipedia.org/wiki/JSON">JavaScript object notation</a></li>
<li>XML = <a href="https://en.wikipedia.org/wiki/XML">extensible markup language</a></li>
</ul>
Joint meetup with the Graduate Data Science Organization2018-03-14T00:00:00+00:00https://BIDS.github.io/dats/posts/GDSO<p>Instead of our typical THW lesson format, this week we will be having a joint event with the <a href="https://gdso.berkeley.edu/index.html">Graduate Data Science Organization</a>. As always, everyone is welcome to join, even if you’re not a graduate student or affiliated with UC-Berkeley.</p>
<p>The GDSO is a student-led organization with the purpose of providing graduate students and postdoctoral fellows with resources to explore career opportunities in data science. In order to continue building connections between members on campus, we’ll be hosting monthly meetups for all of you interested in data science. These meetups will be informal and feature a few short talks about a variety of topics relevant to our organization. They will happen on the second Wednesday of every month at BIDS. Sign up for the GDSO mailing list or contact the organizers directly at officers@gdso.berkeley.edu.</p>
No THW -- SF Open Drinks Meetup2018-03-07T00:00:00+00:00https://BIDS.github.io/dats/posts/SF-open-drinks<p>** RSVP REQUIRED **</p>
<p>There will be no meeting of The Hacker Within at Berkeley on March 7th. But if you’d like to hang out with some like-minded people interested in open source, open data, open knowledge, and open everything on Wednesday evening, then head to San Francisco for the SF Open Drinks meetup. It runs on March 7th from 5:30-7:30pm at the Wikimedia Foundation’s headquarters near Montgomery St BART (120 Kearny Street, Suite 1600). Due to building security, you <strong>must</strong> <a href="https://www.eventbrite.com/e/join-us-for-open-drinks-wikimedia-on-march-7-tickets-43690595748">RSVP on Eventbrite</a> beforehand and bring an ID. See the Eventbrite page for more details.</p>
Intro to D3.js -- Caroline Cypranowska2018-02-28T00:00:00+00:00https://BIDS.github.io/dats/posts/intro-d3js-sp18<h1 id="d3_simplemap">d3_simplemap</h1>
<p>D3 tutorial for making a <a href="https://bl.ocks.org/cypranowska/b17359016fd22b81914fd2031cb301f0">map</a> with data on US campgrounds from recreation.gov.</p>
<h1 id="intro-to-d3">Intro to D3</h1>
<h2 id="how-to-prepare-for-this-tutorial">How to prepare for this tutorial</h2>
<ol>
<li><a href="http://brackets.io/">Download and install Brackets</a>
(This is my preferred tool for building visualizations with D3, but it isn’t strictly necessary. It has a nice live preview function that serves the page to your browser. Other options include using node.js.)</li>
<li>Fork or download the repository with the data (link coming soon).
It has a template in the main directory that we’ll use to write our code, and our raw data in a .csv file in the /data directory.</li>
</ol>
<h2 id="so-what-is-d3">So what is D3?</h2>
<p>Data-driven Documents, better known as D3, is a JavaScript library for creating interactive data visualizations for the web. Mike Bostock, the primary developer of D3, first <a href="http://vis.stanford.edu/files/2011-D3-InfoVis.pdf">published</a> D3 in 2011, and it’s been a favorite data visualization tool ever since.</p>
<p>However, D3 has a reputation for being a challenging library to master. This is because it requires knowledge of how SVG works, a bit about HTML/CSS, and a large dose of JavaScript. The goal of this workshop is to help you get a good enough sense of how D3 works so that you can try things on your own!</p>
<h2 id="going-in-svg-circles">Going in (SVG) circles</h2>
<p>D3 visualizations usually begin with creating SVG objects. So let’s create 3 circles using SVG.</p>
<pre><code class="language-HTML"><svg width="720" height="120">
  <circle cx="40" cy="60" r="10"></circle>
  <circle cx="80" cy="60" r="10"></circle>
  <circle cx="120" cy="60" r="10"></circle>
</svg>
</code></pre>
<p>D3 allows you to select elements and then manipulate them. Let’s change the color of the circles to <code class="highlighter-rouge">steelblue</code> and the radius to 30.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">circle</span> <span class="o">=</span> <span class="nx">d3</span><span class="p">.</span><span class="nx">selectAll</span><span class="p">(</span><span class="s2">"circle"</span><span class="p">);</span>
<span class="nx">circle</span><span class="p">.</span><span class="nx">style</span><span class="p">(</span><span class="s2">"fill"</span><span class="p">,</span> <span class="s2">"steelblue"</span><span class="p">);</span>
<span class="nx">circle</span><span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"r"</span><span class="p">,</span><span class="mi">30</span><span class="p">);</span>
</code></pre></div></div>
<p>Now if you inspect the circles in your browser, the SVG markup should look like this:</p>
<pre><code class="language-HTML"><svg width="720" height="120">
  <circle cx="40" cy="60" r="30" style="fill:steelblue;"></circle>
  <circle cx="80" cy="60" r="30" style="fill:steelblue;"></circle>
  <circle cx="120" cy="60" r="30" style="fill:steelblue;"></circle>
</svg>
</code></pre>
<p>Instead of passing a string or an integer to a .style or .attr call, you can also pass a function. Try adding this line to your JavaScript code. What do you think the result will be?</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">circle</span><span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"cx"</span><span class="p">,</span><span class="kd">function</span> <span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span><span class="o">*</span><span class="mi">720</span> <span class="p">});</span>
</code></pre></div></div>
<p>Inspect the circles again in your browser. Now the <code class="highlighter-rouge">cx</code> attribute should change with each page refresh.</p>
<h2 id="binding-data-to-html-or-svg-elements-is-the-foundation-of-d3">Binding data to HTML or SVG elements is the foundation of D3</h2>
<p>How do I change the attributes of my SVG elements based on my data? The first step is to bind the data to the SVG elements. In the JavaScript portion of our document, delete the <code class="highlighter-rouge">circle.attr("r",30)</code> line and add the following:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">circle</span><span class="p">.</span><span class="nx">data</span><span class="p">([</span><span class="mi">32</span><span class="p">,</span><span class="mi">57</span><span class="p">,</span><span class="mi">112</span><span class="p">]);</span>
<span class="nx">circle</span><span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"r"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">d</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nx">d</span><span class="p">);</span> <span class="p">});</span>
</code></pre></div></div>
<p>Here <code class="highlighter-rouge">d</code> refers to the data we bound to the circles. Open the web inspector and run <code class="highlighter-rouge">console.log(d3.selectAll("circle"))</code>. Each element should have a <code class="highlighter-rouge">__data__</code> property, and its value should correspond to the bound data value.</p>
<p>We can also pass the index of elements that are selected. After removing the <code class="highlighter-rouge">circle.attr("cx", ...)</code> line, add the following:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">circle</span><span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"cx"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">d</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nx">i</span> <span class="o">*</span> <span class="mi">100</span> <span class="o">+</span> <span class="mi">30</span><span class="p">;</span> <span class="p">});</span>
</code></pre></div></div>
<p>Now the x location of each circle is a function of its index!</p>
<h2 id="but-what-if-i-had-1000000000000000-rows-of-data">But what if I had 1000000000000000 rows of data!!!</h2>
<p>With D3 you don’t need to explicitly write out every SVG element you want for your final data visualization. What you can do is make a virtual selection with D3, bind your data to it, and then create the elements that you want on the page. THIS is the magic of D3.</p>
<p>Go ahead and delete the <code class="highlighter-rouge"><circle></code> elements from the SVG portion of your document. The JavaScript portion should look like this:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* create an svg canvas, 400 by 100 px, wide enough for four circles spaced 100 px apart */</span>
<span class="kd">var</span> <span class="nx">svgCanvas</span> <span class="o">=</span> <span class="nx">d3</span><span class="p">.</span><span class="nx">select</span><span class="p">(</span><span class="s2">"body"</span><span class="p">).</span><span class="nx">append</span><span class="p">(</span><span class="s2">"svg"</span><span class="p">)</span>
    <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"width"</span><span class="p">,</span> <span class="mi">400</span><span class="p">)</span>
    <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"height"</span><span class="p">,</span> <span class="mi">100</span><span class="p">);</span>
<span class="cm">/* the data */</span>
<span class="kd">var</span> <span class="nx">dat</span> <span class="o">=</span> <span class="p">[</span><span class="mi">32</span><span class="p">,</span><span class="mi">57</span><span class="p">,</span><span class="mi">112</span><span class="p">,</span><span class="mi">293</span><span class="p">];</span>
<span class="cm">/* select circles virtually, bind the data, add attributes */</span>
<span class="nx">svgCanvas</span><span class="p">.</span><span class="nx">selectAll</span><span class="p">(</span><span class="s2">"circle"</span><span class="p">)</span>
<span class="p">.</span><span class="nx">data</span><span class="p">(</span><span class="nx">dat</span><span class="p">)</span>
<span class="p">.</span><span class="nx">enter</span><span class="p">()</span>
<span class="p">.</span><span class="nx">append</span><span class="p">(</span><span class="s2">"circle"</span><span class="p">)</span>
<span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"cy"</span><span class="p">,</span> <span class="mi">60</span><span class="p">)</span>
<span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"cx"</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">d</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nx">i</span> <span class="o">*</span> <span class="mi">100</span> <span class="o">+</span> <span class="mi">30</span><span class="p">;})</span>
<span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">"r"</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">d</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nx">d</span><span class="p">);</span> <span class="p">});</span>
</code></pre></div></div>
<p>Appending to the virtual selection allows us to create circles for each data point, even if we don’t have those circles already drawn on the canvas.</p>
<h1 id="you-dont-need-to-reinvent-the-wheel">You don’t need to reinvent the wheel</h1>
<p>There are tons of resources for learning D3, and lots of code blocks to peruse.</p>
<h2 id="online-learning-resources">Online learning resources</h2>
<ul>
<li><a href="https://github.com/d3/d3/wiki/Tutorials">D3 documentation</a></li>
<li><a href="http://alignedleft.com/tutorials/d3">Aligned Left</a></li>
<li><a href="https://www.dashingd3js.com/">Dashing D3</a> – not all content on this site is free</li>
</ul>
<h2 id="example-galleries">Example galleries</h2>
<ul>
<li><a href="https://github.com/d3/d3/wiki/Gallery">Official D3 Gallery</a></li>
<li>https://bl.ocks.org/</li>
<li>http://christopheviau.com/d3list/gallery.html</li>
</ul>
<h2 id="fancy-examples">Fancy examples</h2>
<ul>
<li>http://www.facesoffracking.org/data-visualization/</li>
<li>http://www.koalastothemax.com/</li>
</ul>
<p>And if all else fails… there’s always Google.
<img src="https://imgs.xkcd.com/comics/wisdom_of_the_ancients.png" alt="XKCD: Wisdom of the Ancients" /></p>
<h1 id="now-lets-make-a-map">Now let’s make a map!</h1>
<p><img src="https://imgs.xkcd.com/comics/map_projections.png" alt="XKCD: Map Projections" /></p>
Mark Mikofski -- SQL and relational databases2018-02-21T00:00:00+00:00https://BIDS.github.io/dats/posts/SQL-relationaldatabases<h1 id="agenda">Agenda</h1>
<ol>
<li><a href="#requirements">Requirements</a></li>
<li><a href="#objectives">Objectives</a></li>
<li><a href="#sql-examples">SQL Examples</a></li>
<li><a href="#relational-databases">Relational Databases</a></li>
<li><a href="#summary">Summary</a></li>
</ol>
<h2 id="xkcd-327-exploits-of-mom"><a href="https://xkcd.com/327/">XKCD 327: Exploits of a Mom</a></h2>
<p><img src="https://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt="XKCD 327: Exploits of a Mom" /></p>
<h2 id="requirements">Requirements</h2>
<p>To prepare for this tutorial make sure you have the following:</p>
<ol>
<li>We’re going to use some Python, so make sure you have it installed on a laptop,
and of course, don’t forget to bring your laptop to the tutorial.</li>
<li>We’re going to use an example database and a Jupyter notebook with some code
examples, so make sure your computer has working internet access. AFAIK anyone can
use the Cal AirBears WiFi connection for free.</li>
<li>A willingness to participate, try new things, make mistakes, learn and have fun!</li>
</ol>
<h2 id="objectives">Objectives</h2>
<p>At the end of this tutorial you will be able to do the following:</p>
<ul>
<li>define what a database is</li>
<li>describe the difference between a relational database and no-SQL databases</li>
<li>write SQL code to
<ul>
<li>create a database, add a table to a database, and add a row
to a table</li>
<li>query a database by selecting fields that satisfy a condition</li>
<li>join two or more tables along a common field</li>
<li>calculate COUNT, MAX, and other aggregate functions</li>
</ul>
</li>
<li>name some common relational databases</li>
<li>explain some common usage patterns for databases</li>
</ul>
<h2 id="sql-examples">SQL Examples</h2>
<p>We’re going to use the examples from
<a href="https://github.com/thehackerwithin/berkeley/tree/master/code_examples/SQL"><code class="highlighter-rouge">code_examples/SQL</code></a>,
so point your browser to this link or clone The Hacker Within - Berkeley and
navigate to this folder.</p>
<h2 id="relational-databases">Relational Databases</h2>
<p><a href="https://en.wikipedia.org/wiki/Database">Wikipedia defines a database</a> as …</p>
<blockquote>
<p>An organized collection of data. A relational database, more restrictively, is
a collection of schemas, tables, queries, reports, views, and other elements.
… the most popular database systems since the 1980s have all supported the
relational model - generally associated with the SQL language.</p>
</blockquote>
<p>The main difference between a <em>database</em> and an object model like JSON or a
simple spreadsheet is size and complexity, which call for database management
software to create, query, and retrieve data quickly.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Relational_database"><em>relational database</em></a>
differs from other databases due to its strictly tabular structure consisting
of rows of records and columns of fields. <em>E.G.</em>:</p>
<table>
<thead>
<tr>
<th style="text-align: right">primary key</th>
<th>text field</th>
<th>integer field</th>
<th>date field</th>
<th>real field</th>
<th>boolean field</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">1</td>
<td>foo</td>
<td>234</td>
<td>2018-02-21T1700Z</td>
<td>5.67E-8</td>
<td>TRUE</td>
</tr>
<tr>
<td style="text-align: right">2</td>
<td>bar</td>
<td>123</td>
<td>2018-02-21T1830Z</td>
<td>1.6E-19</td>
<td>FALSE</td>
</tr>
</tbody>
</table>
<p>Other databases, called <a href="https://en.wikipedia.org/wiki/NoSQL"><em>NoSQL</em></a>, have a
more flexible structure, allowing nested relations between keys, values, and
arrays. Some NoSQL databases are more scalable than relational databases and
can handle more data, making them useful for data science. Some examples of
NoSQL databases are: <a href="http://couchdb.apache.org/">CouchDB</a>,
<a href="https://www.mongodb.com/">MongoDB</a>, <a href="http://cassandra.apache.org/">Cassandra</a>,
<a href="https://aws.amazon.com/dynamodb/">AWS DynamoDB</a>, <em>etc.</em></p>
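<p>To make “nested relations between keys, values, and arrays” concrete, here is a minimal sketch in plain Python. The field names (<code class="highlighter-rouge">measurements</code>, <code class="highlighter-rouge">tags</code>, <em>etc.</em>) are made up for illustration and are not taken from any particular NoSQL product. It shows one nested document next to the flat rows a relational table would need for the same data.</p>

```python
import json

# A hypothetical "document" as a document store might hold it:
# values can themselves be objects or arrays.
document = {
    "_id": 1,
    "name": "foo",
    "measurements": [
        {"time": "2018-02-21T1700Z", "value": 5.67e-8},
        {"time": "2018-02-21T1830Z", "value": 1.6e-19},
    ],
    "tags": ["demo", "nested"],
}

# The relational equivalent: one flat row per measurement,
# linked back to the parent record by its key.
measurement_rows = [
    (document["_id"], m["time"], m["value"]) for m in document["measurements"]
]

print(json.dumps(document, indent=2))
print(measurement_rows)
```

<p>The nested form keeps everything about one record in one place, while the flat form spreads it across rows that have to be joined back together.</p>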
<h3 id="schema">Schema</h3>
<p>The <a href="https://en.wikipedia.org/wiki/Database_schema"><em>database schema</em></a> formally
describes the structure of a database. For example, the table above could be
described as having six fields:</p>
<ol>
<li>a unique non-null field called the <a href="https://en.wikipedia.org/wiki/Primary_key"><em>primary key</em></a>.</li>
<li>a text field</li>
<li>an integer field</li>
<li><em>etc.</em></li>
</ol>
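<p>As a rough sketch, the schema above could be written down with Python’s built-in <code class="highlighter-rouge">sqlite3</code> module. The table name <code class="highlighter-rouge">example</code> and the column names are my own assumptions, and since SQLite has no dedicated date or boolean types, this sketch stores them as text and integers.</p>

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE example (
        id            INTEGER PRIMARY KEY,  -- the primary key, unique per row
        text_field    TEXT,
        integer_field INTEGER,
        date_field    TEXT,     -- ISO 8601 text in this sketch
        real_field    REAL,
        boolean_field INTEGER   -- 0 or 1 in this sketch
    )
    """
)

# Read the schema back: each row describes one column.
schema = conn.execute("PRAGMA table_info(example)").fetchall()
for cid, name, col_type, notnull, default, pk in schema:
    print(name, col_type, "PRIMARY KEY" if pk else "")

# The primary key must be unique, so a second row with id 1 is rejected.
conn.execute("INSERT INTO example (id, text_field) VALUES (1, 'foo')")
try:
    conn.execute("INSERT INTO example (id, text_field) VALUES (1, 'bar')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```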
<h3 id="sql---a-structured-query-language">SQL - A Structured Query Language</h3>
<p>The language used to define the database schema, insert data, and make queries
is called <a href="https://en.wikipedia.org/wiki/SQL">SQL or Structured Query Language</a>.</p>
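<p>Here is a small sketch of those three jobs (defining a schema, inserting data, and querying), again using <code class="highlighter-rouge">sqlite3</code>. The <code class="highlighter-rouge">authors</code> and <code class="highlighter-rouge">posts</code> tables are invented for this example and are not part of the tutorial’s example database.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the schema and insert a few rows (hypothetical tables).
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT, words INTEGER
    );
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES
        (1, 1, 'Intro to SQL', 900),
        (2, 1, 'Joins', 1200),
        (3, 2, 'Aggregates', 700);
    """
)

# Query: select fields from rows that satisfy a condition.
long_posts = conn.execute(
    "SELECT title FROM posts WHERE words >= 900 ORDER BY id"
).fetchall()

# Join: combine two tables along a common field.
joined = conn.execute(
    "SELECT authors.name, posts.title FROM posts "
    "JOIN authors ON posts.author_id = authors.id ORDER BY posts.id"
).fetchall()

# Aggregate: COUNT and MAX per author.
stats = conn.execute(
    "SELECT authors.name, COUNT(*), MAX(posts.words) FROM posts "
    "JOIN authors ON posts.author_id = authors.id "
    "GROUP BY authors.name ORDER BY authors.name"
).fetchall()

print(long_posts)
print(joined)
print(stats)
```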
<h3 id="database-management-software">Database Management Software</h3>
<p>Database management typically consists of a server and a client. There are
<a href="https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems">several popular relational databases</a>:</p>
<ul>
<li><a href="https://www.postgresql.org/">PostgreSQL</a></li>
<li><a href="https://www.mysql.com/">MySQL</a></li>
<li><a href="https://sqlite.org/index.html">SQLite</a></li>
<li><a href="https://www.microsoft.com/en-us/sql-server/">MSSQL</a></li>
</ul>
<h3 id="clients-and-apis">Clients and APIs</h3>
<p>There are many ways to interface with a SQL database. Most databases come with
a command line client, <em>e.g.</em>:
<a href="https://www.postgresql.org/docs/current/static/app-psql.html"><code class="highlighter-rouge">psql</code></a> or a GUI,
<em>e.g.</em>: <a href="https://www.pgadmin.org/">pgAdmin</a>. Most databases also provide an API
for programmatic interaction, <em>e.g.</em>:
<a href="https://www.postgresql.org/docs/current/static/libpq.html"><code class="highlighter-rouge">libpq</code></a>.</p>
<h4 id="python-bindings">Python Bindings</h4>
<p>There are Python <a href="https://en.wikipedia.org/wiki/Language_binding">bindings</a> to
most database APIs:</p>
<ul>
<li><a href="http://initd.org/psycopg/">psycopg2</a></li>
<li><a href="https://mysqlclient.readthedocs.io/">mysqlclient</a></li>
<li><a href="https://docs.python.org/dev/library/sqlite3.html">sqlite3</a></li>
<li><a href="https://github.com/mkleehammer/pyodbc/wiki">pyodbc</a></li>
<li><a href="https://dev.mysql.com/downloads/connector/python/">Oracle MySQL Connector/Python</a></li>
<li><a href="http://www.pymssql.org/en/stable/">pymssql</a></li>
</ul>
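<p>Most of these bindings follow the same Python DB-API pattern: connect, get a cursor, execute, fetch. The sketch below uses <code class="highlighter-rouge">sqlite3</code> because it ships with Python; the <code class="highlighter-rouge">students</code> table is only a nod to the comic above. Note that the placeholder syntax differs between drivers: <code class="highlighter-rouge">sqlite3</code> uses <code class="highlighter-rouge">?</code>, while <code class="highlighter-rouge">psycopg2</code> uses <code class="highlighter-rouge">%s</code>.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

# Pass values as parameters instead of pasting them into the SQL string.
# The driver treats the whole value as data, so nothing in it is run as SQL.
name = "Robert'); DROP TABLE students;--"
cur.execute("INSERT INTO students (name) VALUES (?)", (name,))
conn.commit()

# The table still exists and holds the name exactly as given.
cur.execute("SELECT name FROM students WHERE id = ?", (1,))
stored = cur.fetchone()[0]
print(stored)
conn.close()
```

<p>Building the query with string formatting instead would let that name rewrite the SQL itself, which is exactly the joke in the comic.</p>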
<h4 id="object-relational-mapping">Object Relational Mapping</h4>
<p>It is also possible to bind the database records directly to objects using
<a href="https://en.wikipedia.org/wiki/Object-relational_mapping">object-relational mapping (ORM)</a>
with software such as <a href="https://www.djangoproject.com/">Django</a> or
<a href="http://www.sqlalchemy.org/">SQLAlchemy</a>. The advantage of using an ORM is that
instead of using SQL commands, you create objects native to the language, and
the ORM takes care of creating the corresponding schema in the database.</p>
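<p>To show the idea without installing anything, here is a toy mapper built from the standard library’s <code class="highlighter-rouge">dataclasses</code> and <code class="highlighter-rouge">sqlite3</code>. It is <em>not</em> how Django or SQLAlchemy work internally, and the <code class="highlighter-rouge">Sample</code> class and the helper functions are hypothetical; it only illustrates deriving the schema from a class and getting objects back from rows.</p>

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Sample:
    id: int
    label: str
    value: float


# Map Python annotations to SQLite column types (toy mapping, three types only).
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


def create_table(conn, cls):
    """Create a table whose columns mirror the dataclass fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__.lower()} ({cols})")


def save(conn, obj):
    """Insert one object as one row."""
    marks = ", ".join("?" for _ in fields(obj))
    table = type(obj).__name__.lower()
    conn.execute(f"INSERT INTO {table} VALUES ({marks})", astuple(obj))


def load_all(conn, cls):
    """Turn every row back into an object, in insertion order."""
    names = ", ".join(f.name for f in fields(cls))
    rows = conn.execute(f"SELECT {names} FROM {cls.__name__.lower()} ORDER BY rowid")
    return [cls(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table(conn, Sample)
save(conn, Sample(1, "foo", 5.67e-8))
save(conn, Sample(2, "bar", 1.6e-19))
samples = load_all(conn, Sample)
print(samples)
```

<p>A real ORM adds much more (relationships, migrations, query building), but the basic bargain is the same: you describe your data once as classes and the library writes the SQL.</p>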
<h3 id="extra-sql-commands">Extra SQL commands</h3>
<p>When setting up a SQL database server, <em>e.g.</em> PostgreSQL, you will also need to
create a user, set a password, and create a database. I’ll leave these to the
reader to investigate on their own.</p>
<h2 id="summary">Summary</h2>
<p>SQL is not glamorous, and it’s been around for a long time, but it’s not that
difficult to teach yourself. There are a ton of links here and in the
<a href="https://github.com/thehackerwithin/berkeley/tree/master/code_examples/SQL"><code class="highlighter-rouge">code_examples/SQL</code></a>
folder, so I hope this will serve as a good starting point, but there is still so much
more to learn. If you have any suggestions, feel free to comment here or please
send a PR to <a href="https://github.com/thehackerwithin/berkeley">The Hacker Within, Berkeley</a>.</p>
<p>Thanks!</p>
Joint meetup with the Graduate Data Science Organization2018-02-14T00:00:00+00:00https://BIDS.github.io/dats/posts/GDSO-1<p>Instead of our typical THW lesson format, this week we will be having a joint event with the <a href="https://gdso.berkeley.edu/index.html">Graduate Data Science Organization</a>. As always, everyone is welcome to join, even if you’re not a graduate student or affiliated with UC-Berkeley.</p>
<p>The GDSO is a student led organization with the purpose of providing graduate students and postdoc fellows with resources to explore career opportunities in data science. In order to continue building connections between members on campus, we’ll be hosting monthly meetups for all of you interested in data science. These meetups will be informal and feature a few short talks about a variety of topics relevant to our organization. They will happen on the second Wednesday of every month at BIDS. Sign up for the GDSO mailing list or contact the organizers directly at officers@gdso.berkeley.edu.</p>
Stuart Geiger -- Intro to Jupyter Notebooks2018-02-07T00:00:00+00:00https://BIDS.github.io/dats/posts/jupyter<p>This session will be an introduction to using <a href="http://jupyter.org/">Jupyter notebooks</a>. No specific programming language expertise is required, although I’ll show how to use Jupyter to write code in python, R, and bash. We’ll walk through some of the basics together, so you can install Jupyter on your computer with <a href="https://www.anaconda.com/downloads">Anaconda</a> or you can launch a temporary virtual server with <a href="https://beta.mybinder.org/repo/thehackerwithin/berkeley/tags.html">our mybinder container</a>.</p>
<h1 id="some-links-and-resources">Some links and resources</h1>
<ul>
<li><a href="https://jupyter.readthedocs.io/en/latest/">Official Jupyter Documentation</a></li>
<li><a href="https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks">Gallery of interesting Jupyter notebooks</a></li>
<li><a href="http://ipython.readthedocs.io/en/stable/interactive/magics.html">IPython magic commands</a></li>
<li><a href="https://github.com/ipython-books/minibook-2nd-code">IPython minibook tutorial</a></li>
<li><a href="https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Jupyter_Notebook_Cheat_Sheet.pdf">Jupyter Cheat Sheet</a></li>
<li><a href="http://mybinder.org">MyBinder.org</a> – turn any GitHub repo with notebooks into a live temporary server</li>
</ul>
<h1 id="jupyter-and-python-is-a-repl-read-evaluate-print-loop">Jupyter (and Python) is a <a href="http://enwp.org/REPL">REPL</a>: Read-Evaluate-Print Loop</h1>
<p>You might be familiar with a REPL – the BASH command line is one too!</p>
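<p>If the acronym feels abstract, here is a tiny sketch of a REPL written in Python itself. It is a simplification (the function name <code class="highlighter-rouge">repl</code> is mine, and real shells and Jupyter kernels do far more), and it reads from a list of strings rather than the keyboard so that it can run unattended.</p>

```python
def repl(lines):
    """Read each line, evaluate it, and collect what would be printed."""
    namespace = {}
    outputs = []
    for line in lines:                      # Read
        try:
            result = eval(line, namespace)  # Evaluate an expression
        except SyntaxError:
            exec(line, namespace)           # Statements (e.g. assignment) yield no value
            result = None
        if result is not None:
            outputs.append(repr(result))    # Print
    return outputs                          # The loop ends when the input runs out


session = repl(["x = 6", "x * 7", "'a' + 'b'"])
for out in session:
    print(out)
```

<p>State carries over from one line to the next through <code class="highlighter-rouge">namespace</code>, which is also why the order in which you run notebook cells matters.</p>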
<h1 id="mapping-out-different-uses">Mapping out different uses</h1>
<p>Note these are simplifications that aren’t 100% accurate – all models are wrong, but some are useful.</p>
<h2 id="what-you-may-be-familiar-with">What you may be familiar with</h2>
<p><img src="../images/jupyter/standard-python.png" alt="" /></p>
<h2 id="what-jupyter-notebook-does-on-your-computer">What Jupyter notebook does (on your computer)</h2>
<h3 id="basic-structure">Basic structure</h3>
<p><img src="../images/jupyter/local-simple.png" alt="" /></p>
<h3 id="writing-output-to-a-file">Writing output to a file</h3>
<p><img src="../images/jupyter/local-simple-output-file.png" alt="" /></p>
<h3 id="reading-a-file-with-bash">Reading a file with bash</h3>
<p><img src="../images/jupyter/local-simple-read-output-file.png" alt="" /></p>
<h2 id="using-many-notebooks-and-kernels-on-your-computer">Using many notebooks and kernels (on your computer)</h2>
<p><img src="../images/jupyter/local-complex.png" alt="" /></p>
<h2 id="jupyter-on-a-remote-server">Jupyter on a remote server</h2>
<p><img src="../images/jupyter/jupyterhub-diagram.png" alt="" /></p>
Diya Das and David DeTomaso -- Intro to BASH and the command-line shell2018-01-31T00:00:00+00:00https://BIDS.github.io/dats/posts/bash-shell<p>In this session, we will attempt to teach Bash (i.e. the Unix shell, the thing that you have on your Mac that opens when you click Terminal) from an introductory to an intermediate/advanced level. Windows users: some versions of Windows 10 have a Linux subsystem, but you can also install <a href="http://cygwin.com/">Cygwin</a> to follow along.</p>
<p>We’ll start from basics of the shell and attempt to get all the way to some advanced stuff (I’m being vague on purpose; we’ll stop and answer questions so that might determine our endpoint).</p>
<p>Invite your friends, especially those who are scared of command line interfaces but still want to know things! We’ll be in BIDS / 190 Doe Library starting at 5pm on Wednesday. As always, we’re a come when you can / leave when you need to sort of group, but we hope to see you there.</p>
First Meeting -- What's on campus and what do we want to do this semester?2018-01-24T00:00:00+00:00https://BIDS.github.io/dats/posts/first-sp18<h1 id="welcome-please-sign-in-at-bitdothw-012418">Welcome! Please sign in at <a href="http://bit.do/thw-012418">bit.do/thw-012418</a>.</h1>
<h2 id="agenda">Agenda</h2>
<ul>
<li>5:10 - Introductions // we’re on Berkeley time!</li>
<li>5:20 - Presentations
<ul>
<li><a href="http://research-it.berkeley.edu/programs/berkeley-research-computing">Berkeley Research Computing</a></li>
<li><a href="http://research-it.berkeley.edu/programs/research-data-management">Research Data Management</a></li>
<li><a href="http://dlab.berkeley.edu/">D-Lab</a></li>
<li><a href="https://gdso.berkeley.edu/">Graduate Data Science Organization</a></li>
</ul>
</li>
<li>5:45 - Introduction to <a href="https://github.com/thehackerwithin/berkeley">our GitHub repo</a>
<ul>
<li>How to edit the website</li>
<li>Raising issues to request tutorials</li>
</ul>
</li>
<li>6:00 - What do we want to learn and what do we want to teach?</li>
</ul>
Intro to Machine Learning with Scikit-Learn -- Qingkai Kong2017-12-06T00:00:00+00:00https://BIDS.github.io/dats/posts/sklearn-f17<h2 id="goals-of-the-workshop">Goals of the workshop</h2>
<p>In this session, I will give a quick overview of basic machine learning and an introduction to <a href="http://scikit-learn.org/stable/">sklearn</a>. The goals are:</p>
<ul>
<li>
<p>Understand the basics of machine learning; this session covers classification and regression.</p>
</li>
<li>
<p>Get familiar with the syntax of scikit-learn</p>
</li>
</ul>
<p>After the workshop, you should be able to apply popular models to your own problems.</p>
<h2 id="tutorial-material">Tutorial material</h2>
<p>Material for this session - <a href="https://github.com/qingkaikong/20171206_ML_basics_THW">Here</a></p>
<p>This tutorial was developed by <a href="http://seismo.berkeley.edu/qingkaikong/">Qingkai Kong</a>.</p>
<h2 id="references">References</h2>
<ul>
<li><a href="https://github.com/PythonWorkshop/intro-to-sklearn">Intro-to-sklearn</a></li>
<li><a href="https://github.com/jakevdp/sklearn_tutorial">sklearn tutorial</a> by <a href="https://staff.washington.edu/jakevdp/">Jake Vanderplas</a></li>
<li><a href="https://www.amazon.com/Python-Machine-Learning-Sebastian-Raschka/dp/1783555130/">Python Machine Learning</a> by <a href="https://sebastianraschka.com/">Sebastian Raschka</a></li>
<li><a href="http://scikit-learn.org/stable/documentation.html">sklearn documentation</a></li>
<li><a href="http://scikit-learn.org/stable/auto_examples/index.html">sklearn examples</a></li>
</ul>
Mining scientific articles with Public Library of Science (PLoS) -- Elizabeth Seiver2017-11-29T00:00:00+00:00https://BIDS.github.io/dats/posts/allofplos-f17<p><strong><a href="http://www.thehackerwithin.org/berkeley/plos.html">Link to presentation slides</a></strong></p>
<h2 id="csv-datasets">CSV datasets</h2>
<ul>
<li><a href="https://drive.google.com/open?id=0B_JDnoghFeEKQWlNUUJtY1pIY3c">allofplos metadata csv</a></li>
<li><a href="https://drive.google.com/open?id=0B_JDnoghFeEKeEp6S0R2Sm1YcEk">allofplos metadata csv without abstracts</a></li>
<li><a href="https://drive.google.com/open?id=0B_JDnoghFeEKLTlJT09IckMwOFk">zip file of 10,000 random PLOS articles in XML</a></li>
</ul>
<p>Have you ever wanted to learn how to mine the text and data from scientific articles? Come join us at The Hacker Within for a tutorial and mini-hackathon!</p>
<p>First will be a brief tutorial on the basic structure of XML documents, the JATS XML structure used by PLOS and other scientific publishers, as well as the XML parsing tools in <a href="https://github.com/PLOS/allofplos">allofplos</a>, a Python library that downloads and parses PLOS articles. Then we’ll have some time to mine the corpus, contribute to the allofplos codebase, or whatever else you want to do with hundreds of thousands of research articles at your fingertips!</p>
<p>Spots are limited, so please sign up here: <a href="https://www.eventbrite.com/e/plos-the-hacker-within-mining-scientific-articles-tutorial-hackathon-tickets-39877458552">https://www.eventbrite.com/e/plos-the-hacker-within-mining-scientific-articles-tutorial-hackathon-tickets-39877458552</a>.</p>
<p>The tutorial portion will be broadcast live and recorded on YouTube. While a working knowledge of Python is helpful, we will also have .csv documents of allofplos’s metadata that can be parsed in R.</p>
<p>Pizza will be provided.</p>
<p><img src="http://www.thehackerwithin.org/berkeley/images/2017-plos-hackathon.png" alt="info" /></p>
<h2 id="about-the-presenter">About the presenter</h2>
<p>Elizabeth Seiver is a Researcher at the Public Library of Science, a non-profit Open Access publisher. She wrote the codebase for allofplos.</p>
No THW on 11/22-- next meeting 11/292017-11-22T00:00:00+00:00https://BIDS.github.io/dats/posts/thanksgiving-f17TBD -- TBD2017-11-15T00:00:00+00:00https://BIDS.github.io/dats/posts/TBDNo THW this week -- next meeting 11/152017-11-08T00:00:00+00:00https://BIDS.github.io/dats/posts/TBDVisualizations in R with ggplot2 -- Rebecca Barter2017-11-01T00:00:00+00:00https://BIDS.github.io/dats/posts/ggplot2-f17<p>In this session you will learn how to build impressive ggplot2 figures. To help along the way, I will teach you the grammar of graphics, the basic plot types available in ggplot2, and a plethora of ways to customize your figures (including, time permitting, making your own ggplot theme).</p>
<p>The jupyter notebook containing the materials for this session can be found here: https://github.com/rlbarter/ggplot2-thw.</p>
No THW this week -- next session on 11/12017-10-25T00:00:00+00:00https://BIDS.github.io/dats/posts/TBDUtility Functions in R -- Diya Das2017-10-18T00:00:00+00:00https://BIDS.github.io/dats/posts/utility-r-f17<p>This session will cover some useful R functions, with a focus on installing packages from various sources and managing environments. I’ll also present some customizations to RStudio that I’ve found helpful in my work.</p>
<p>A minimal background in R is recommended: be familiar with basic arithmetic and have previously installed a package in R.</p>
<p>Please make sure you have installed <a href="https://www.r-project.org/">R</a> and <a href="https://www.rstudio.com/">RStudio</a>.</p>
Using Jupyter Notebooks -- Stuart Geiger2017-10-11T00:00:00+00:00https://BIDS.github.io/dats/posts/jupyter-f17<p>This session will be an introduction to using <a href="http://jupyter.org/">Jupyter notebooks</a>. No specific programming language expertise is required, although I’ll show how to use Jupyter to write code in python, R, and bash. We’ll walk through some of the basics together, so you can install Jupyter on your computer with <a href="https://www.anaconda.com/downloads">Anaconda</a> or you can launch a temporary virtual server with <a href="https://beta.mybinder.org/repo/thehackerwithin/berkeley/tags.html">our mybinder container</a>.</p>
<h1 id="some-links-and-resources">Some links and resources</h1>
<ul>
<li><a href="https://jupyter.readthedocs.io/en/latest/">Official Jupyter Documentation</a></li>
<li><a href="https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks">Gallery of interesting Jupyter notebooks</a></li>
<li><a href="http://ipython.readthedocs.io/en/stable/interactive/magics.html">IPython magic commands</a></li>
<li><a href="https://github.com/ipython-books/minibook-2nd-code">IPython minibook tutorial</a></li>
<li><a href="https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Jupyter_Notebook_Cheat_Sheet.pdf">Jupyter Cheat Sheet</a></li>
<li><a href="http://mybinder.org">MyBinder.org</a> – turn any GitHub repo with notebooks into a live temporary server</li>
</ul>
<h1 id="jupyter-and-python-is-a-repl-read-evaluate-print-loop">Jupyter (and Python) is a <a href="http://enwp.org/REPL">REPL</a>: Read-Evaluate-Print Loop</h1>
<p>You might be familiar with a REPL – the BASH command line is one too!</p>
<h1 id="mapping-out-different-uses">Mapping out different uses</h1>
<p>Note these are simplifications that aren’t 100% accurate – all models are wrong, but some are useful.</p>
<h2 id="what-you-may-be-familiar-with">What you may be familiar with</h2>
<p><img src="../images/jupyter/standard-python.png" alt="" /></p>
<h2 id="what-jupyter-notebook-does-on-your-computer">What Jupyter notebook does (on your computer)</h2>
<h3 id="basic-structure">Basic structure</h3>
<p><img src="../images/jupyter/local-simple.png" alt="" /></p>
<h3 id="writing-output-to-a-file">Writing output to a file</h3>
<p><img src="../images/jupyter/local-simple-output-file.png" alt="" /></p>
<h3 id="reading-a-file-with-bash">Reading a file with bash</h3>
<p><img src="../images/jupyter/local-simple-read-output-file.png" alt="" /></p>
<h2 id="using-many-notebooks-and-kernels-on-your-computer">Using many notebooks and kernels (on your computer)</h2>
<p><img src="../images/jupyter/local-complex.png" alt="" /></p>
<h2 id="jupyter-on-a-remote-server">Jupyter on a remote server</h2>
<p><img src="../images/jupyter/jupyterhub-diagram.png" alt="" /></p>
Using GitHub in Open Source Software Projects -- Mark Mikofski2017-10-04T00:00:00+00:00https://BIDS.github.io/dats/posts/github-oss-f17<h1 id="agenda">Agenda</h1>
<ol>
<li><a href="#free-and-open-source-software-foss-or-oss">What is FOSS?</a></li>
<li><a href="#the-importance-of-contributing-to-open-source">Why contribute to FOSS?</a></li>
<li><a href="#ways-to-contribute">Different ways to participate</a></li>
<li><a href="#using-github-for-open-source-projects">Nuts and bolts</a></li>
</ol>
<h2 id="free-and-open-source-software-foss-or-oss">Free and Open Source Software, FOSS or OSS</h2>
<p>There are many definitions of “open source software” and even different names like <a href="https://en.wikipedia.org/wiki/Free_and_open-source_software">“free and open source software”</a>.</p>
<p>A Wikipedia post on <a href="https://en.wikipedia.org/wiki/Open-source_software">open source software</a> says the following:</p>
<blockquote>
<p>Open-source software (OSS) is computer software with its source code made available with a license in which the copyright
holder provides the rights to study, change, and distribute the software to anyone and for any purpose. Open-source software
may be developed in a collaborative public manner. According to scientists who studied it, open-source software is a prominent
example of open collaboration.[2] The term is often written without a hyphen as “open source software”.</p>
</blockquote>
<p>GitHub’s <a href="https://opensource.guide/">Open Source Guide</a> by Nadia Eghbal answers the question: <a href="https://opensource.guide/starting-a-project/#what-does-open-source-mean">What does “open source” mean?</a> in a section called: <a href="https://opensource.guide/starting-a-project/">Starting an Open Source Project</a>.</p>
<blockquote>
<p>When a project is open source, that means anybody can view, use, modify, and distribute your
project for any purpose. These permissions are enforced through an open source license.</p>
</blockquote>
<p>For more rigor, check out the <a href="https://opensource.org/osd">Open Source Initiative (OSI) definition</a>, but the bottom line is that open
source code is <em>free</em>, as in <strong>free beer</strong>.</p>
<p><img src="../images/yckreEqei.jpg" alt="Free beer" title="Free beer" /></p>
<h2 id="the-importance-of-contributing-to-open-source">The importance of contributing to open source</h2>
<p>Why do people create open source software? <a href="https://opensource.guide/starting-a-project/#why-do-people-open-source-their-work">GitHub’s open source guide says, “There are many reasons”</a>:</p>
<blockquote>
<ul>
<li>Collaboration: “Open source projects can accept changes from anybody in the world.”</li>
<li>Adoption: “Open source projects can be used by anyone for nearly any purpose. People can even use it to build other things.”</li>
<li>Transparency: “Anyone can inspect an open source project for errors or inconsistencies.”</li>
</ul>
</blockquote>
<p>Wikipedia discusses the <a href="https://en.wikipedia.org/wiki/Open-source_software#Advantages_and_disadvantages">“open source development model: advantages and disadvantages”</a> (emphasis mine):</p>
<blockquote>
<ul>
<li>“Open source software is usually easier to obtain than proprietary software, often resulting in <strong>increased use</strong>.”</li>
<li>“Open source development offers the potential for a <strong>more flexible technology and quicker innovation</strong>.”</li>
</ul>
</blockquote>
<p>The <a href="https://opensource.org/">OSI</a> lists their reasons too (emphasis mine):</p>
<blockquote>
<ul>
<li>Developers: “Open source projects provide tremendous opportunities for developers to <strong>share and learn through collaboration</strong>.”</li>
<li>Business: “… enterprises have realized the promise of open source: <strong>higher quality, greater reliability, more flexibility, lower cost</strong> …”</li>
<li>Non-Profit: “… open source ethos of contribution & community helps make life for NPO & NGO staffers easier”</li>
</ul>
</blockquote>
<p>Google has also recently published their <a href="https://opensource.google.com/docs/why/">open source guidelines</a>.</p>
<h2 id="ways-to-contribute">Ways to contribute</h2>
<p>There are many ways to find and contribute to open source. Here are a few …</p>
<ol>
<li><a href="https://opensourcefriday.com/">Open Source Fridays by GitHub</a></li>
<li><a href="https://opensource.guide/">GitHub’s Open Source Guide by Nadia Eghbal</a></li>
<li><a href="https://hacktoberfest.digitalocean.com/">Hacktoberfest sponsored by Digital Ocean</a></li>
<li><a href="https://github.com/open-source">GitHub</a></li>
</ol>
<h2 id="using-github-for-open-source-projects">Using GitHub for Open Source projects</h2>
<p>GitHub is an ideal tool for open source projects for many reasons. It’s free for open source projects. The issue, pull request and
review tools make contributing to open source much easier. And other tools like a wiki, issue or pull request templates, and automatic
detection of licenses, contribution guidelines, and codes of conduct are also very useful.</p>
<h3 id="the-license">The license</h3>
<p>Whether you are using, creating or contributing to open source, it’s useful to have a basic understanding of licenses.
<a href="https://opensource.org/licenses">According to OSI there are at least 9 common licenses.</a> GitHub created
<a href="https://choosealicense.com/">choose a license</a> to help users choose and create a license. There are even
<a href="https://creativecommons.org/">licenses for works of art and prose by Creative Commons</a> for use in blogs and other online creations that
aren’t necessarily computer code.</p>
<h3 id="code-of-conduct-and-contribution-guidelines">Code of Conduct and Contribution Guidelines</h3>
<p>You want to read these and follow them.</p>
<h3 id="issues">Issues</h3>
<p>One of the easiest ways to contribute to open source is to create an issue. Issues can be technical, code-related or an improvement to
the documentation. There is no issue too big or too small, and never any dumb questions, only dumb answers. However try to empathize
with the other users and maintainers when reporting issues. They may be overwhelmed by a deluge of issues, and they are typically
volunteering their precious free time. So a little preparation or ground work before submitting an issue will go a long way to getting
the issue resolved.</p>
<ol>
<li>
<p>Try to solve the issue yourself. Spend a reasonable amount of time on this to show that you’ve done your research.</p>
<ul>
<li>Check if the open source project has a <a href="https://groups.google.com/">Google group</a> or a Slack or IRC channel and
search for common questions or issues you have. Ask for help from the forum.</li>
<li>Ditto for <a href="https://stackoverflow.com/">StackOverflow</a>.</li>
</ul>
</li>
<li>
<p>If there are submission guidelines or an issue template, read and follow it very carefully, complete all sections as thoroughly
as possible.</p>
<ul>
<li>Include in your issue something that approaches a <a href="https://stackoverflow.com/help/mcve">minimum complete verifiable example</a> of
your issue.</li>
<li>It should go without saying, but be polite, respectful and constructive. <a href="https://opensource.org/node/877">Assume Good Faith</a>.</li>
</ul>
</li>
<li>
<p>Scratch your own itch. Follow your issue with a <a href="#pull-requests">pull request</a>.</p>
</li>
</ol>
<h3 id="pull-requests">Pull Requests</h3>
<p><a href="https://help.github.com/articles/about-pull-requests/">Pull requests (PR’s)</a> are one of the most useful keys to contributing to open
source. With a few exceptions, PR’s are how most open source projects receive contributions. A PR is not a Git feature; a PR is a
feature of GitHub and other online hosted repositories. A PR is defined by GitHub as follows:</p>
<blockquote>
<p>Pull requests let you tell others about changes you’ve pushed to a repository on GitHub. Once a pull request is opened, you can
discuss and review the potential changes with collaborators and add follow-up commits before the changes are merged into the
repository.</p>
</blockquote>
<p>I wrote a <a href="http://poquitopicante.blogspot.com/2016/10/winning-workflow.html">blog post called winning workflow</a> about how we use PR’s in my team to collaborate.</p>
<p><img src="../images/workflow-allcolor.png" alt="winning workflow" title="winning workflow" /></p>
<h4 id="step-1-fork-the-repository">Step 1: Fork the repository</h4>
<p>The first step in contributing to an open source project should be to <a href="https://help.github.com/articles/fork-a-repo/">fork the repository</a>. <a href="https://guides.github.com/activities/forking/">Forking a repository</a> allows you to create pull requests for your contributions. From the main GitHub page for the project find the fork button and select your personal GitHub profile as the location for your fork.</p>
<p><img src="https://github-images.s3.amazonaws.com/help/bootcamp/Bootcamp-Fork.png" alt="forking a repository" title="forking a repository" /></p>
<h4 id="step-1-12-the-shortcut">Step 1-1/2: The shortcut</h4>
<p>You can work, commit and submit a PR directly from GitHub by editing and creating new files directly in GitHub online. Make sure to select that you want GitHub to create a new branch and submit your PR when you commit your work.</p>
<blockquote>
<p>Create a new branch for this commit and start a pull request.</p>
</blockquote>
<p>Then for future commits you would commit directly to the “patch-N” branch created by GitHub for your pull request.</p>
<blockquote>
<p>Commit directly to the <code class="highlighter-rouge">patch-1</code> branch.</p>
</blockquote>
<p>In fact this is exactly the shortcut I’m using to edit this file. However there are some limitations to this approach. You may not be able to upload images this way, but you can start with this shortcut and then continue with the remaining steps anytime.</p>
<h4 id="step-2-clone-your-fork">Step 2: Clone your fork</h4>
<p>The second step is to <a href="https://git-scm.com/docs/git-clone">use Git to clone</a> the fork you just created from GitHub.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>me@mycomputer ~/projects
$ git clone git@github.com:me/oss-proj-fork.git # your url might be https://github.com/me/oss-proj-fork.git
</code></pre></div></div>
<p>This copies the repository to your computer where you can work on it.</p>
<h4 id="step-3-add-upstream-git-remote">Step 3: Add “Upstream” Git Remote</h4>
<p>The third step is to <a href="https://git-scm.com/docs/git-remote">add a Git remote</a> to the original open source project which you forked on GitHub. For convenience’s sake we’ll call this the “upstream” repository, but call it whatever you want.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>me@mycomputer ~/projects/oss-proj-fork (master)
$ git remote add upstream git@github.com:oss-people/oss-proj.git # your url might be https://github.com/oss-people/oss-proj.git
</code></pre></div></div>
<h4 id="step-4-make-a-feature-branch">Step 4: Make a feature branch</h4>
<p>The fourth step is to <a href="https://git-scm.com/docs/git-checkout#git-checkout-emgitcheckoutem-b-Bltnewbranchgtltstartpointgt">check out a new feature branch</a>. This is a short-lived branch with a descriptive name, and it is the easiest path to submitting a PR because it has several advantages.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>me@mycomputer ~/projects/oss-proj-fork (master)
$ git checkout -b my-feature-gh99 # put the issue number if there is one
</code></pre></div></div>
<ul>
<li>Your branch name serves as a quick description of the feature or issue.</li>
<li>It’s easier to sync your feature branch with master if your feature takes a while to finish, and other features get merged upstream before you’re done.</li>
<li>If the upstream project chooses to rebase and squash your work into a single commit, your feature branch can serve as a history of the changes you made, although typically after a PR is merged, the feature branch can be deleted.</li>
</ul>
<h4 id="step-5-make-a-test">Step 5: Make a test</h4>
<p>The fifth step is to make a test using the unit testing framework that the upstream repository uses. This test serves as the minimum acceptance criteria for the new feature. Testing in code development is <strong>very</strong> important. Tests ensure that a project is working as intended and, when issues arise, tell the maintainers and contributors exactly where the problem is. Often repositories are integrated with online build and test servers, called “continuous integration” or CI, that test every PR commit. These are helpful for communicating the state of the PR to collaborators.</p>
<p>There are several established unit testing frameworks, including <a href="https://docs.python.org/3/library/unittest.html">Python’s own builtin unittest module</a>; however, most projects use either <a href="http://nose.readthedocs.io/en/latest/">nose</a> or <a href="https://docs.pytest.org/en/latest/">pytest</a>. If you can’t figure out what the maintainers use, then use pytest and simple assertions. Don’t be surprised if they ask you to adopt their own specific paradigm. Be flexible. This is an opportunity for you to collaborate with the maintainer and learn something new.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
import numpy as np
import pandas as pd
from oss_proj.core.new_feature import new_calc  # the function under test

# locate the test data relative to this test file
BASE_DIR = os.path.dirname(__file__)
NEW_FEAT_TEST_DATA = os.path.join(
    BASE_DIR, "new_feature_test_data.csv"
)
A, B, C = 1, 2, 3  # example inputs
KNOWN_GOOD_VALUES = pd.read_csv(NEW_FEAT_TEST_DATA)


def test_new_feature_calculation():
    # compare the result with the known good values, within floating point tolerance
    calculated_values = new_calc(A, B, C)
    assert np.allclose(calculated_values, KNOWN_GOOD_VALUES)
</code></pre></div></div>
<p>If you run the test runner, <code class="highlighter-rouge">pytest</code>, from the command line now, your test will fail. Don’t cry, this is OK. Failure is not bad. We will work on this until it passes, but not yet. First we have to commit our changes and push them up to our fork.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>me@mycomputer ~/projects/oss-proj-fork (my-feature-gh99)
$ git add oss-proj-fork/oss_proj/core/new_feature.py # we may need to add files first
$ git commit -m "add test for new feature to fix #99" # if there's an issue you can refer to it
$ git push -u origin my-feature-gh99
</code></pre></div></div>
<h4 id="step-6-create-a-pull-request-on-github">Step 6: Create a Pull Request on GitHub</h4>
<p><strong>Now is when you submit the PR!</strong> Not after you’ve done a bunch of work and find out that someone else already solved the problem, or that the maintainers don’t like your approach because it doesn’t fit into their long-term plan. But <strong>NOW!</strong> as soon as you start working, so that everyone else has a chance to collaborate with you on your cool new feature.</p>
<p>To create a PR for your feature, go online to GitHub. When you view either your fork or the upstream repo, you should see a message from GitHub that asks you if you want to create a PR for your new feature. Click it, and add some descriptive information about your plans for the feature, what you intend to do, if it relates to any issues, how long it might take, what help you need, etc. Then click submit.</p>
<p><img src="https://github-images.s3.amazonaws.com/help/pull_requests/recently_pushed_branch.png" alt="submitting a pull request" title="pull request" /></p>
<p>If GitHub doesn’t automatically ask you, go to either your fork or the upstream repo and click New Pull Request. Then choose the upstream repo as the “base fork” and set the “head fork” to your feature branch.</p>
<h4 id="step-7-hack-communicate-repeat">Step 7: Hack, Communicate, Repeat</h4>
<p>Now comes the fun. Hack! Communicate via the pull request with the other contributors. Collaborate and hack some more. Finally let them know when all of your changes are complete, your tests are all passing and you’re ready for the maintainers to review and merge your new feature. This may take many iterations. <strong>Be patient!</strong> <a href="https://opensource.org/node/877">Assume Good Faith</a></p>
<h1 id="conclusion">Conclusion</h1>
<p>That’s it. There may be some nuances and differences between projects. There are some projects that want you to email patches, but that’s a subject for another discussion. Also you may be asked to add or update documentation. Often there is an <code class="highlighter-rouge">AUTHORS</code> file for contributors; feel free to add yourself or ask if you should. Also there may be a changelog that you should contribute to. Or you may choose to contribute to the wiki instead of the codebase. Communication is the key to finding out about all of these loose ends. Keep the channels open, stay positive and enjoy.</p>
<h1 id="cool-video">Cool Video</h1>
<p><a href="https://www.youtube.com/watch?v=y19s6vPpGXA">This is a cool video of Brett Cannon</a></p>
No meeting this week -- THW postponed until Oct 42017-09-27T00:00:00+00:00https://BIDS.github.io/dats/posts/nomeeting-f17Version Control with git -- Mitch Negus and Yu Feng2017-09-20T00:00:00+00:00https://BIDS.github.io/dats/posts/vc-git<p>If you don’t have git installed:</p>
<ul>
<li><a href="https://git-for-windows.github.io/">Download for Windows (includes bash & git)</a></li>
<li><a href="https://git-scm.com/download/mac">Download for Mac OS X</a></li>
<li>Linux: <code class="highlighter-rouge">sudo apt-get install git</code></li>
</ul>
<h1 id="git-this">Git this</h1>
<h2 id="what-happens-when-you-dont-use-version-control">What happens when you don’t use version control?</h2>
<h3 id="a-general-life-example">A general life example:</h3>
<p><img src="../images/fig/phd_comics_VC.png" alt="" /></p>
<p>You most likely started out doing something like this. Maybe you’ve become more sophisticated (or not) and now you</p>
<ul>
<li>date files</li>
<li>append <code class="highlighter-rouge">_vXXX</code></li>
</ul>
<p>This is good, but you can still do better.</p>
<h2 id="why-version-control-is-amazing">Why version control is amazing</h2>
<ul>
<li>Code that works will be saved permanently</li>
<li>If you break the code that works, reverting is easy</li>
<li>You still only need to keep one version around (the VC program does the rest)</li>
<li>Collaboration is kept smooth and coordinated</li>
<li>Productivity is not stifled by too many people working on one document</li>
</ul>
<p><img src="../images/fig/rock-climbing.jpg" alt="" /></p>
<h2 id="git">Git</h2>
<p><a href="https://git-scm.com/">Git</a> is the method of version control we’re going to be working with. Other methods exist (SVN and Mercurial are the big names, and Wikipedia’s <a href="https://en.wikipedia.org/w/index.php?title=List_of_version_control_software&action=history">page history</a> is a more commonly known example).</p>
<p>Git tracks your files by essentially taking a snapshot of a directory or subdirectory structure and saving it over time. The process of capturing one of these snapshots is called a commit.</p>
<p>To be a little more specific, you can think of Git as working in three different areas or bins. There is (1) the workspace, (2) the repository, and (3) the index.</p>
<h4 id="workspace">Workspace:</h4>
<p>The set of directories and files that you actually operate on.</p>
<h4 id="repository-repo">Repository (repo)</h4>
<p>The set of linked commits corresponding to snapshots of the workspace at specified points in time.</p>
<h4 id="index">Index</h4>
<p>The staging area where you “set up your snapshot”; files in the index that have changed since your last commit will be updated on your next commit.</p>
<h3 id="hashes">Hashes</h3>
<p>Each commit is given a unique label (called a hash) and is tied to the previous commit. The label is created using a hash function, which converts a set of information (i.e. your docs) into a fixed-length string of letters and numbers. Hashes also provide a layer of security, since each commit’s hash relies on its parent’s hash.</p>
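<p>To make the idea of chained hashes concrete, here is a minimal sketch in Python. It is a toy, not Git’s actual object format: a real commit also records the directory tree, author, timestamp and message. The function name <code class="highlighter-rouge">toy_commit_hash</code> and the snapshot strings are made up for illustration; the only thing borrowed from Git is the SHA-1 hash function it has traditionally used.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib


def toy_commit_hash(content, parent_hash):
    """Hash the snapshot content together with the parent's hash (toy model)."""
    payload = (parent_hash + "\n" + content).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


first = toy_commit_hash("file_1, version 1", parent_hash="")
second = toy_commit_hash("file_1, version 2", parent_hash=first)

# changing the first snapshot changes its hash, and therefore the child's hash too
tampered_first = toy_commit_hash("file_1, version 1 (edited)", parent_hash="")
tampered_second = toy_commit_hash("file_1, version 2", parent_hash=tampered_first)

print(first[:7], second[:7])       # short hashes, like the ones in the diagrams
print(second != tampered_second)   # the chain no longer matches
</code></pre></div></div>
<p>Because every hash depends on the one before it, nobody can quietly rewrite an old snapshot without every later label changing as well.</p>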
<h3 id="diagramatic-representation">Diagrammatic Representation</h3>
<p>Below is a diagram of the git process for a single file; I’ve named it file_1. You start by creating the file in your workspace.</p>
<p><img src="../images/fig/git_wd01.png" alt="" /></p>
<p>Once you are satisfied with its progress, you decide that you want to commit your work. You add the file to the index.</p>
<p><img src="../images/fig/git_index01.png" alt="" /></p>
<p>Now you’re ready. You can commit the file to your repository. (Notice that in the repository file_1 is shown with the hash of the commit. In reality the full hash for a git commit is much longer.)</p>
<p><img src="../images/fig/git_repo01.png" alt="" /></p>
<p>Now, if you make some changes to file_1, your workspace changes.</p>
<p><img src="../images/fig/git_wd02.png" alt="" /></p>
<p>You add the changed file_1 to the index.</p>
<p><img src="../images/fig/git_index02.png" alt="" /></p>
<p>And again, you finally commit. The new commit becomes the most recent commit, and the previous one moves deeper into the repository’s history.</p>
<p><img src="../images/fig/git_repo02.png" alt="" /></p>
<p>And the process repeats, on and on, continually building up your repository’s history.</p>
<p><img src="../images/fig/git_repo_many.png" alt="" /></p>
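<p>The add-then-commit cycle pictured above can also be sketched in a few lines of Python. This is only a toy that follows the simplified three-area picture used in this tutorial; the class <code class="highlighter-rouge">ToyRepo</code> and its methods are invented for illustration and have nothing to do with how Git is really implemented.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class ToyRepo:
    """A toy model of the workspace, index and repository (illustration only)."""

    def __init__(self):
        self.workspace = {}  # file name -> current content
        self.index = {}      # staged changes waiting for the next commit
        self.commits = []    # list of snapshots, oldest first

    def add(self, name):
        # stage the current workspace version of one file
        self.index[name] = self.workspace[name]

    def commit(self, message):
        # a new snapshot is the previous snapshot plus the staged changes
        previous = self.commits[-1]["files"] if self.commits else {}
        snapshot = {**previous, **self.index}
        self.commits.append({"message": message, "files": snapshot})
        self.index = {}  # nothing is staged any more


repo = ToyRepo()
repo.workspace["file_1"] = "first draft"
repo.add("file_1")
repo.commit("add file_1")

repo.workspace["file_1"] = "second draft"
repo.add("file_1")
repo.commit("update file_1")

for commit in repo.commits:
    print(commit["message"], "->", commit["files"])
</code></pre></div></div>
<p>Note that editing the workspace alone changes nothing in the history; only what has been added to the index ends up in the next snapshot.</p>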
<h2 id="your-turn-to-git-started">Your turn to Git started</h2>
<p>First we want to make a directory to track as the repository for this tutorial. Go somewhere in your file system and create this directory. Then navigate inside the newly created directory. (In the following code snippets, a $ indicates a command line prompt.)</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir git_tutorial
$ cd git_tutorial
</code></pre></div></div>
<p>Once you’re inside, it’s time to initialize the repository. Intuitively, this is done with the command</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git init
</code></pre></div></div>
<p>Now, since this is likely the first time you’re creating a Git repo, you may want to set up some Git configurations. Feel free to skip this step (it is optional) but if you don’t do it now, Git will likely ask you for this information repeatedly in the future.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git config --global user.name "OskiBear"
$ git config --global user.email "oski@berkeley.edu"
$ git config --global core.editor /usr/bin/nano
</code></pre></div></div>
<p>Don’t worry, you can change this information later. Because we used <code class="highlighter-rouge">--global</code>, it is stored in the hidden <code class="highlighter-rouge">.gitconfig</code> file in your home directory; settings specific to one repository live in the <code class="highlighter-rouge">.git/config</code> file inside the directory where you ran <code class="highlighter-rouge">git init</code>. You can also use <code class="highlighter-rouge">git config --list</code> to print all of the configuration options for easy viewing.</p>
<p>If you already have a GitHub account, it is best to use the same username and email for both Git and that account. Additionally, if you have a preferred editor (vim, emacs, sublime, etc.), feel free to use that instead of nano as your default.</p>
<p>First off, we’ll make a very simple program. Open a new file called <code class="highlighter-rouge">hello.sh</code> and add the following inside:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>echo "Hello, World!" > hello.txt
</code></pre></div></div>
<p>Save the script, and exit the text editor.</p>
<p>We can now see the options that Git provides by using just the <code class="highlighter-rouge">git</code> command, with no options or arguments. It should show the git usage statement, providing descriptions of the most commonly used git commands.</p>
<p>At this point we will follow the steps outlined above. First, we will add our new file to the index, as we prepare to save a snapshot of it to our repository. This is straightforward enough, the command is</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git add hello.sh
</code></pre></div></div>
<p>If all goes well, nothing should be displayed on the console. If you want to check that your change was added to the index, type</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git status
</code></pre></div></div>
<p>Git should then let you know that a new file, <code class="highlighter-rouge">hello.sh</code> has been staged, and is ready to be committed. When you are ready, commit the changes with the <code class="highlighter-rouge">git commit</code> command. Adding the <code class="highlighter-rouge">-m</code> option allows you to give a short message explaining the changes.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git commit -m "created a new repository containing a simple script"
</code></pre></div></div>
<p>Let’s keep practicing. Now make another slightly more interesting program. This is a Python script that uses Monte Carlo rejection sampling to estimate the value of $\pi$. Open a file called picalc.py and include the following code inside.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
N = 1000
X = np.random.random(N)
Y = np.random.random(N)
scores = []
noscores = []
for n in range(N):
    x,y = X[n],Y[n]
    if x**2 + y**2 < 1:
        scores.append([x,y])
    else:
        noscores.append([x,y])
    if n%10 == 0:
        print(4*len(scores)/(len(scores)+len(noscores)))
</code></pre></div></div>
<p>Again, save and exit the editor, and now follow the same steps as before. First, add the new file; then, commit the changes and include a brief message.</p>
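<p>If you are wondering why that script approaches $\pi$: points drawn uniformly from the unit square land inside the quarter circle of radius 1 with probability $\pi/4$, so four times the observed fraction is an estimate of $\pi$. The sketch below shows the same idea using only the standard library. The function name <code class="highlighter-rouge">estimate_pi</code>, the sample count and the seed are choices made for this illustration, not part of the tutorial script.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x**2 + y**2 < 1:
            inside += 1
    # (area of quarter circle) / (area of unit square) = pi / 4
    return 4 * inside / n_samples


estimate = estimate_pi(100_000)
print(estimate, abs(estimate - math.pi))
</code></pre></div></div>
<p>The error shrinks slowly, roughly like one over the square root of the number of samples, which is why the printed estimates in picalc.py wobble around for a while before settling near 3.14.</p>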
<h2 id="lets-git-a-bit-more-complicated">Let’s Git a bit more complicated</h2>
<p>At this point, our repository is starting to get interesting. We can see how things are evolving in our repo with the command</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log
</code></pre></div></div>
<p>You will see a history of each commit, from the most recent commit at the top to the oldest commit at the bottom.</p>
<p>Open picalc.py in an editor once again and change the script by adding the following lines:</p>
<ul>
<li>
<p>Below <code class="highlighter-rouge">import numpy as np</code> add</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> from matplotlib import pyplot as plt
</code></pre></div> </div>
</li>
<li>
<p>At the bottom of the program, add (unindented)</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> scores = np.asarray(scores)
noscores = np.asarray(noscores)
plt.plot(scores[:,0], scores[:,1], 'bo', noscores[:,0], noscores[:,1], 'ro')
plt.show()
</code></pre></div> </div>
</li>
</ul>
<p>Save the script and exit the editor. Now, the version of picalc.py in our working directory is different from the version in the index (and by extension, in the repo). We can see the changes quickly by typing</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git diff
</code></pre></div></div>
<p>on the command line.</p>
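<p>The output of <code class="highlighter-rouge">git diff</code> is in “unified diff” format: unchanged lines start with a space, added lines with <code class="highlighter-rouge">+</code>, and removed lines with <code class="highlighter-rouge">-</code>. If you want to see that format without touching your repo, here is a small sketch using Python’s standard <code class="highlighter-rouge">difflib</code> module. This is not how Git computes its diffs; the two line lists are shortened stand-ins for the old and new picalc.py.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import difflib

old_version = [
    "import numpy as np\n",
    "N = 1000\n",
]
new_version = [
    "import numpy as np\n",
    "from matplotlib import pyplot as plt\n",
    "N = 1000\n",
]

# unified_diff yields header lines, then lines prefixed with ' ', '+' or '-'
diff_lines = list(
    difflib.unified_diff(
        old_version, new_version, fromfile="a/picalc.py", tofile="b/picalc.py"
    )
)
print("".join(diff_lines))
</code></pre></div></div>
<p>The line beginning with <code class="highlighter-rouge">+</code> in the printed result is the new import, just as it would appear in Git’s own output.</p>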
<p>Now, we’re going to take things even further. The really powerful parts of Git are used when two people are collaborating on a project (or when you are trying to multitask–working on two aspects of a single project independently from one another).</p>
<p>To see this, we are going to have you “break” one of your programs. To solve the issue, you are going to create a new branch where you go back and correct the problem. Now, the branch with the fix will be different from the original. When you try to bring the branches back together, to the original “master” branch, the changes will conflict, and you will resolve the conflict by hand to complete the merge. (If the changes didn’t conflict, for example if you just added an extra line of code, Git is smart enough to notice this fact and merge the documents automatically.)</p>
<p>Open the <code class="highlighter-rouge">hello.sh</code> script and change the echo statement to be more like the original “hello, world” scripts</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>echo "hello, world" > hello.txt
</code></pre></div></div>
<p>Since we haven’t created a new file here, we can use a handy shortcut to avoid having to type both <code class="highlighter-rouge">git add</code> and <code class="highlighter-rouge">git commit</code>. Type</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git commit -am "message of your choice"
</code></pre></div></div>
<p>The new option <code class="highlighter-rouge">-a</code> adds all modified documents to the index automatically before the commit is enacted.</p>
<p>Now, if you run the program again, the file <code class="highlighter-rouge">hello.txt</code> contains the statement in all lowercase.</p>
<p>Since this is improper English, we decide to change the shell script back to the way it was before. This time though, we’re going to work on a new branch. Creating a new branch allows us to make changes to the code, including as many commits as we’d like, without actually modifying the original “master branch” of the code. Let’s try it.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git checkout -b englishfix
</code></pre></div></div>
<p>The git checkout statement allows you to extract files and branches from the repository. Since our branch had not already been created, we needed to use the <code class="highlighter-rouge">-b</code> option. If the englishfix branch already existed, we could omit the <code class="highlighter-rouge">-b</code>.</p>
<p>Now we are on the <code class="highlighter-rouge">englishfix</code> branch. Let’s go ahead and fix the script, changing <code class="highlighter-rouge">hello, world</code> to <code class="highlighter-rouge">Hello, World</code>. Commit the changes. Now, if you use <code class="highlighter-rouge">git checkout master</code> (master is the name given to the initial branch when you use <code class="highlighter-rouge">git init</code>) and look at the script (type <code class="highlighter-rouge">cat hello.sh</code>) you will see that your changes on the englishfix branch were not transferred. This is super useful if you have a working version of a piece of code and want to add a new feature without taking the risk of breaking your code in the meantime.</p>
<p>What if you had been working on the script on master in the meantime? Still on the master branch, open the script and add the exclamation point back in: change “hello, world” to “hello, world!”, save, and exit the editor. Then commit the change, so that each branch has its own commit touching the same line.</p>
<p>Now, since we <em>did</em> want to incorporate the changes on the englishfix branch into the master branch, we should merge the englishfix branch into the master branch. Type</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git merge englishfix
</code></pre></div></div>
<p>You should get a message reporting a merge conflict (which names the culprit file). Even though Git is usually smart enough to perform automatic merges, when a line of code is edited two different ways on two different branches, it doesn’t know what to make of the situation. The best solution is to give it back to a human and let them make some sense out of it.</p>
<p>To resolve the conflict, use <code class="highlighter-rouge">git diff</code> to have the differences between the conflicting documents printed to the console, and then use an editor to fix the discrepancy. Finally, add and commit the changes, and then the merge is complete!</p>
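<p>When a merge stops on a conflict, Git writes both versions into the file, separated by marker lines, and leaves it to you to keep the lines you want and delete the markers. The sketch below builds a string in that layout and splits it into the two sides. The exact contents of your own <code class="highlighter-rouge">hello.sh</code> may differ depending on what you typed, and the helper <code class="highlighter-rouge">split_conflict</code> is a made-up illustration, not a Git tool.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Git writes these markers literally; here they are built by repetition.
START, MIDDLE, END = "<" * 7, "=" * 7, ">" * 7

conflicted = "\n".join([
    START + " HEAD",                     # version on the branch you are on
    'echo "hello, world!" > hello.txt',
    MIDDLE,
    'echo "Hello, World" > hello.txt',   # version on the branch being merged
    END + " englishfix",
])


def split_conflict(text):
    """Return (ours, theirs) line lists from a single conflict block."""
    ours, theirs, current = [], [], None
    for line in text.splitlines():
        if line.startswith(START):
            current = ours
        elif line.startswith(MIDDLE):
            current = theirs
        elif line.startswith(END):
            current = None
        elif current is not None:
            current.append(line)
    return ours, theirs


ours, theirs = split_conflict(conflicted)
print("master:    ", ours)
print("englishfix:", theirs)
</code></pre></div></div>
<p>In practice you would simply open the file in your editor, keep one line (or write a combination such as “Hello, World!”), remove the three marker lines, and then add and commit.</p>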
<h2 id="bonus">Bonus!</h2>
<h3 id="configurations">Configurations</h3>
<p>As we mentioned earlier, you can manually edit the configuration file to update your Git settings. To do this, move to your home directory and open the file <code class="highlighter-rouge">.gitconfig</code>.</p>
<h3 id="gitignore">.gitignore</h3>
<p>You can also tell git to ignore specific files by adding them to a .gitignore file. Find (or create) this file as <code class="highlighter-rouge">.gitignore</code> in the top level of the repo where you want it to apply.</p>
<p>Note you can use wildcards in these filenames! (e.g. <code class="highlighter-rouge">*.log</code> will ignore all files ending in <code class="highlighter-rouge">.log</code>)</p>
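<p>To get a feel for how such wildcard patterns select files, here is a rough sketch using Python’s standard <code class="highlighter-rouge">fnmatch</code> module. It only approximates the simplest <code class="highlighter-rouge">.gitignore</code> rules (Git’s own pattern syntax has extra features such as directory patterns and negation), and the pattern list and file names are hypothetical.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from fnmatch import fnmatch

ignore_patterns = ["*.log", "hello.txt"]  # hypothetical .gitignore contents
candidates = ["hello.sh", "hello.txt", "picalc.py", "run1.log", "debug.log"]

# a file is ignored if it matches any one of the patterns
ignored = [
    name for name in candidates
    if any(fnmatch(name, pattern) for pattern in ignore_patterns)
]
tracked = [name for name in candidates if name not in ignored]

print("ignored:", ignored)
print("tracked:", tracked)
</code></pre></div></div>
<p>Generated files such as <code class="highlighter-rouge">hello.txt</code> and logs are typical things to ignore, since they can always be recreated from the scripts that are tracked.</p>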
<p>Other things:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout
git stash
git reset
git rebase
gitk
</code></pre></div></div>
<h2 id="git-cheat-sheet">Git Cheat Sheet</h2>
<p>You can often think of the operations that Git performs on the three “areas”–workspace, index, and repo–as mathematical equations. Here are some examples (for each Git command, perform the steps in order):</p>
<h5 id="git-add">git add</h5>
<p>$ \text{staged index} = \text{workspace} - \text{current branch} $</p>
<h5 id="git-commit">git commit</h5>
<p>(with <code class="highlighter-rouge">-a</code> option: $ \text{staged index} = \text{workspace} - \text{current branch} $)
$ \text{new commit} = \text{staged index} $
$ \text{current branch} = \text{new commit} $
$ \text{staged index} = 0 $</p>
<h5 id="git-checkout">git checkout</h5>
<p>$ \text{new workspace} = \left(\text{workspace} - \text{old branch}\right) + \text{new branch} $</p>
<h5 id="git-stash">git stash</h5>
<p>$ \text{stash} = \text{workspace} - \text{current branch} $
$ \text{workspace} = \text{current branch} $</p>
<h5 id="git-reset-hard">git reset --hard</h5>
<p>$ \text{workspace} = \text{current branch} $</p>
Install Party -- Aaron Culich and Stuart Geiger2017-09-13T00:00:00+00:00https://BIDS.github.io/dats/posts/install-party<h1 id="the-hacker-within-install-party">The Hacker Within Install Party!</h1>
<p>Installing all the things is always a pain, so why don’t we try and get as much as we can out of the way all at once? So for next week’s The Hacker Within (Sept 13th), we will be having an install party, where we will all try and help each other get various kinds of programming languages, libraries, development tools, and package environments installed. Come if you need things installed or can help others install things.</p>
<h2 id="the-plan">The plan</h2>
<p>Your session leaders (Aaron and Stuart) are still working out how we ought to organize the install party, but we think this might work best as a series of lightning talks that could split into different groups and one-on-ones as appropriate, rather than a single linear session where everyone does the same thing. I imagine that we’ll be spending some time trying to debug each other’s environments when the official instructions don’t work. :) So we’re gathering info about what people want to install, where they want to install it, and who can help with what.</p>
<h3 id="click-here-to-add-your-install-environment-requests-and-sign-up-to-leadhelp"><a href="https://docs.google.com/document/d/15UieD0_hTbr5obBJsLrctwhi8amX8QzSeE5ZLxsHbuY/edit?usp=sharing">Click here to add your install environment, requests, and sign up to lead/help</a></h3>
First Meeting -- What do we want to learn and teach?2017-09-06T00:00:00+00:00https://BIDS.github.io/dats/posts/intro-f16<p>We usually have a “what to learn and teach” session the first week of the semester. This is a nice time for us to get together, do a round of introductions, see what we want to learn and teach, and then try to set as much of the schedule for the semester as we can. I’ll also be sharing results from the topics survey, which are <a href="https://github.com/thehackerwithin/berkeley/blob/master/code_examples/survey_f17/survey.ipynb">here in this Jupyter notebook</a>.</p>
<p>Google doc for taking notes <a href="https://docs.google.com/document/d/1OrRQOFhoBZy8BCmCuPU2wLV2lmuGFcU0ojS8dQ86HJQ/edit">here</a></p>
No THW -- BIDS Data Science Faire2017-05-02T00:00:00+00:00https://BIDS.github.io/dats/posts/ds-faire-sp17<p>BIDS Data Science Faire: https://bids.berkeley.edu/events/bids-spring-2017-data-science-faire</p>
Mapping and geospatial data -- Brian Hamlin2017-04-25T00:00:00+00:00https://BIDS.github.io/dats/posts/sp17-mappingVisualization in Python -- David DeTomaso2017-04-18T00:00:00+00:00https://BIDS.github.io/dats/posts/python-viz-sp17<p>Link to the repo containing the presentation notebook:</p>
<ul>
<li><a href="https://github.com/deto/THW_Python_Plotting">Repo Link</a></li>
</ul>
<p>Clone the repo to follow along and open up the “Plotting in Python.ipynb” notebook.</p>
Visualization in R -- Diya Das2017-04-11T00:00:00+00:00https://BIDS.github.io/dats/posts/r-viz-sp17<p>Please clone the repo at <a href="https://github.com/diyadas/tutorials">https://github.com/diyadas/tutorials</a></p>
Containers with Docker -- Tony Kelman2017-04-04T00:00:00+00:00https://BIDS.github.io/dats/posts/containers-docker<p>Tony is going to walk us through containers using <a href="https://dply.co">dply.co</a>, which lets you set up a free cloud server for 2 hours. If you want to follow along on your own laptop, you need to have a GitHub account, an SSH key, and link the SSH key to your GitHub account (see <a href="https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/">this help page</a> for instructions on that).</p>
<p>If you can create a server on dply.co and connect to it with the SSH key, then you’ll be good to go. If not, come a few minutes early and we can help you get set up.</p>
<p>Talk slides are available <a href="https://1drv.ms/p/s!Ak4iGlIIdzDHimHLxa15FHjuOnO7">here</a>.</p>
Spring Break -- no meeting2017-03-28T00:00:00+00:00https://BIDS.github.io/dats/posts/springbreak17Neural Networks using Transfer Learning with Caffe -- Maryana Alegro2017-03-21T00:00:00+00:00https://BIDS.github.io/dats/posts/caffe<h1 id="overview">Overview</h1>
<p>Repository for a tutorial at <a href="http://www.thehackerwithin.org/berkeley/">THW, Berkeley</a> on <a href="http://caffe.berkeleyvision.org/">Caffe</a>.</p>
<h1 id="running-the-tutorial">Running the tutorial</h1>
<p>You can run the tutorial Jupyter notebooks:</p>
<ul>
<li><strong>locally</strong> on your computer: The easiest way is to run the Caffe Docker image. After installing <a href="https://www.docker.com/">Docker</a>,
type
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">-ti</span> <span class="nt">-p</span> 8888:8888 bvlc/caffe:cpu
</code></pre></div> </div>
</li>
</ul>
<p>Inside the container, install jupyter</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install jupyter
</code></pre></div></div>
<p>Clone this repository and start Jupyter by typing</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/mary-alegro/caffe_tutorial_thw
<span class="nb">cd </span>caffe_tutorial_thw
jupyter notebook <span class="nt">--ip</span> 0.0.0.0
</code></pre></div></div>
<p>Copy and paste the URL Jupyter outputs in your browser. You should now be able to access the notebook running inside the container.</p>
Data tidying in R & Python -- Diya Das and David Detomaso2017-03-14T00:00:00+00:00https://BIDS.github.io/dats/posts/data-tidying-r-python<p>For this tutorial, clone the github repo at <a href="https://github.com/diyadas/tutorials">https://github.com/diyadas/tutorials</a></p>
Documentation and Continuous Integration in Python with Sphinx and Travis CI -- Nelle Varoquaux, Chris Holdgraf, Matthias Bussonnier2017-03-07T00:00:00+00:00https://BIDS.github.io/dats/posts/documentation<h1 id="documentation-and-travis">Documentation and Travis</h1>
<p>Welcome to this special session of The Hacker Within Berkeley, which will take
place at the usual BIDS location but during the
<a href="https://bids.github.io/docathon">Docathon</a> event that spans the week of March 6
to 10.</p>
<p>During the Talks on Monday 6th, you had a quick overview of Sphinx, RMarkdown,
and how <a href="https://travis-ci.org">Travis-Ci</a> can be used to deploy documentation.</p>
<p>Today we’ll get our hands dirty and try to deploy this ourselves using GitHub,
Travis, and GitHub Pages, as well as describe what to do (and not to do)
when doing so.</p>
<h2 id="requirements">Requirements</h2>
<p>The requirements are minimal, and the time of the Hacker Within session should be
enough to get them set up, though getting these in advance will help you follow along.</p>
<ul>
<li>get a GitHub account</li>
<li>Log in to Travis CI with your GitHub account</li>
</ul>
<p>If possible:</p>
<ul>
<li>install the <code class="highlighter-rouge">travis</code> ruby gem on your machine (<code class="highlighter-rouge">$ gem install travis</code> should
be enough)</li>
<li>have <a href="https://github.com/drdoctr/doctr">doctr</a> installed on your local
machine.</li>
</ul>
<h2 id="high-level-overview">High level overview</h2>
<p>Understanding how to deploy documentation from Travis requires a minimal
understanding of how Travis works.</p>
<p>In particular we will discuss the safe ways to store credentials in the
<code class="highlighter-rouge">.travis.yml</code> file, what to do and what not to do, and when these credentials get decrypted
and when they do not.</p>
<p>We’ll set up a repository that deploys itself to GitHub Pages when pushed to
master.</p>
Visualization with D3.js -- Caroline Cypranowska and Luc Guillemot2017-02-28T00:00:00+00:00https://BIDS.github.io/dats/posts/d3js-sp17<h1 id="d3_fretgraph">d3_fretgraph</h1>
<p>D3 tutorial for building an animated line graph (with real FRET data) for The Hacker Within at UC Berkeley on February 28, 2017.</p>
<h1 id="intro-to-d3">Intro to D3</h1>
<p>Luc’s slides on the fundamentals of D3 (with code examples) are posted <a href="https://docs.google.com/presentation/d/1HUKaUgAiZXTKibrGXXQX3lb7G15o2kvm-EHrpeXCeCg/edit?usp=sharing">here</a>.</p>
<h2 id="how-to-prepare-for-this-tutorial">How to prepare for this tutorial</h2>
<ol>
<li><a href="http://brackets.io/">Download and install Brackets</a>
<ul>
<li>(This is Caroline’s preferred tool for building visualizations with D3, but isn’t strictly necessary. It has a nice live preview feature that is handy if you’re building these visualizations to go on a webpage.)</li>
</ul>
</li>
<li>Fork (or download) <a href="https://github.com/cypranowska/d3_fretgraph">Caroline’s d3_fretgraph</a> repository
<ul>
<li>It has a template in the main directory that we’ll use to write our code, our raw data in a .csv file in the /data directory, a minified version of D3 in the /d3 directory, and a finished version of the visualization in the /finished_version directory</li>
</ul>
</li>
<li>For Luc’s code example, navigate to <a href="http://lucguillemot.com/d3-hackerwithin-example/">this</a> webpage and open developer tools. Click on the ‘Sources’ tab to grab the contents of the ‘d3-hackerwithin’ directory.</li>
</ol>
<h2 id="what-is-d3">What is D3?</h2>
<p>D3 stands for data-driven documents, and is a JavaScript library for building interactive data visualizations to display on the web. It was developed primarily by Mike Bostock, his PhD adviser Jeffrey Heer, and Vadim Ogievetsky (<a href="http://vis.stanford.edu/files/2011-D3-InfoVis.pdf">Bostock, Ogievetsky & Heer, IEEE Trans. Visualization & Comp. Graphics, 2011</a>).</p>
<p>D3 is notoriously challenging because it requires knowing a bit about JavaScript, a bit about HTML/CSS, and a bit about SVG. The goal with this workshop is to help you get a good enough sense of how D3 works to explore on your own.</p>
<h2 id="d3-visualizations-are-built-around-binding-data-to-html-or-svg-elements">D3 visualizations are built around binding data to HTML or SVG elements</h2>
<p>What the heck does binding even mean? The idea here is that if you have a bunch of data and you want to use those data to manipulate elements on your webpage, then you need a way to select those elements and associate (or ‘bind’) your data to them.</p>
<p>Here’s an example of how to do this:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">sample</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">];</span>
<span class="nx">d3</span><span class="p">.</span><span class="nx">select</span><span class="p">(</span><span class="s1">'body'</span><span class="p">).</span><span class="nx">selectAll</span><span class="p">(</span><span class="s1">'p'</span><span class="p">)</span> <span class="c1">// this selects all paragraph elements within the body of your HTML file, if you don't have any <p> elements on your page then this is a virtual selection</span>
<span class="p">.</span><span class="nx">data</span><span class="p">(</span><span class="nx">sample</span><span class="p">)</span> <span class="c1">// this binds your data variable to your selection</span>
<span class="p">.</span><span class="nx">enter</span><span class="p">()</span> <span class="c1">// THIS is the magic of D3! This method allows you to create NEW elements on the webpage based on your data</span>
<span class="p">.</span><span class="nx">append</span><span class="p">(</span><span class="s1">'p'</span><span class="p">)</span> <span class="c1">// for each datum in your variable, D3 will append a new <p> element to your page</span>
<span class="p">.</span><span class="nx">text</span><span class="p">(</span><span class="s2">"I'm a paragraph!"</span><span class="p">);</span> <span class="c1">// the text in each newly created <p> element</span>
</code></pre></div></div>
<p>If you were to put this code between <code class="highlighter-rouge"><script></code> tags in an HTML document and then view it in a browser, you would see a page with 4 <code class="highlighter-rouge"><p></code> elements with ‘I’m a paragraph!’ in them. If you then open your web inspector and run <code class="highlighter-rouge">console.log(d3.selectAll("p"))</code>, you will see that each element has a <code class="highlighter-rouge">__data__</code> property whose value is the corresponding entry of <code class="highlighter-rouge">sample</code> (1 for the first paragraph, 2 for the second, and so on).</p>
<p>The way you then manipulate elements on your HTML document is by writing functions that take those data as arguments and change some kind of attribute of the selected element.</p>
<h2 id="showing-things-to-scale">Showing things to scale</h2>
<p>One of the other important D3 concepts is scale. For example, if you wanted to draw a circle on your document representing the US GDP ($18.56 trillion), you wouldn’t want a circle that has a diameter of 18.56 trillion pixels. D3’s scale functions (<code class="highlighter-rouge">d3.scale.linear()</code> in D3 v3, <code class="highlighter-rouge">d3.scaleLinear()</code> in v4 and later) map your data onto the size of the graphic that you want to create. We’ll discuss this more when we build our example.</p>
<h1 id="you-dont-need-to-reinvent-the-wheel">You don’t need to reinvent the wheel</h1>
<p>There are tons of resources for learning D3 and for perusing code blocks created by other people.</p>
<h2 id="online-learning-resources">Online learning resources</h2>
<ul>
<li><a href="https://github.com/d3/d3/wiki/Tutorials">D3 documentation</a></li>
<li><a href="http://alignedleft.com/tutorials/d3">Aligned Left</a></li>
<li><a href="https://www.dashingd3js.com/">Dashing D3</a> – not all content on this site is free</li>
</ul>
<h2 id="example-galleries">Example galleries</h2>
<ul>
<li><a href="https://github.com/d3/d3/wiki/Gallery">Official D3 Gallery</a></li>
<li><a href="https://bl.ocks.org/">https://bl.ocks.org/</a></li>
<li><a href="http://christopheviau.com/d3list/gallery.html">http://christopheviau.com/d3list/gallery.html</a></li>
</ul>
<h2 id="fancy-examples">Fancy examples</h2>
<ul>
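<li><a href="http://www.facesoffracking.org/data-visualization/">http://www.facesoffracking.org/data-visualization/</a></li>
<li><a href="http://www.koalastothemax.com/">http://www.koalastothemax.com/</a></li>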
</ul>
Git and GitHub -- Ciera Martinez and Matthias Bussonnier2017-02-21T00:00:00+00:00https://BIDS.github.io/dats/posts/git-github<h1 id="git-and-github">Git and Github</h1>
<h2 id="introduction-to-git-and-github">Introduction to Git and GitHub</h2>
<p>Whether you are lost in the woods trying to save a bear cub stuck in a tree, or
defending earth against alien invasion, <a href="https://git-scm.com/">git</a> is a tool of
choice for collaborating and for saving your progress, so that you can go <a href="http://www.mattluedke.com/back-git-history/">back in time</a> and save the
day again if needed.</p>
<p>Using git (and GitHub) can be quite intimidating, or look like dark magic.
We will gently introduce you to simple git concepts, from <a href="https://xkcd.com/1597/">Just memorize these shell commands</a>
to some dark voodoo that allows you to do a <a href="http://marc.info/?l=linux-kernel&m=139033182525831">66-way Cthulhu merge</a>.</p>
<p>We will learn whether Linus Torvalds (git’s creator) actually said the
following statements <a href="http://typicalprogrammer.com/linus-torvalds-goes-off-on-linux-and-git/">or not</a>, and whether they have
a bit of truth in them:</p>
<blockquote>
<p>“all meaningful operations can be expressed in terms of the rebase command”</p>
</blockquote>
<blockquote>
<p>[git is] so hard to use, but that turns out to be its big appeal</p>
</blockquote>
<p>It is true that the <a href="https://git-scm.com/docs/git-commit">actual manual page</a> can be hard to distinguish from <a href="https://git-man-page-generator.lokaltog.net/">Markov-chain text</a>, but you probably don’t need to dive into it now.</p>
<h2 id="what-well-do">What we’ll do</h2>
<h3 id="the-basics">The basics</h3>
<p>We’ll start pretty soft. Make sure you have git installed, and that it works.</p>
<p>We’ll make sure you know enough of the basics to use git on your own, and to be ready to collaborate.</p>
<ul>
<li>Clone a repository</li>
<li>Fork a GitHub repository</li>
<li>Create a repository from scratch</li>
<li>Make a commit</li>
<li>Make a branch</li>
<li>Create a Pull request on GitHub</li>
<li>Update your local repository</li>
</ul>
<h2 id="what-is-the-difference-between-github-and-git">What is the difference between github and git?</h2>
<h3 id="git">Git</h3>
<p>A lightweight version control system to track changes made to a project through time. There are many ways to use Git on your computer.</p>
<p>The main ways are:</p>
<ul>
<li><a href="https://git-scm.com/book/id/v2/Getting-Started-The-Command-Line">Command Line</a> - typing commands into a terminal (e.g. Terminal on a Mac)</li>
<li><a href="https://desktop.github.com/">GitHub desktop</a> - GUI</li>
<li><a href="https://jennybc.github.io/2014-05-12-ubc/ubc-r/session03_git.html">In RStudio</a> - GUI</li>
<li><a href="https://www.sourcetreeapp.com/">SourceTree</a> - GUI</li>
</ul>
<p><strong>Suggestion</strong>: Command Line</p>
<p>The command line is the most popular way to use git, so it is easy to get help. If you know how to run the command line version, you can probably also figure out how to run a GUI version, while the opposite is not necessarily true. Don’t let your inexperience with the command line stop you; you only need to learn the very basics of Unix to use git.</p>
<h3 id="github----remote-hosting">Github <i class="fa fa-github" aria-hidden="true"></i> - Remote Hosting</h3>
<p>While Git stands alone as a system, GitHub is a website that hosts your project and its Git history. You can use it for collaboration, back-up, sharing, and learning. <a href="https://github.com/">GitHub</a> is just one of many places to host repositories.</p>
<p>Some alternatives are:</p>
<ul>
<li><a href="https://bitbucket.org/">Bitbucket</a></li>
<li><a href="https://gitlab.com/users/sign_in">GitLab</a></li>
<li><a href="https://sourceforge.net/">SourceForge</a></li>
</ul>
<p><strong>Suggestion</strong>: Github</p>
<p>The benefit of GitHub is that it is the most popular and has many tools to make it easy and fun to use. The main downside, at the time of writing, is that it does not allow free private repositories.</p>
<h2 id="why-use-git">Why use Git?</h2>
<center><img src="http://www.phdcomics.com/comics/archive/phd101212s.gif" width="50%" height="50%" /></center>
<ul>
<li>Allows you to store versions (properly)</li>
<li>Makes you fearless</li>
<li>Restoring Previous Versions</li>
<li>Collaboration <i class="fa fa-github" aria-hidden="true"></i> - Git allows groups of people to work on the same documents (often code) at the same time, and without stepping on each other’s toes (from <a href="https://try.github.io/levels/1/challenges/1">tryGit</a>).</li>
<li>Backup <i class="fa fa-github" aria-hidden="true"></i></li>
<li>Build easy to maintain websites <i class="fa fa-github" aria-hidden="true"></i></li>
</ul>
<h2 id="learning-git">Learning Git</h2>
<p>Learning git <em>well</em> is hard, but I would say only 5% of people who use git know <em>exactly</em> what they are doing.</p>
<center><img src="https://imgs.xkcd.com/comics/git_2x.png" width="50%" height="50%" /></center>
<h3 id="why-is-learning-git-hard">Why is learning git hard?</h3>
<ul>
<li>Vocabulary is not intuitive and differs depending on the tool you use. Here is a <a href="https://help.github.com/articles/github-glossary/">cheatsheet for common vocabulary</a></li>
<li>Git is complex, with many ways to approach using it.</li>
<li>Git becomes more complex when working on a team, because there must be rules for how to collaborate and these rules differ depending on the team. You can learn how a team collaborates usually from a file in the project directory called <code class="highlighter-rouge">CONTRIBUTING.md</code>. Example contributing file: <a href="https://github.com/tidyverse/ggplot2/blob/master/CONTRIBUTING.md"><code class="highlighter-rouge">CONTRIBUTING.md</code> file for ggplot2</a></li>
</ul>
<h2 id="demo-beginner">Demo (Beginner)</h2>
<h3 id="requirements">Requirements</h3>
<h3 id="git-1">Git</h3>
<p>Try to have git installed on your laptop before coming to The Hacker Within.
If you are on Windows we recommend git-bash, which should be bundled with <a href="https://desktop.github.com/">GitHub for Desktop</a>.</p>
<p>Git should be bundled with recent Macs; you can also install it with <a href="https://desktop.github.com/">GitHub for Desktop</a> or <a href="https://brew.sh">Homebrew</a>.</p>
<p>Linux users probably already have git installed, or know how to install it with their favorite package manager.</p>
<h3 id="activity">Activity</h3>
<p>Basically we are all going to make a small edit to a file in a repository using basic git commands. Here is an overview with many of the commands we will use:</p>
<center><img src="http://cierareports.org/downloads/gitCheatSheetGitHub_ForkEasy.png" width="75%" height="75%" /></center>
<ol>
<li>Go here:
<a href="https://github.com/iamciera/THW_attendence">https://github.com/iamciera/THW_attendence</a></li>
<li>Press the Fork button (<a href="https://github.com/signup">you’ll need a Github account</a>)</li>
<li>In your terminal, execute <code class="highlighter-rouge">git clone https://github.com/YOURUSERNAME/THW_attendence</code>. Make sure you replace “YOURUSERNAME” with your Github name. For example mine is iamciera.</li>
<li>Enter the new directory with <code class="highlighter-rouge">cd THW_attendence</code></li>
<li>Add the original remote repo with <code class="highlighter-rouge">git remote add upstream https://github.com/iamciera/THW_attendence</code></li>
<li>Fetch information about the remote with <code class="highlighter-rouge">git fetch upstream</code></li>
<li>Now, check which branch you’re on with <code class="highlighter-rouge">git branch</code>. Make sure you are on the master branch.</li>
<li>Now we are ready to edit the file. Open the <code class="highlighter-rouge">README.md</code> file and add your name to the list. Add it under the header for the letter your first name starts with. This way we avoid merge conflicts.</li>
<li>Commit your change with <code class="highlighter-rouge">git commit -am "I added files for the tutorial on my topic.."</code>. NOTE: <em><code class="highlighter-rouge">-am</code> combines <code class="highlighter-rouge">-a</code>, which tells git to stage every change to files it already tracks (new, untracked files still need <code class="highlighter-rouge">git add</code>), with <code class="highlighter-rouge">-m</code>, which supplies the commit message</em></li>
<li>Git push to your origin (your repo on Github) with <code class="highlighter-rouge">git push origin master</code></li>
<li>Navigate in your browser to: https://github.com/YOURUSERNAME/THW_attendence and press the pull request button.</li>
</ol>
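<p>The steps above can also be rehearsed entirely offline. The following is a minimal sketch, not the workshop’s official script: it assumes git is installed, and it substitutes throwaway local repositories in a temporary directory for GitHub (the names <code class="highlighter-rouge">original</code> and <code class="highlighter-rouge">fork.git</code>, the demo identity, and the name “Ada” are all hypothetical stand-ins). Pressing the Fork button is imitated with a bare clone, and the final pull request has no local equivalent, so the sketch stops after the push.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical identity so the commits work on a machine with no git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for the original repository on GitHub (step 1)
git init -q original
git -C original symbolic-ref HEAD refs/heads/master
echo "# Attendance" | tee original/README.md
git -C original add README.md
git -C original commit -q -m "Start the attendance list"

# Stand-in for pressing the Fork button (step 2)
git clone -q --bare original fork.git

git clone -q fork.git THW_attendence          # step 3: clone your fork
cd THW_attendence                             # step 4: enter the directory
git remote add upstream "$work/original"      # step 5: add the original as upstream
git fetch -q upstream                         # step 6: fetch information about it
git branch                                    # step 7: check the current branch
echo "- Ada" | tee -a README.md               # step 8: add your name
git commit -q -am "Add my name to the attendance list"   # step 9: commit
git push -q origin master                     # step 10: push to your fork
git log --oneline master
</code></pre></div></div>
<p>After the push, the new commit lives in <code class="highlighter-rouge">fork.git</code>, which is the point at which you would open the pull request on GitHub.</p>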
<h2 id="demo-advanced">Demo (Advanced)</h2>
<h3 id="advanced-tactics-">Advanced tactics !</h3>
<p>Need to narrow down a bug? Let’s bisect. Want to hide your mistakes? Rebase/amend.
Erased a mistake from history that was not a mistake? Reflog to the
rescue.</p>
<h3 id="blips-and-chitz-">Blips and Chitz !</h3>
<p>Git is no fun without all the configuration options and tricks that make your life easier.</p>
<p>Check out a PR by its number? Oowee!
Diff words instead of lines? Can doooo!
Local and global gitignore? Sure!</p>
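<p>As a taste of these tricks, here is a small sketch of word-level diffing with <code class="highlighter-rouge">git diff --word-diff</code>. It assumes git is installed and runs in a throwaway repository; the file name, the sentence, and the demo identity are made up for illustration. By default, removed words are shown as <code class="highlighter-rouge">[-removed-]</code> and added words as <code class="highlighter-rouge">{+added+}</code>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -eu
repo=$(mktemp -d)
cd "$repo"

# Hypothetical identity so the commit works on a machine with no git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
echo "the quick brown fox" | tee story.txt
git add story.txt
git commit -q -m "Add story"

# Change a single word in the tracked file
echo "the quick red fox" | tee story.txt

# A plain "git diff" would report the whole line as changed;
# --word-diff marks only the words that differ.
git diff --word-diff
</code></pre></div></div>
<p>This is handy for prose and documentation, where one changed word otherwise lights up an entire paragraph.</p>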
<h3 id="dont-panic">DON’T PANIC</h3>
<p>Even if git looks insanely complicated to operate, and partly to keep intergalactic travelers from panicking, we’ll discuss what to do when things go south.</p>
<p>Long story short, if you are really scared, keep calm and <code class="highlighter-rouge">git add -A</code>, then <code class="highlighter-rouge">git commit</code> (and <code class="highlighter-rouge">git push</code>). Once committed, (almost) nothing is ever lost.</p>
<p>What happens when something breaks? Whether you are in a “detached HEAD” state, in the middle of a merge conflict, or anything else, we’ve got you covered!</p>
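<p>To make the “nothing is lost” claim concrete, here is a minimal sketch of recovering a commit with the reflog. It assumes git is installed and works in a throwaway repository; the file name, commit messages, and demo identity are hypothetical. A commit is “erased” with a hard reset and then brought back by resetting to the position HEAD had just before.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -eu
repo=$(mktemp -d)
cd "$repo"

# Hypothetical identity so the commits work on a machine with no git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
echo "first draft" | tee notes.txt
git add notes.txt
git commit -q -m "First draft"
echo "second draft" | tee notes.txt
git commit -q -am "Second draft"

# "Erase" the latest commit from the branch history
git reset -q --hard HEAD~1
cat notes.txt

# The reflog still remembers every position HEAD has been at
git reflog

# HEAD@{1} is where HEAD pointed just before the reset above
git reset -q --hard "HEAD@{1}"
cat notes.txt
</code></pre></div></div>
<p>The reflog is local to your clone and its entries eventually expire, so treat it as a safety net rather than a backup.</p>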
<h2 id="resources">Resources</h2>
<h3 id="examples-of-how-i-use-github">Examples of how I use Github</h3>
<ul>
<li><a href="https://iamciera.github.io/SOMexample/">SOM Tutorial</a>: To host tutorials</li>
<li><a href="https://github.com/iamciera/creports">My Website</a>: To host website</li>
<li><a href="https://github.com/iamciera/sister-of-pin1-material">Example Manuscript Repo</a>: Host code for my papers</li>
<li><a href="http://ropensci.github.io/reproducibility-guide/">http://ropensci.github.io/reproducibility-guide/</a>: Build things with strangers</li>
<li><a href="https://github.com/meisenlab">Eisen Lab Github</a>: Collaborate with lab members.</li>
</ul>
<h3 id="learning-git-1">Learning Git</h3>
<ul>
<li><a href="https://swcarpentry.github.io/git-novice/">Software Carpentry Version Control lesson</a></li>
<li>You can <a href="https://try.github.io/">train</a> in your browser!</li>
<li>Spoon-Knife: <a href="https://github.com/octocat/Spoon-Knife">https://github.com/octocat/Spoon-Knife</a></li>
</ul>
<h3 id="adventure-time-prompt">Adventure time prompt</h3>
<p>Inspired by <a href="http://stackoverflow.com/questions/4133904/ps1-line-with-git-current-branch-and-colors">Stack Overflow</a></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">function </span>we_are_in_git_work_tree <span class="o">{</span>
git rev-parse <span class="nt">--is-inside-work-tree</span> &> /dev/null
<span class="o">}</span>
<span class="k">function </span>parse_git_branch <span class="o">{</span>
<span class="k">if </span>we_are_in_git_work_tree
<span class="k">then
</span><span class="nb">local </span><span class="nv">BR</span><span class="o">=</span><span class="k">$(</span>git rev-parse <span class="nt">--symbolic-full-name</span> <span class="nt">--abbrev-ref</span> HEAD 2> /dev/null<span class="k">)</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$BR</span><span class="s2">"</span> <span class="o">==</span> HEAD <span class="o">]</span>
<span class="k">then
</span><span class="nb">local </span><span class="nv">NM</span><span class="o">=</span><span class="k">$(</span>git name-rev <span class="nt">--name-only</span> HEAD 2> /dev/null<span class="k">)</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$NM</span><span class="s2">"</span> <span class="o">!=</span> undefined <span class="o">]</span>
<span class="k">then </span><span class="nb">echo</span> <span class="nt">-n</span> <span class="s2">"@</span><span class="nv">$NM</span><span class="s2">"</span>
<span class="k">else </span>git rev-parse <span class="nt">--short</span> HEAD 2> /dev/null
<span class="k">fi
else
</span><span class="nb">echo</span> <span class="nt">-n</span> <span class="nv">$BR</span>
<span class="k">fi
fi</span>
<span class="o">}</span>
<span class="k">function </span>parse_git_status <span class="o">{</span>
<span class="k">if </span>we_are_in_git_work_tree
<span class="k">then
</span><span class="nb">local </span><span class="nv">ST</span><span class="o">=</span><span class="k">$(</span>git status <span class="nt">--short</span> 2> /dev/null<span class="k">)</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$ST</span><span class="s2">"</span> <span class="o">]</span>
<span class="k">then </span><span class="nb">echo</span> <span class="nt">-n</span> <span class="s2">"| (• ︵•)| (❍ᴥ❍ʋ) "</span>
<span class="k">else </span><span class="nb">echo</span> <span class="nt">-n</span> <span class="s2">"| (• ‿ •)| (❍ᴥ❍ʋ)"</span>
<span class="k">fi
fi</span>
<span class="o">}</span>
<span class="k">function </span>pwd_depth_limit_2 <span class="o">{</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$PWD</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"</span><span class="nv">$HOME</span><span class="s2">"</span> <span class="o">]</span>
<span class="k">then </span><span class="nb">echo</span> <span class="nt">-n</span> <span class="s2">"~"</span>
<span class="k">else </span><span class="nb">pwd</span> | sed <span class="nt">-e</span> <span class="s2">"s|.*/</span><span class="se">\(</span><span class="s2">.*/.*</span><span class="se">\)</span><span class="s2">|</span><span class="se">\1</span><span class="s2">|"</span>
<span class="k">fi</span>
<span class="o">}</span>
<span class="nb">export </span><span class="nv">PS1</span><span class="o">=</span><span class="s2">"</span><span class="se">\[\0</span><span class="s2">33[32m</span><span class="se">\]\w\[\0</span><span class="s2">33[33m</span><span class="se">\]\$</span><span class="s2">(parse_git_status)</span><span class="se">\[\0</span><span class="s2">33[00m</span><span class="se">\]</span><span class="s2"> </span><span class="nv">$ </span><span class="s2">"</span>
</code></pre></div></div>
Machine Learning with Neural Networks using Keras -- Remi Lehe2017-02-14T00:00:00+00:00https://BIDS.github.io/dats/posts/keras<p><a href="http://keras.io">Keras</a> is a machine learning library that runs on top of the popular TensorFlow neural network library.</p>
<p><a href="http://mybinder.org:/repo/remilehe/thw_keras_introduction"><img src="http://mybinder.org/badge.svg" alt="Binder" /></a></p>
<h1 id="overview">Overview</h1>
<p>Repository for a tutorial at <a href="http://www.thehackerwithin.org/berkeley/">THW, Berkeley</a> on <a href="http://keras.io/">Keras</a>.</p>
<h1 id="running-the-tutorial">Running the tutorial</h1>
<p>The tutorial is in the form of Jupyter notebooks. You can run these notebooks:</p>
<ul>
<li><strong>remotely</strong> on <a href="http://mybinder.org/">mybinder.org</a>: to do so, click the above badge (although binder is temporarily down right now)</li>
<li><strong>locally</strong> on your computer. To do so, install <a href="https://www.continuum.io/downloads">Anaconda</a> and install the requirements by typing</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>conda install -c conda-forge jupyter keras pandas matplotlib
</code></pre></div></div>
<p>Then, clone this repository, and run the jupyter notebook:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/RemiLehe/thw_keras_introduction.git
cd thw_keras_introduction
jupyter notebook index.ipynb
</code></pre></div></div>
Intro to Python -- Yu Feng and Stuart Geiger2017-02-07T00:00:00+00:00https://BIDS.github.io/dats/posts/intro-python-sp17<h1 id="intro-to-python-and-anacondajupyter">Intro to Python (and anaconda/Jupyter)</h1>
<p>This session will be an intro to Python. We will also be using and helping set up Jupyter notebooks, a programming environment we frequently use for THW sessions, as well as Anaconda, a distribution and package manager that will install Python, Jupyter, and many other libraries and dependencies for you.</p>
<p>If you don’t have these libraries installed, follow the instructions <a href="https://bids.github.io/2016-01-14-berkeley/#python">here</a> – you know you have it set up right if you can type “jupyter notebook” into a terminal / command prompt and the browser-based Jupyter interface pops up.</p>
<p>Note: this will make the anaconda version of python (which is python 3.6) your default python. If you already have a non-anaconda version of python installed and you are using this for important work, it may be best to create a new user account and install anaconda under that (selecting the options to only install it for that user, not system-wide).</p>
<p>If you have some experience with python, Jupyter, and/or anaconda, feel free to come and help others around you get up and running. Also, if you want to give a lightning talk on something in this area, please feel free to prepare a 3-5 minute demo on something you think might be interesting to THW.</p>
<h1 id="jupyter-notebooks">Jupyter notebooks</h1>
<h2 id="view-on-the-web">View on the web</h2>
<ul>
<li><a href="https://github.com/thehackerwithin/berkeley/blob/master/code_examples/intropy_sp17/thw-intropy-notes.ipynb">All the code pre-written</a></li>
<li>A <a href="https://github.com/thehackerwithin/berkeley/blob/master/code_examples/intropy_sp17/thw-intropy-notes-nocode.ipynb">blank notebook with section headings, if you want to try and follow along yourself</a></li>
</ul>
<h2 id="or-clone-with-git-and-run-yourself">Or clone with git and run yourself:</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/thehackerwithin/berkeley
jupyter notebook
</code></pre></div></div>
<p>Then navigate in the web interface to berkeley/code_examples/intropy_sp17</p>
Navigating bash and UNIX environments -- Akos, Mitch, and Matthias2017-01-31T00:00:00+00:00https://BIDS.github.io/dats/posts/bash-unix-env<h1 id="topics">Topics</h1>
<ul>
<li>UNIX intro (some history, UNIX in society)</li>
<li>UNIX design principles, or at least some of them, briefly</li>
<li>Shells and command-line interface</li>
<li>Shell scripting basics</li>
<li>Cool tricks</li>
</ul>
<h1 id="system-requirements">System requirements</h1>
<p>Do you have a Mac? Open the <strong>Terminal</strong> app. You’re done.</p>
<p>Do you run Linux? Open your computer. You’re done.</p>
<p>Do you run Windows? See next section.</p>
<h1 id="how-to-get-bash-or-a-unix-like-environment-on-windows">How to get bash or a Unix-like environment on Windows</h1>
<ol>
<li>Install <a href="https://git-for-windows.github.io">Git Bash</a>. (Instructions copied from <a href="https://github.com/dlab-berkeley/programming-fundamentals">here</a>.)</li>
</ol>
<p>Download the Git for Windows installer. Run the installer and follow the steps below:</p>
<ul>
<li>Click on “Next”. (5 times)</li>
<li>Select “Use Git from the Windows Command Prompt” and click on “Next”. If you forget to do this, programs that you need for the workshop will not work properly. If this happens, rerun the installer and select the appropriate option.</li>
<li>Click on “Next”. Keep “Checkout Windows-style, commit Unix-style line endings” selected.</li>
<li>Select “Use Windows’ default console window” and click on “Next”.</li>
<li>Click on “Next”.</li>
<li>Click on “Finish”.</li>
</ul>
<p>This will provide you with both Git and Bash in the Git Bash program.</p>
<ol start="2">
<li>
<p>Run Linux on a virtual machine, e.g., <a href="https://www.virtualbox.org/">VirtualBox</a>, or in a container, e.g., <a href="https://docs.docker.com/engine/getstarted/step_one/">Docker</a>.</p>
</li>
<li>
<p>Run Linux from an external USB storage device, e.g., <a href="https://www.ubuntu.com/download/desktop/create-a-usb-stick-on-windows">live USB instructions for Ubuntu</a>.</p>
</li>
<li>
<p>If you don’t want to do any of that:</p>
<ul>
<li>Open a bash terminal at <a href="http://try.jupyter.org/">try.jupyter.org</a></li>
<li>If you have a GitHub account and can use <code class="highlighter-rouge">ssh</code>, https://dply.co provides 2 hours of free server time. You will have to set that up yourself, though.</li>
</ul>
</li>
</ol>
<h1 id="learning-resources">Learning resources</h1>
<p>For those desiring something more structured, thoughtful, and professional…</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/List_of_Unix_commands">List of Unix Commands</a></li>
<li>Software Carpentry <a href="http://swcarpentry.github.io/shell-novice/">Unix Shell Lessons</a></li>
<li><a href="https://github.com/veltman/clmystery">The Command Line Murders</a>, a game to teach yourself the Unix CLI.</li>
<li><a href="http://www.thehackerwithin.org/berkeley/upcoming.html">Advanced Bash-Scripting Guide</a> from The Linux Documentation Project</li>
<li>O’Reilly <a href="https://ssearch.oreilly.com/?q=unix+shell">books on Unix & shell topics</a></li>
<li><a href="http://i.imgur.com/XUhbf2D.gif">How to find files</a> hidden inside a computer</li>
</ul>
What to Learn and Teach for Spring 20172017-01-24T00:00:00+00:00https://BIDS.github.io/dats/posts/what-to-learn-and-teach-2017<p><a href="https://docs.google.com/document/d/1UQZiTN_RrRGxcYWrf11lBzw7AjdwLVOmuEzPKCDStPk/edit?usp=sharing">Google doc for notes here</a></p>
<p>The first meeting of THW for Spring 2017 will be at 4:00pm in the Berkeley Institute for Data Science, Doe Library room 190. We will talk about what we want to learn and then try and fill up as much of the schedule as possible.</p>
Ensemble (Machine) Learning with Super Learner and H2O in R -- Nima Hejazi and Evan Muzzall2016-12-06T00:00:00+00:00https://BIDS.github.io/dats/posts/ensemble-R<h2 id="nima-hejazi--evan-muzzall">Nima Hejazi & Evan Muzzall</h2>
<p>Nima is a graduate student in the Division of Biostatistics. His research
combines aspects of causal inference, statistical machine learning, and
nonparametric statistics, with a focus on the development of robust methods for
addressing inference problems arising in precision medicine, computational
biology, and clinical trials.</p>
<p>Evan earned his Ph.D. in Biological Anthropology from Southern Illinois
University Carbondale where he focused on spatial patterns of skeletal and
dental variation in two large necropoles of Iron Age Central Italy (1st
millennium BC). He is currently R Lead Instructor, co-founder of the Machine Learning Working Group, and Research Associate in the D-Lab.</p>
<h2 id="ensemble-machine-learning-with-super-learner-and-h2o-in-r">Ensemble (Machine) Learning with Super Learner and H2O in R</h2>
<p>This presentation covers methods for performing ensemble machine learning with the <a href="https://cran.r-project.org/web/packages/SuperLearner/index.html">Super
Learner</a> R
package and <a href="http://www.h2o.ai">H2O</a> software platform, using the <a href="https://www.r-project.org">R language
for statistical computing</a>.</p>
<p><strong>Materials for this presentation are available on GitHub
<a href="https://github.com/nhejazi/talk-h2oSL-THW-2016">here</a></strong>.</p>
<h3 id="r--rstudio-installation">R & RStudio Installation</h3>
<ul>
<li>You can download R and RStudio
<a href="https://www.rstudio.com/products/rstudio/download/">here</a>.</li>
</ul>
<h3 id="jupyter-r-kernel-installation">Jupyter R Kernel Installation</h3>
<ul>
<li>Please follow the instructions
<a href="https://irkernel.github.io/installation/">here</a> to install an R kernel for
Jupyter notebooks.</li>
</ul>
<h3 id="superlearner-installation">SuperLearner Installation</h3>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">require</span><span class="p">(</span><span class="s2">"devtools"</span><span class="p">)</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"ecpolley/SuperLearner"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<h3 id="h2o-installation">H2O Installation</h3>
<p>These installations are required to make H2O work in RStudio. Click the links
to visit the download pages.</p>
<ol>
<li>
<p><a href="https://www.rstudio.com/products/rstudio/download/">Download RStudio</a></p>
</li>
<li>
<p><a href="http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html">Download Java Runtime
Environment</a></p>
</li>
<li>
<p><a href="http://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/index.html">Download H2O for R and dependencies (click the “Use H2O directly from R”
tab and follow the copy/paste instructions)</a></p>
</li>
<li>
<p>Install the <code class="highlighter-rouge">devtools</code> and <code class="highlighter-rouge">h2oEnsemble</code> R packages.</p>
</li>
</ol>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The following two commands remove any previously installed H2O packages for R.</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="s2">"package:h2o"</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">search</span><span class="p">())</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">detach</span><span class="p">(</span><span class="s2">"package:h2o"</span><span class="p">,</span><span class="w"> </span><span class="n">unload</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="s2">"h2o"</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">rownames</span><span class="p">(</span><span class="n">installed.packages</span><span class="p">()))</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">remove.packages</span><span class="p">(</span><span class="s2">"h2o"</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="c1"># Next, we download packages that H2O depends on.</span><span class="w">
</span><span class="n">pkgs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"methods"</span><span class="p">,</span><span class="s2">"statmod"</span><span class="p">,</span><span class="s2">"stats"</span><span class="p">,</span><span class="s2">"graphics"</span><span class="p">,</span><span class="s2">"RCurl"</span><span class="p">,</span><span class="s2">"jsonlite"</span><span class="p">,</span><span class="s2">"tools"</span><span class="p">,</span><span class="s2">"utils"</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">pkg</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">pkgs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="w"> </span><span class="p">(</span><span class="n">pkg</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">rownames</span><span class="p">(</span><span class="n">installed.packages</span><span class="p">())))</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">install.packages</span><span class="p">(</span><span class="n">pkg</span><span class="p">,</span><span class="w"> </span><span class="n">repos</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"http://cran.rstudio.com/"</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1"># Now we download, install and call the H2O package for R.</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s2">"h2o"</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"source"</span><span class="p">,</span><span class="w"> </span><span class="n">repos</span><span class="o">=</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"http://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/R"</span><span class="p">)))</span><span class="w">
</span><span class="c1"># Install the "devtools" R package.</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"devtools"</span><span class="p">))</span><span class="w">
</span><span class="c1"># Install the "h2oEnsemble" R package.</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"h2oai/h2o-3/h2o-r/ensemble/h2oEnsemble-package"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Load packages</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">h</span><span class="m">2</span><span class="n">o</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">h</span><span class="m">2</span><span class="n">oEnsemble</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<h2 id="lightning-talks">Lightning Talks</h2>
RStudio -- Diya Das and Wolf Ketter2016-11-29T00:00:00+00:00https://BIDS.github.io/dats/posts/RStudio<h2 id="diya-das-and-wolf-ketter">Diya Das and Wolf Ketter</h2>
<h2 id="r-and-rstudio">R and RStudio</h2>
<p>You can download R and RStudio at <a href="https://www.rstudio.com/products/rstudio/download/">https://www.rstudio.com/products/rstudio/download/</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Thanksgiving -- The Turkey Within (no meeting)2016-11-22T00:00:00+00:00https://BIDS.github.io/dats/posts/thanksgiving<h1 id="there-is-no-meeting-this-week">There is no meeting this week</h1>
Machine learning with scikit-learn - Rochelle Terman and Christopher Hench2016-11-15T00:00:00+00:00https://BIDS.github.io/dats/posts/scikit-learn<h2 id="rochelle-terman-and-christopher-hench">Rochelle Terman and Christopher Hench</h2>
<h2 id="machine-learning-with-scikit-learn">Machine learning with scikit-learn</h2>
<p>Clone <a href="https://github.com/henchc/THW-scikit-learn">this Github repository</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Matplotlib - Yu Feng2016-11-08T00:00:00+00:00https://BIDS.github.io/dats/posts/matplotlib<h2 id="yu-feng">Yu Feng</h2>
<h2 id="matplotlib">matplotlib</h2>
<p>The Jupyter notebook for this session is <a href="https://github.com/rainwoodman/thehackerwithin-berkeley/blob/fe8dfd31c7661ce66bb740dd060f4a8b7146cc44/python_matplotlib/matplotlib-the-hard-way.ipynb">here</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Physical Computing - Brandon Curtis2016-11-01T00:00:00+00:00https://BIDS.github.io/dats/posts/physical-computing<h2 id="brandon-curtis">Brandon Curtis</h2>
<h2 id="lightning-talks">Lightning Talks</h2>
The Python Olympics - John Bohannon2016-10-25T00:00:00+00:00https://BIDS.github.io/dats/posts/python-olympics<h2 id="john-bohannon">John Bohannon</h2>
<h2 id="the-python-olympics">The Python Olympics</h2>
<p>The fastest way to learn a programming language is to use it. So why not turn that into a game?</p>
<p>All levels of experience welcome. We have Python puzzles for advanced coders and beginners alike.</p>
<p>This will also be the world debut of a new kind of interactive IPython Notebook designed for group coding games.</p>
<p>See you at the games!</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Parallelization in Python - Remi Lehe2016-10-18T00:00:00+00:00https://BIDS.github.io/dats/posts/parallelization-python<h2 id="remi-lehe">Remi Lehe</h2>
<h2 id="parallelization-in-python">Parallelization in Python</h2>
<p>A Jupyter notebook <a href="https://github.com/RemiLehe/thw_parallel_python">is here</a>; click the “launch binder” icon to open it.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Natural Language Processing for Python with NLTK -- Christopher Hench2016-10-11T00:00:00+00:00https://BIDS.github.io/dats/posts/nlp-nltk<h2 id="christopher-hench">Christopher Hench</h2>
<h2 id="nltk">NLTK</h2>
<p>Text data requires a separate preprocessing stage, often referred to as the ‘NLP pipeline’. One popular library for implementing it is Python’s NLTK (Natural Language Toolkit). This talk will cover how to clean text data, tag parts of speech (POS), identify named entities (NER), and quantify sentiment beyond dictionary look-up. While not explored in this talk, these preprocessing steps are often critical to developing more advanced models, such as document classifiers, topic models, and network models, because they provide targeted feature sets.</p>
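<p>As a rough illustration of the “clean text data” stage, here is a minimal sketch in plain Python. It is not taken from the talk notebook: the function name <code class="highlighter-rouge">clean_text</code> and the tiny stopword set are assumptions made for this example, and NLTK provides far more complete tokenizers and stopword lists.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import string

# Hypothetical, deliberately tiny stopword list (NLTK ships a fuller one).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to"}

def clean_text(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [tok for tok in tokens if tok not in STOPWORDS]

tokens = clean_text("The NLP pipeline is a series of steps, applied to text!")
print(tokens)
</code></pre></div></div>
<p>The resulting token list is the kind of input that later stages, such as POS tagging or sentiment scoring, would typically consume.</p>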
<h3 id="installation">Installation</h3>
<p>We are using this <a href="https://github.com/thehackerwithin/berkeley/blob/master/nltk/THW_NLTK.ipynb">Jupyter notebook</a> in the thehackerwithin/berkeley repo, master branch, nltk folder.</p>
<p>For installation of Python and NLTK, follow <a href="https://github.com/dlab-berkeley/python-intensive/blob/master/Install.md">these instructions</a>.</p>
<p>If you installed Anaconda:</p>
<p><code class="highlighter-rouge">conda install nltk</code></p>
<p>Otherwise:</p>
<p><code class="highlighter-rouge">pip install nltk</code></p>
<p>Lastly, the NER wrapper requires the Java-based Stanford NER, available <a href="http://nlp.stanford.edu/software/CRF-NER.shtml#Download">here</a>.
Note: do not download the extension; just download Stanford Named Entity Recognizer version 3.6.0.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Git and Github -- Tony Kelman and Garret Christensen2016-10-04T00:00:00+00:00https://BIDS.github.io/dats/posts/github<h2 id="presenters">Presenters</h2>
<h3 id="tony-kelman">Tony Kelman</h3>
<h3 id="garret-christensen">Garret Christensen</h3>
<h2 id="topics">Topics</h2>
<h3 id="git">Git</h3>
<h3 id="github">Github</h3>
<h2 id="lightning-talks">Lightning Talks</h2>
Github Pages and Jekyll - Stuart Geiger2016-09-27T00:00:00+00:00https://BIDS.github.io/dats/posts/github-pages-jekyll<h1 id="stuart-geiger">Stuart Geiger</h1>
<p>I’m a postdoc at <a href="http://bids.berkeley.edu">the Berkeley Institute for Data Science</a>, and I completed my Ph.D. last December at the UC Berkeley <a href="http://ischool.berkeley.edu">School of Information</a> next door. I’m an ethnographer of science and technology, and I study how people produce knowledge. My Ph.D. research was about Wikipedia’s volunteer editing community, and I’m now studying the emergence of this thing we like to call data science. In my work, I use many different kinds of methods – sometimes I look more like an anthropologist, a historian, or a philosopher, while other times I run surveys, experiments, and large-scale data analyses.</p>
<h1 id="github-pages-and-jekyll">Github Pages and Jekyll</h1>
<p>Github Pages is a free web hosting service by Github, which uses Jekyll to generate HTML files from the source files (themes, layouts, and data) in a special Github repository. Whenever you make a commit to a Github Pages repository, Github’s servers run Jekyll on the files in that repository, generating a set of static HTML and CSS files that are served from a special subdomain. The result can look nearly identical to sites built with traditional content management systems (like WordPress or Drupal), which dynamically process requests from browsers using languages like PHP and query live databases like MySQL.</p>
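<p>To make the idea of static generation concrete, here is a toy sketch in Python. It is not how Jekyll is implemented: Jekyll is written in Ruby and uses Liquid templates and full YAML front matter, whereas the names <code class="highlighter-rouge">LAYOUT</code>, <code class="highlighter-rouge">split_front_matter</code>, and <code class="highlighter-rouge">render</code> below are made up for illustration and only flat <code class="highlighter-rouge">key: value</code> lines are handled.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from string import Template

# Hypothetical layout; Jekyll uses Liquid templates, not string.Template.
LAYOUT = Template("&lt;h1&gt;$title&lt;/h1&gt;\n&lt;p&gt;$content&lt;/p&gt;")

def split_front_matter(source):
    """Split a page into (metadata dict, body) at the '---' delimiters.

    Only flat 'key: value' lines are handled; real front matter is full YAML.
    """
    _, header, body = source.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

def render(source):
    """Render one source file to a static HTML string."""
    meta, body = split_front_matter(source)
    return LAYOUT.substitute(title=meta["title"], content=body)

page = "---\ntitle: Hello\n---\nGenerated once, at commit time."
html = render(page)
print(html)
</code></pre></div></div>
<p>The point of the sketch is that all the work happens once, when the source changes, so the server only has to hand out finished files.</p>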
<h2 id="advantages-over-the-dynamiccms-approach">Advantages over the dynamic/CMS approach:</h2>
<ul>
<li>Fewer moving parts to configure and maintain</li>
<li>No need to be a systems administrator</li>
<li>More secure from hackers (the bad kind)</li>
<li>Uses existing Github infrastructure for logins and collaboration</li>
<li>Free hosting! (recommended max: 100,000 requests/month)</li>
</ul>
<h2 id="what-you-need">What you need</h2>
<ul>
<li>For most of this session, just a Github account and a web browser</li>
<li>For a few minutes at the end, I’ll walk people through running Jekyll locally. <a href="https://jekyllrb.com/docs/installation/">Install instructions are here</a> for OS X and Linux (Windows is not officially supported).</li>
</ul>
<h3 id="repositories-to-fork">Repositories to fork</h3>
<ul>
<li><a href="https://github.com/academicpages/group-meeting">academicpages/group-meeting</a></li>
<li><a href="https://github.com/academicpages/academicpages.github.io">academicpages/academicpages.github.io</a></li>
<li><a href="https://github.com/academicpages/events">academicpages/events</a></li>
</ul>
<h2 id="tips-and-tricks">Tips and tricks</h2>
<ul>
<li>Settings are in the settings tab of your repository, in the “GitHub Pages” section.
<ul>
<li>You can see details about errors here, although they can be misleading / hard to decode</li>
</ul>
</li>
<li>Jekyll’s markdown parser/renderer can be stricter than Github’s, and will just print raw markdown if it hits something it won’t parse</li>
<li>Go to the commit list (on your repo) to find the last version Github built with Jekyll.
<ul>
<li>Green check: successful build</li>
<li>Orange circle: building</li>
<li>Red X: error</li>
<li>No icon: not built</li>
</ul>
</li>
<li>YAML (“YAML Ain’t Markup Language”) is important and easy to mess up
<ul>
<li><a href="http://symfony.com/doc/current/components/yaml/yaml_format.html">The YAML format</a></li>
<li>Invalid YAML declarations will cause builds to fail in ways that generate misleading errors</li>
<li>Valid YAML declarations will be rendered by Github as a nice, formatted table.</li>
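<li>Double-quoted YAML strings use C-style escape sequences (for example, <code class="highlighter-rouge">\n</code> and <code class="highlighter-rouge">\"</code>)</li>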
</ul>
</li>
</ul>
<h2 id="examples-of-goodeasyinteresting-github-pages-sites">Examples of good/easy/interesting Github Pages sites</h2>
<h3 id="themes">Themes</h3>
<ul>
<li><a href="https://github.com/aron-bordin/neo-hpstr-jekyll-theme">Neo HPSTR theme</a></li>
<li><a href="https://github.com/mmistakes/made-mistakes-jekyll">Made Mistakes</a></li>
<li><a href="https://github.com/mmistakes/skinny-bones-jekyll">Skinny Bones</a></li>
<li><a href="https://github.com/holman/left">Left</a></li>
</ul>
<h3 id="real-world-examples">Real world examples</h3>
<ul>
<li><a href="http://acmsocc.github.io/2016/">ACM Conference on Cloud Computing</a> – <a href="https://github.com/acmsocc/2016">Github repo</a>
<ul>
<li>Very detailed and polished (and complicated)</li>
<li>Uses YAML to generate schedule</li>
</ul>
</li>
<li><a href="http://astrohackweek.org/2016/">AstroHackWeek</a> – <a href="https://github.com/AstroHackWeek/2016">Github repo</a>
<ul>
<li>Single page scrolling layout, based on Solid State by HTML5 UP</li>
</ul>
</li>
<li><a href="http://switch2osm.github.io/">Switch2OSM</a> – <a href="https://github.com/switch2osm/switch2osm.github.io">Github repo</a>
<ul>
<li>Uses <a href="https://github.com/aron-bordin/neo-hpstr-jekyll-theme">Neo HPSTR theme</a></li>
</ul>
</li>
</ul>
<h1 id="lightning-talks">Lightning talks</h1>
<h2 id="matthias-bussonnier">Matthias Bussonnier</h2>
<p><a href="https://github.com/Carreau/talks/blob/master/2016-09-23-uc-merced-seminar/Cross%20Language%20Integration.ipynb">Cross language Jupyter</a></p>
Machine Learning for Kaggle Competitions with R -- Jerry Chen2016-09-20T00:00:00+00:00https://BIDS.github.io/dats/posts/ml-kaggle<h2 id="jerry-chen">Jerry Chen</h2>
<h2 id="description">Description</h2>
<p>Kaggle is a data science platform where data scientists from all over the world work together and compete in real-world machine learning challenges. These public data sets cover a wide array of interesting problems from diagnosing eye problems based on images of the retina to recommending coupons to users who visit a site. On Tuesday, we will explore the machine learning process in the context of competitions and how Kaggle is becoming a really good starting point for machine learning enthusiasts to collaborate and learn new things.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
The Bash Olympics -- Aaron Culich and John Bohannon2016-09-13T00:00:00+00:00https://BIDS.github.io/dats/posts/bash<h2 id="aaron-culich-and-john-bohannon">Aaron Culich and John Bohannon</h2>
<h2 id="lightning-talks">Lightning Talks</h2>
What To Learn and Teach - Everyone2016-09-06T00:00:00+00:00https://BIDS.github.io/dats/posts/learn-teach<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. I hope you’ll join us!</li>
</ul>
<p>If you can’t join us, but would like to request to learn or teach a topic
related to scientific computing, please fill out
<a href="TBD">this Google form</a>.</p>
<h2 id="discussion-what-do-you-want-to-learn-and-what-can-you-teach">Discussion: What Do You Want To Learn and What Can You Teach</h2>
<p>Our first meeting of the semester will be focused on introductions and building
this semester’s schedule of topics. To mold the upcoming schedule of topics to
your needs and desires, please attend. We will engage in a fun democratic
exercise in which we each offer and request knowledge. In this way, we’ll keep THW relevant by
weighing in on what topics are important to us as a community.</p>
<p>To request particular sessions, volunteer some useful knowledge, or just hang out,
please join us at 4:00pm in Room 190 of Doe Library.</p>
<h2 id="first-time-attendees">First Time Attendees</h2>
<p>We are very hopeful that many new faces will join us this semester. We would
especially love your input at this meeting. Your voice will help us to make The
Hacker Within as useful and peer-driven as possible.</p>
<p>More information on the how, when, where, and why of this meeting can be found
at:</p>
<ul>
<li><a href="http://thehackerwithin.github.io/berkeley/" title="The About Page">the THW@UCB about page</a></li>
<li>and <a href="http://bids.berkeley.edu/events/hacker-within">the BIDS event page for this meeting</a></li>
</ul>
<h2 id="results">Results</h2>
<p>I will list the results here when the meeting is over.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
D3.js - Kai Chang2016-04-27T00:00:00+00:00https://BIDS.github.io/dats/posts/d3-spring-2016<h2 id="attending">Attending</h2>
<h2 id="kai-chang">Kai Chang</h2>
<p>Kai Chang is an experienced user of D3.js, a design technologist at Stamen Design, and a co-organizer of the Bay Area D3.js User Group.</p>
<h2 id="d3js-for-building-exploratory-visualization-tools">D3.js for building Exploratory Visualization Tools</h2>
<h3 id="who-is-here-why-are-you-interested-in-d3js">Who is here? Why are you interested in D3.js?</h3>
<ul>
<li>Data journalists?</li>
<li>Data scientists?</li>
<li>Real scientists?</li>
</ul>
<h3 id="speaker-links">Speaker Links</h3>
<ul>
<li><a href="http://bl.ocks.org/syntagmatic">syntagmatic’s blocks</a></li>
<li><a href="https://twitter.com/syntagmatic">Twitter</a></li>
</ul>
<h3 id="d3js-resources">D3.js Resources</h3>
<ul>
<li><a href="http://bost.ocks.org/mike/">Mike Bostock’s Interactive Essays</a></li>
<li><a href="http://bl.ocks.org/mbostock">mbostock’s blocks</a></li>
<li><a href="https://www.jasondavies.com/">Jason Davies Gallery</a></li>
<li><a href="http://christopheviau.com/d3list/">Big List of D3.js Examples</a></li>
<li><a href="https://github.com/mbostock/d3/wiki/API-Reference">D3.js API Reference</a></li>
</ul>
<h3 id="parallel-coordinates">Parallel Coordinates</h3>
<ul>
<li><a href="https://mbostock.github.io/protovis/ex/cars.html">Protovis Parallel Coordinates</a></li>
<li><a href="http://bl.ocks.org/mbostock/1341021">D3.js Parallel Coordinates</a></li>
<li><a href="http://exposedata.com/parallel/">Veggie Coordinates</a></li>
<li><a href="http://bl.ocks.org/syntagmatic/raw/3150059/">Nutrient Explorer</a></li>
<li><a href="http://bl.ocks.org/syntagmatic/raw/3290392/">Fisheye Nutrient Explorer</a></li>
<li><a href="http://syntagmatic.github.io/parallel-coordinates/">d3.parcoords.js</a></li>
<li><a href="http://bl.ocks.org/syntagmatic/42d5b54c5cfe002e7dd8">EcoEngine Parallel Coordinates</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/parcoords.html">SCN-STOCK Contig Taxonomy</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/parcoords-CN-SCN.html">CN-SCN Contig Taxonomy</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/parcoords-carrol.html">CARROL Contig Taxonomy</a></li>
</ul>
<h3 id="ecoengine">EcoEngine</h3>
<ul>
<li><a href="http://globalchange.berkeley.edu/ecoinformatics-engine">Project Description</a></li>
<li><a href="https://ecoengine.berkeley.edu/">EcoEngine API</a></li>
<li><a href="https://github.com/stamen/ecoengine#prototypes">D3.js Prototypes</a></li>
</ul>
<h3 id="metagenomics">Metagenomics</h3>
<h4 id="parallel-coordinates-1">Parallel Coordinates</h4>
<ul>
<li><a href="http://stamen.github.io/metag/taxonomy/parcoords.html">SCN-STOCK Contig Taxonomy</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/parcoords-CN-SCN.html">CN-SCN Contig Taxonomy</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/parcoords-carrol.html">CARROL Contig Taxonomy</a></li>
</ul>
<h4 id="radial-tree">Radial Tree</h4>
<ul>
<li><a href="http://stamen.github.io/metag/taxonomy/radial-tree.html">SCN-STOCK Radial Tree</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/radial-tree-CN-SCN.html">CN-SCN Radial Tree</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/radial-tree-carrol.html">CARROL Radial Tree</a></li>
</ul>
<h4 id="treemap">Treemap</h4>
<ul>
<li><a href="http://stamen.github.io/metag/taxonomy/treemap.html">SCN-STOCK Treemap</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/treemap-CN-SCN.html">CN-SCN Treemap</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/treemap-carrol.html">CARROL Treemap</a></li>
</ul>
<h4 id="partition-layout">Partition Layout</h4>
<ul>
<li><a href="http://stamen.github.io/metag/taxonomy/partition.html">SCN-STOCK Partition</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/partition-CN-SCN.html">CN-SCN Partition</a></li>
<li><a href="http://stamen.github.io/metag/taxonomy/partition-carrol.html">CARROL Partition</a></li>
</ul>
<h3 id="specific-d3js-techniques">Specific D3.js Techniques</h3>
<ul>
<li><a href="https://github.com/d3/d3-hierarchy">d3-hierarchy</a></li>
<li><a href="http://bl.ocks.org/mbostock/3808218">General Update Pattern</a> - one of the big D3.js learning hurdles</li>
<li><a href="http://bl.ocks.org/mbostock/3014589">Perceptual Color Spaces</a></li>
<li><a href="http://bl.ocks.org/mbostock/4330486">Bivariate Hexbin</a></li>
<li><a href="http://bl.ocks.org/mbostock/3711652">Dynamic Projections</a></li>
</ul>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/topic" title="Code Examples">here</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Tableau - Harrison Dekker2016-04-20T00:00:00+00:00https://BIDS.github.io/dats/posts/tableau-spring-2016<h2 id="attending">Attending</h2>
<ul>
<li>lots!</li>
</ul>
<h2 id="harrison-dekker">Harrison Dekker</h2>
<h2 id="tableau">Tableau</h2>
Build Systems - Tony Kelman2016-04-13T00:00:00+00:00https://BIDS.github.io/dats/posts/build-systems-spring-2016<h2 id="attending">Attending</h2>
<ul>
<li>lots!</li>
</ul>
<h2 id="tony-kelman">Tony Kelman</h2>
<p>Tony is a lecturer in Mechanical Engineering, and a core contributor to the Julia language. He likes building things, including scientific software.</p>
<h2 id="build-systems">Build systems</h2>
<p>Yay open source! So there’s some cool library you want to use, and its author was kind enough to share the source code with the world. But maybe that’s all that they provided? Or you want to change something, fix a bug, add a feature, etc. For libraries written in compiled languages like C, C++, Fortran, etc., compilation and dependencies can be hard. There are a variety of build systems commonly used by open-source projects to assist in building libraries and managing dependencies across various platforms. I’ll talk about the <a href="https://en.wikipedia.org/wiki/GNU_Build_System" title="Autotools">GNU autotools</a> (and Make), <a href="https://cmake.org/" title="CMake">CMake</a>, and briefly mention <a href="https://gyp.gsrc.io/" title="gyp">gyp</a>. I’ll work through an example using a small but nontrivial C library.</p>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/build_systems" title="Code Examples">here</a>.</p>
Cython - Kyle Barbary2016-04-06T00:00:00+00:00https://BIDS.github.io/dats/posts/cython-spring-2016<h2 id="attending">Attending</h2>
<p>Lots!</p>
<h2 id="kyle-barbary">Kyle Barbary</h2>
<p>Kyle is a Data Science Fellow at the Berkeley Institute for Data Science.</p>
<h2 id="cython">Cython</h2>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/cython_spring16">here</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="qingkai-kong-on-line_profiler">Qingkai Kong on line_profiler</h3>
<p>kernprof! Google it.</p>
<h3 id="katy-huff-on-pythons-cprofiler-and-snakeviz">Katy Huff on Python’s cProfile and SnakeViz</h3>
<p>Snakeviz! Google it.</p>
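<p>For readers who want a starting point before searching, here is a minimal sketch of profiling with the standard library’s <code class="highlighter-rouge">cProfile</code> and <code class="highlighter-rouge">pstats</code> modules. The workload <code class="highlighter-rouge">slow_sum</code> is a made-up example and is not from the lightning talk.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import cProfile
import io
import pstats

def slow_sum(n):
    """Hypothetical workload: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
</code></pre></div></div>
<p>A profile can also be saved from the command line with <code class="highlighter-rouge">python -m cProfile -o program.prof script.py</code> and then opened with <code class="highlighter-rouge">snakeviz program.prof</code>; the file names here are placeholders.</p>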
<h3 id="seán-ó-nualláin-on-sonas">Seán Ó Nualláin on <a href="https://www.youtube.com/watch?v=gDZ_GOt13eg">SONAS</a></h3>
<p>Sonas: <a href="https://www.youtube.com/watch?v=gDZ_GOt13eg">https://www.youtube.com/watch?v=gDZ_GOt13eg</a></p>
Python For Plotting Timeseries & 3D Data - Qingkai Kong, Andy Haefner2016-03-30T00:00:00+00:00https://BIDS.github.io/dats/posts/pythonic-plotting-spring-2016<h2 id="attending">Attending</h2>
<h2 id="qingkai-kong">Qingkai Kong</h2>
<p>I am a PhD student at the Berkeley Seismological Lab in the Earth and Planetary Science Department. My research area is earthquake early warning systems, and I am working on using smartphones to detect earthquakes. I am also really interested in data science, and I am now working on how to apply data science skills back to seismology. You can check out my Github <a href="https://github.com/qingkaikong">here</a>.</p>
<p>Code examples for my presentation can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/python_mining_emails">here</a>.</p>
<h2 id="andy-haefner">Andy Haefner</h2>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/python_mayavi">here</a>.</p>
<h2 id="500pm-machine-learning-club">5:00pm Machine Learning Club</h2>
<p>At 5:00pm, the Machine Learning Club will jump in and have a complementary talk
on reproducible visualizations using Lightning.</p>
<h3 id="abstract">Abstract</h3>
<p>Creating reproducible scientific research has been a goal of the academic
community for as long as I have been a part of it, and it has seen great successes
(such as the interactive Nature article and the LIGO gravitational wave analysis), in
part due to the efforts of the Python (and Jupyter) community. But I like to
believe that these efforts stem from a more human root, the desire to understand the
world around us, and as such should be relevant to anyone (not just the
scientific Python community) trying to communicate the results of research.</p>
<p>In this talk I give a brief history of why (and how) we need to make all of our
analyses reproducible and how (web-based) interactive visualizations are
essential to making research much more accessible to the world at large. By
creating a reusable (and extensible) chart using the Lightning visualization
library, I will highlight the role visualization plays in making analyses
accessible to others and how web-based technologies such as JavaScript and D3
can liberate our results from the static prison of PDFs. Along the way I
will (hopefully) show you the potential of interaction to change the hearts and
minds of colleagues and the world.</p>
No Meeting - Spring Break2016-03-23T00:00:00+00:00https://BIDS.github.io/dats/posts/spring-break<h2 id="attending">Attending</h2>
<p>Don’t show up. Go on vacation. It’s spring break, fool.</p>
matplotlib - Tenzing Joshi & Nick Swanson-Hysell2016-03-16T00:00:00+00:00https://BIDS.github.io/dats/posts/matplotlib-spring-2016<h2 id="tenzing-joshi-bio">Tenzing Joshi bio</h2>
<p>I am a post-doc in the Applied Nuclear Physics Program at LBL.</p>
<h2 id="nick-swanson-hysell-bio">Nick Swanson-Hysell bio</h2>
<p>I am an Assistant Professor of Earth and Planetary Science here at UC Berkeley. My research is focused on reconstructing conditions on the ancient Earth with a particular focus on using magnetic data from rocks to determine the past positions of continents. You can learn more at <a href="http://www.swanson-hysell.org/">my website</a>. I seek to use tools that facilitate open and reproducible data analysis. You can find <a href="https://github.com/Swanson-Hysell">me</a> and my <a href="https://github.com/Swanson-Hysell-group">research group</a> on Github.</p>
<h2 id="matplotlib-presentation-through-notebook-demos">matplotlib presentation through notebook demos</h2>
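<p>Packages to install (if you use Anaconda, use <code class="highlighter-rouge">conda install</code> instead of <code class="highlighter-rouge">pip install</code>):</p>
<p><code class="highlighter-rouge">pip install matplotlib</code></p>
<p><code class="highlighter-rouge">pip install Basemap</code></p>
<p><code class="highlighter-rouge">pip install mpld3</code></p>
<p><code class="highlighter-rouge">pip install folium</code></p>
<p><code class="highlighter-rouge">pip install bokeh</code></p>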
<p>Introduction to matplotlib: <a href="https://github.com/thehackerwithin/berkeley/blob/master/python_matplotlib/Matplotlib_THW_tutorial.ipynb">Jupyter Notebook with example code</a></p>
<p>Using Basemap to plot geospatial data and other tricks/tools using matplotlib (“what used to bug me about using matplotlib, but doesn’t anymore”): <a href="https://github.com/thehackerwithin/berkeley/blob/master/python_matplotlib/Matplotlib_Basemap_Notebook.ipynb">Jupyter Notebook with example code</a></p>
<h2 id="lightning-talks">Lightning Talks</h2>
<p>Will occur as the spirit moves THW attendees.</p>
Handling and Visualizing Geospatial Data - Kevin Koy2016-03-09T00:00:00+00:00https://BIDS.github.io/dats/posts/gis-spring-2016<h2 id="attending">Attending</h2>
<p>Approximately 35 people</p>
<h2 id="kevin-koy">Kevin Koy</h2>
<p>Kevin is the Executive Director of the Berkeley Institute for Data Science
(BIDS). He was previously executive director at the Geospatial Innovation
Facility (GIF).</p>
<h2 id="geospatial-data">Geospatial Data</h2>
<h3 id="resources">Resources</h3>
<p>To get help beyond this talk, visit the Berkeley <a href="http://gif.berkeley.edu">Geospatial Innovation Facility</a>. They have resource guides, office hours, workshops, and more.</p>
<p>Data:</p>
<ul>
<li><a href="http://gif.berkeley.edu/resources/data.html">Berkeley GIF resources for finding data</a></li>
<li><a href="http://www.gadm.org">Administrative Boundaries</a></li>
<li><a href="http://www.prism.oregonstate.edu">Climate data</a></li>
<li><a href="http://openstreetmap.org">Open Street Map</a></li>
</ul>
<p>Tools:</p>
<ul>
<li><a href="http://www.qgis.org/en/site/">Quantum GIS</a>: open source, multiplatform geospatial software</li>
</ul>
<h3 id="notes">Notes</h3>
<p>There are two general types of GIS data. These are:</p>
<ul>
<li>Vector data (in shapefiles, for example)</li>
<li>and Raster data (in pixels, numbered cells).</li>
</ul>
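<p>The difference between the two data types can be sketched in a few lines of plain Python. This is only an illustration with made-up coordinates and cell values; real GIS data would be read from shapefiles or raster files using QGIS or a geospatial library.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Vector data: geometry stored as coordinates (here, one polygon's vertices).
# The coordinates are made up for illustration.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Raster data: a grid of cells, each holding one value (e.g. elevation).
raster = [
    [10, 12, 13],
    [11, 15, 14],
    [9, 10, 16],
]

def cell_value(grid, row, col):
    """Look up the value stored in one raster cell."""
    return grid[row][col]

print(polygon_area(polygon))
print(cell_value(raster, 1, 1))
</code></pre></div></div>
<p>Vector data keeps exact shapes and is queried geometrically, while raster data is queried by cell position, which is why the two usually come in different file formats.</p>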
<p>Where to find Geospatial Data?</p>
<p><a href="http://gif.berkeley.edu/resources/data.html">gif.berkeley.edu/resources/data.html</a></p>
<p>For the demo, there are a few places where the data will be downloaded from:</p>
<p><a href="http://gadm.org">http://gadm.org</a>
<a href="http://prism.oregonstate.edu">http://prism.oregonstate.edu</a>
<a href="http://www.openstreetmap.org">http://www.openstreetmap.org</a></p>
<p>Kevin demonstrated using QGIS (an open source
alternative to ArcGIS). <a href="http://www.qgis.org/en/site/">http://www.qgis.org/en/site/</a></p>
<p>He also showed us how to publish our map data on the web service CartoDB at
<a href="https://cartodb.com">https://cartodb.com</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h2 id="aji--terraview">Aji : TerraView</h2>
<p>Aji shared <a href="http://terraview.io:8080/landing">TerraView</a>, a Node.js app that
shows information about air quality.
It presents a straightforward map with many layers, built on OpenStreetMap. On top of
that, it uses Leaflet, an open source package that makes it possible to add
lots of extra layers to the map. New values are updated in real time.</p>
Python Metaprogramming & Conversion to Python 3 - Ryan Pavlovsky & Matthias Bussonnier2016-03-02T00:00:00+00:00https://BIDS.github.io/dats/posts/metaprogramming-py3-spring-2016<h2 id="attending">Attending</h2>
<h2 id="ryan-pavlovsky">Ryan Pavlovsky</h2>
<h2 id="matthias-bussonnier">Matthias Bussonnier</h2>
<p>Matthias is a postdoc at BIDS, a Jupyter and IPython core developer, and a pesky Python 3 evangelist.</p>
<h2 id="python-metaprogramming">Python Metaprogramming</h2>
<p>An IPython Notebook on python metaprogramming can be found <a href="https://github.com/thehackerwithin/berkeley/blob/master/python_metaprogramming/DecoratorsMetaclasses.ipynb" title="Python Metaprogramming">here</a>.</p>
<h2 id="conversion-to-python-3">Conversion to Python 3</h2>
<p>Not everybody may be aware, but Legacy Python 2 is reaching end of life in 2020, and
it’s well beyond time to move to Python 3, which is a much better language.</p>
<p>I’ll show some of the reasons why you do not want to stay on Legacy Python, and
the paths you can take to migrate your codebase (including notebooks!)
to Python 3.</p>
<p>I’ll also show off some fancy Python 3 packages!</p>
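<p>A minimal sketch of three of the best-known differences, written for Python 3 (this is an illustration, not code from the talk):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># In Python 2, "from __future__ import division, print_function" at the top
# of a module enables the first two behaviors shown here.

# print is a function in Python 3 (it was a statement in Python 2).
print("hello", "world", sep=", ")

# "/" is true division in Python 3; use "//" for floor division.
half = 3 / 2      # 1.5 (plain Python 2 gives 1)
floor = 3 // 2    # 1

# Text and bytes are distinct types in Python 3.
text = "caf\u00e9"             # four characters of text
data = text.encode("utf-8")    # five bytes once encoded
print(half, floor, len(text), len(data))
</code></pre></div></div>
<p>Keeping text and bytes apart is usually the part of a migration that needs the most manual attention, because Python 2 code often mixes the two silently.</p>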
<h2 id="lightning-talks">Lightning Talks</h2>
Julia - Tony Kelman and Kyle Barbary2016-02-24T00:00:00+00:00https://BIDS.github.io/dats/posts/julia-spring-2016<h2 id="attending">Attending</h2>
<p>Many folks showed up.</p>
<h2 id="tony-kelman">Tony Kelman</h2>
<p>Tony Kelman (@tkelman) is a Julia contributor, software engineer at Julia Computing Inc, and lecturer in Mechanical Engineering. He recently completed his PhD doing research on optimization-based control.</p>
<h2 id="kyle-barbary">Kyle Barbary</h2>
<p>Kyle is a BIDS fellow.</p>
<h2 id="julia">Julia</h2>
<p>Demos that you can use to follow along, as well as a PowerPoint presentation,
can be found here: <a href="https://github.com/thehackerwithin/berkeley/tree/master/julia">https://github.com/thehackerwithin/berkeley/tree/master/julia</a>.</p>
Scraping Wikipedia Data - Stuart Geiger2016-02-17T00:00:00+00:00https://BIDS.github.io/dats/posts/wikiscraping-spring-2016<h2 id="attending">Attending</h2>
<p>About 30 folks!</p>
<h2 id="stuart-geiger">Stuart Geiger</h2>
<p>I’m a postdoc at <a href="http://bids.berkeley.edu">the Berkeley Institute for Data Science</a>, and I completed my Ph.D. last December at the UC-Berkeley <a href="http://ischool.berkeley.edu">School of Information</a> next door. I’m an ethnographer of science and technology, and I study how people produce knowledge. A big focus of my work is how new technologies change what it means to produce knowledge. In my work, I use many different kinds of methods – sometimes I look more like an anthropologist, a historian, or a philosopher, while other times I run surveys, experiments, and large-scale data analyses. My Ph.D. research was about Wikipedia’s volunteer editing community, and I’m now studying the emergence of this thing we like to call data science.</p>
<h2 id="scraping-wikipedia-data">Scraping Wikipedia data</h2>
<p>We’ll be using two different resources to query Wikipedia. First, the <a href="https://www.mediawiki.org/wiki/API:Main_page">Wikipedia API</a>, which directly queries the text in Wikipedia articles, and second <a href="https://www.wikidata.org/wiki/Wikidata:Main_Page">Wikidata</a>, a new project that is trying to store all of the information in Wikipedia articles in a standardized, structured database.</p>
<h3 id="things-you-will-need">Things you will need</h3>
<ul>
<li>A clone of <a href="https://github.com/thehackerwithin/berkeley/blob/master/scraping_wikipedia/">this directory</a>, which has Jupyter notebooks</li>
<li>Jupyter notebook instance with the Python kernel (I’m using Python 3)</li>
<li>Python libraries (can be installed with ‘pip install …’): wikipedia, pywikibot, requests, nltk, pandas</li>
<li>A Wikipedia account (not required but <em>highly</em> recommended. <a href="https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page&type=signup">Register here!</a>)</li>
</ul>
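<p>To give a feel for what a query against the Wikipedia API looks like, here is a small sketch that only builds the request URL, using the standard library. It is not taken from the session notebooks; the helper name <code>build_extract_query</code> and the example article are made up for illustration, while the parameters are those of the MediaWiki query module and its text extracts property.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from urllib.parse import urlencode

# Public endpoint of the English Wikipedia MediaWiki API.
API_URL = "https://en.wikipedia.org/w/api.php"

def build_extract_query(title):
    """Return a URL asking the API for the plain-text intro of one article."""
    params = {
        "action": "query",      # use the query module
        "prop": "extracts",     # ask for page text extracts
        "exintro": 1,           # only the section before the first heading
        "explaintext": 1,       # plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)

url = build_extract_query("Mount Diablo")
print(url)
# The JSON response can then be fetched with, for example,
# requests.get(url).json()
</code></pre></div></div>
<p>Building the URL separately from fetching it makes it easy to paste the query into a browser and inspect the raw JSON before writing any parsing code.</p>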
<h2 id="lightning-talks">Lightning Talks</h2>
<h2 id="matthias--hacker-within-mybinder">Matthias : Hacker Within mybinder</h2>
<p>Go check out mybinder.org. You can run the THW notebooks from your browser.</p>
<h2 id="brian--where-is-a-mountain-anyway">Brian : Where is a mountain, anyway</h2>
<p>Inspired by the geocoordinates in Stuart’s talk, Brian pointed out that putting
coordinates on a mountain is tricky. Where is a mountain, anyway?</p>
Pandas - Tenzing Joshi2016-02-10T00:00:00+00:00https://BIDS.github.io/dats/posts/pandas-spring-2016<h2 id="attending">Attending</h2>
<p>Many Folks.</p>
<h2 id="tenzing-joshi">Tenzing Joshi</h2>
<p>I am a postdoc in the Applied Nuclear Physics Program at LBL.
I received my PhD from the Nuclear Engineering department at Berkeley.
My current research is focused on using modern data analysis techniques to improve the sensitivity of mobile radiation detection platforms and using insights from this work to develop future radiation detection systems.</p>
<h2 id="there-is-more-than-one-way-to-skin-a-panda">There is more than one way to skin a Panda.</h2>
<p>In this edition of The Hacker Within I’ll introduce the pandas library.
We’ll talk about Series and DataFrames.
This includes a variety of ways to create them, index into them, manipulate them, and get data out of them.</p>
<p>Follow along with this <a href="https://github.com/thehackerwithin/berkeley/tree/master/python_pandas">Jupyter Notebook</a>.
In this notebook we’ll use some <a href="https://dl.dropboxusercontent.com/u/4558549/THWPasses_segmented.hdf5">data</a>.</p>
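<p>For readers without the notebook at hand, the following sketch touches each of those steps on a tiny made-up table. It assumes pandas is installed; the column names and numbers are invented and are unrelated to the data file above.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# A Series is a one-dimensional labeled array (values here are made up).
counts = pd.Series([12, 7, 30], index=["a", "b", "c"], name="counts")

# A DataFrame is a table of columns that share one index.
df = pd.DataFrame({
    "detector": ["north", "south", "north", "south"],
    "counts": [12, 7, 30, 5],
})

# Index into them: by label for a Series, by boolean mask for a DataFrame.
print(counts["b"])                       # 7
north = df[df["detector"] == "north"]

# Manipulate: add a derived column, then aggregate per group.
df["rate"] = df["counts"] / 10.0
totals = df.groupby("detector")["counts"].sum()

# Get data out: as a plain dictionary or a NumPy array.
print(totals.to_dict())
print(north["counts"].to_numpy())
</code></pre></div></div>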
<h2 id="resources">Resources</h2>
<ul>
<li>Pandas site
<ul>
<li>http://pandas.pydata.org/pandas-docs/stable/overview.html</li>
<li>http://pandas.pydata.org/pandas-docs/stable/tutorials.html</li>
<li>There are loads of useful examples and tutorials on this site</li>
<li>If you’re curious then take some time to look around</li>
</ul>
</li>
<li>Wes’s Book
<ul>
<li>Wes McKinney started Pandas</li>
<li>Wes wrote a book titled <strong>Python for Data Analysis</strong></li>
<li>http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793</li>
<li>This was my starting point and there is great stuff in this book</li>
</ul>
</li>
<li>Other Pandas tutorials that are worth a read
<ul>
<li>https://bitbucket.org/hrojas/learn-pandas</li>
<li>http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/</li>
<li>http://synesthesiam.com/posts/an-introduction-to-pandas.html</li>
<li>https://plot.ly/ipython-notebooks/big-data-analytics-with-pandas-and-sqlite/</li>
</ul>
</li>
<li>Stack Overflow
<ul>
<li>There are a large number of pandas-related answers on here</li>
<li>http://stackoverflow.com/questions/tagged/pandas</li>
<li>The site seems to be monitored for pandas-tagged questions, so if you’re stumped, this is a great place to ask.</li>
</ul>
</li>
</ul>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="mike-pacer">Mike Pacer</h3>
<h3 id="kunal-marwaha">Kunal Marwaha</h3>
LaTeX - Rachel Slaybaugh, Mike Pacer, and Katy Huff2016-02-03T00:00:00+00:00https://BIDS.github.io/dats/posts/LaTeX-spring-2016<h2 id="attending">Attending</h2>
<p>About 20.</p>
<h2 id="leaders">Leaders</h2>
<h3 id="rachel-slaybaugh">Rachel Slaybaugh</h3>
<p>Rachel Slaybaugh is an Assistant Professor in the Department of Nuclear
Engineering at UC Berkeley. She was one of the founding members of The Hacker
Within at the University of Wisconsin.</p>
<h3 id="mike-pacer">Mike Pacer</h3>
<p>Michael Pacer is a cognitive scientist at UC Berkeley.</p>
<h3 id="katy-huff">Katy Huff</h3>
<p>Katy Huff is a BIDS fellow and postdoctoral fellow in the Nuclear Science and
Security Consortium.</p>
<h2 id="latex">\(\LaTeX\)</h2>
<p>First, we’ll introduce the basic concepts in \(\LaTeX\). Then,
we’ll share a few tips and tricks.</p>
<h2 id="what-is-markup">What is Markup?</h2>
<h3 id="html">HTML</h3>
<p>HTML is the HyperText Markup Language. It provides a plain text way to
describe objects and data that are encountered on the World Wide Web. Things
like URLs, text rendering in webpages, etc. are all easy to describe in HTML.</p>
<h3 id="xml">XML</h3>
<p>XML is the extensible markup language. It generalizes where others specify. In
the way that all reductionist things fail to get the specifics right, XML is
great for general tasks in programming (input files, etc.), but not great for
writing documents, where the needs are very specific.</p>
<h3 id="markdown-restructuredtext-where-does-it-end">MarkDown? RestructuredText? Where does it end?</h3>
<p>There are a lot of markup languages. They all do different things. reStructuredText
is the standard in the world of Python documentation. Markdown is the
standard on GitHub. Pick your poison.</p>
<h2 id="how-do-i-install-latex">How Do I install \(\LaTeX\)?</h2>
<h3 id="linux">Linux</h3>
<p>Everything in Linux is simple. On Debian-based distributions such as Ubuntu:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install texlive
</code></pre></div></div>
<h3 id="osx">OSX</h3>
<p>You should use <a href="https://tug.org/mactex/" title="mactex">MacTeX</a>. You can install it with MacPorts or Homebrew, or by downloading the whole shebang from
the website.</p>
<h3 id="windows">Windows</h3>
<p>I honestly have no idea. It <a href="http://tex.stackexchange.com/questions/41808/how-do-i-install-tex-latex-on-windows-7" title="TeX Stack Exchange">looks like</a> the TeX stack exchange may be able to
help, though.</p>
<h2 id="how-do-i-write-latex">How do I write \(\LaTeX\)?</h2>
<p>The Not So Short Introduction to \(\LaTeX\) is pretty great: <a href="http://tobi.oetiker.ch/lshort/lshort.pdf">http://tobi.oetiker.ch/lshort/lshort.pdf</a>.</p>
<h3 id="lyx">LyX</h3>
<p>Max showed us LyX last time, which is a WYSIWYG editor for \(\LaTeX\). That’s
awesome. I recommend you give it a shot.</p>
<h3 id="texshop">TeXShop</h3>
<p>TeXShop is something that many folks use to write and render latex side by
side. It’s cool. I don’t use it, but I can see where it would be great.</p>
<h3 id="text-editors">Text Editors</h3>
<p>Some folks will find the text editor option the most extensible and glorious. I
am one of those folks. I have a vim plugin for latex called, you guessed it,
vim-latex and it does most of the typing for me. With syntax highlighting, it
tells me where there’s a mistake, and by virtue of dealing directly with the
content, I can ignore how it looks until the very end.</p>
<h2 id="how-do-i-pronounce-latex">How do I pronounce \(\LaTeX\)?</h2>
<p>Check it out, the last letter is the Greek letter \(\chi\). So, it definitely has to
end in a K sound. But, is it Lay or Lah? The developers say it’s up to you.</p>
<h2 id="what-are-the-parts-of-a-document">What are the Parts of a Document?</h2>
<p>\(\LaTeX\) documents have numerous parts.</p>
<h3 id="the-preamble">The Preamble</h3>
<p>In the preamble, there is a basic set of information that must be included in
order to define the document. The real minimum set is just the “documentclass”
parameter. Document classes include “article,” “book,” and “letter.” Options concerning
the paper format and the font can be specified in the square brackets, while the
document class itself should be listed in the curly braces.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">\documentclass</span><span class="na">[11pt]</span><span class="p">{</span>article<span class="p">}</span>
</code></pre></div></div>
<p>The preamble also includes any packages that you rely on. Standard packages include
“amsmath,” “amsfonts,” “amssymb,” and “graphicx.”</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>\usepackage{amsmath}
\usepackage{amssymb}
</code></pre></div></div>
<p>If you are expecting a title to appear, parameters such as author and title
should be filled in.</p>
<h3 id="begin-and-end">begin and end</h3>
<p>You must begin and end the document.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">\documentclass</span><span class="na">[11pt]</span><span class="p">{</span>article<span class="p">}</span>
<span class="nt">\begin{document}</span>
<stuff>
<span class="nt">\end{document}</span>
</code></pre></div></div>
<p>Now, that’s it. To create a beautiful PDF, you can place this text in a file
called doc.tex, type “latex doc.tex” to create a DVI file, then type “dvipdf doc.dvi” to
create a PDF file. Alternatively, “pdflatex doc.tex” produces the PDF directly.</p>
<h3 id="the-title-elements">The Title Elements</h3>
<p>A few commands define the title elements, such as the author and the title.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">\documentclass</span><span class="na">[11pt]</span><span class="p">{</span>article<span class="p">}</span>
<span class="k">\author</span><span class="p">{</span>The Hacker Within<span class="p">}</span>
<span class="k">\title</span><span class="p">{</span>Our New Document<span class="p">}</span>
<span class="nt">\begin{document}</span>
<stuff>
<span class="nt">\end{document}</span>
</code></pre></div></div>
<p>Those variables are used by the maketitle command, which must be executed
within the document boundaries.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">\documentclass</span><span class="na">[11pt]</span><span class="p">{</span>article<span class="p">}</span>
<span class="k">\author</span><span class="p">{</span>The Hacker Within<span class="p">}</span>
<span class="k">\title</span><span class="p">{</span>Our New Document<span class="p">}</span>
<span class="nt">\begin{document}</span>
<span class="k">\maketitle</span>
<span class="nt">\end{document}</span>
</code></pre></div></div>
<h3 id="books-chapters-sections-subsections-subsubsections-and-paragraphs">Books, Chapters, Sections, Subsections, Subsubsections, and Paragraphs</h3>
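<p>These levels define the hierarchy of your document. Apart from the book, which is a document class, each one has a command of the same name, such as \chapter or \section.</p>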
<h3 id="include-and-input">Include and input</h3>
<p>Rather than keep everything in one big file, you can include and input other
latex files into a master. That acknowledgements section that you use in every
paper? Keep it in its own file.</p>
<h3 id="examples">Examples</h3>
<p>As we go along, you may consider cloning:</p>
<ul>
<li><a href="https://github.com/physics-codes/examples/tree/master/tex" title="texamples">this repository of examples</a>.</li>
<li>or <a href="https://github.com/thehackerwithin/berkeley/tree/master/LaTeX" title="latex resources">this repository of examples</a>.</li>
</ul>
<h2 id="lightning-talks">Lightning Talks</h2>
What To Learn and Teach - All2016-01-27T00:00:00+00:00https://BIDS.github.io/dats/posts/learn-teach-spring-2016<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. I hope you’ll join us!</li>
</ul>
<p>If you can’t join us, but would like to request to learn or teach a topic
related to scientific computing, please fill out
<a href="https://goo.gl/Wf9ar2">this google form</a>.</p>
<h2 id="discussion-what-do-you-want-to-learn-and-what-can-you-teach">Discussion: What Do You Want To Learn and What Can You Teach</h2>
<p>Our first meeting of the semester will be focused on introductions and building
this semester’s schedule of topics. To mold the upcoming schedule of topics to
your needs and desires, please attend. We will engage in a fun democratic
exercise in which we each offer and request knowledge. In this way, we’ll keep THW relevant by
weighing in on what topics are important to us as a community.
To
request particular sessions, volunteer some useful knowledge, or just hang out,
please join us at 4:00pm in Room 190 of Doe Library.</p>
<p><strong>This semester, we’re going to try to have a visualization theme.</strong> Everyone
visualizes results of some kind. So, bring us your tools, your examples, your
demos, and your problems. Bring us your plots, timeseries, volumetric images,
videos, or interactive charts and graphs. We’re ready to see it all.</p>
<h2 id="first-time-attendees">First Time Attendees</h2>
<p>We are very hopeful that many new faces will join us this semester. We would
especially love your input at this meeting. Your voice will help us to make The
Hacker Within as useful and peer-driven as possible.</p>
<p>More information on the how, when, where, and why of this meeting can be found
at:</p>
<ul>
<li><a href="http://thehackerwithin.github.io/berkeley/" title="The About Page">the THW@UCB about page</a></li>
<li>and <a href="http://bids.berkeley.edu/events/hacker-within">the BIDS event page for this meeting</a></li>
</ul>
<h2 id="results">Results</h2>
<p>I will list the results here when the meeting is over.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
High Performance Python - Chick Markley2015-12-02T00:00:00+00:00https://BIDS.github.io/dats/posts/high-perf-python-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="chick-markley">Chick Markley</h2>
<p>Chick Markley does work with the Aspire lab at UC Berkeley.</p>
<h2 id="straw-man-high-performance-python-example">Straw Man High Performance Python Example</h2>
<p>First, some aphorisms:</p>
<ul>
<li>Programmer hours are more important than CPU hours - Cook</li>
<li>Premature optimization is the root of all evil - Knuth</li>
<li>etc.</li>
</ul>
<p>Next, an example of a Laplacian.</p>
<p>Chick put his arrays into various data structures (lists, numpy arrays, etc.)</p>
<p>Interestingly, lists performed better than naive numpy arrays, but once
you vectorize the numpy operations, the code becomes much faster. It’s
of course better still if you use the built-in scipy Laplacian (faster
because it’s written in C). You can do well with Cython too, but ultimately,
you get a lot better performance by loading a C library.</p>
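<p>The gap between the naive and the vectorized numpy versions is easiest to see side by side. The sketch below is not Chick’s code: it assumes a five-point stencil on a 2-D grid, and the function names are made up. The first version visits one cell at a time from Python; the second expresses the same arithmetic with whole-array slices.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def laplacian_loops(u):
    """Five-point Laplacian of the interior of a 2-D grid, one cell at a time."""
    out = np.zeros_like(u)
    rows, cols = u.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i, j] = (u[i - 1, j] + u[i + 1, j] +
                         u[i, j - 1] + u[i, j + 1] - 4.0 * u[i, j])
    return out

def laplacian_vectorized(u):
    """Same stencil expressed with whole-array slices (no Python loops)."""
    out = np.zeros_like(u)
    out[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1] +
                       u[1:-1, :-2] + u[1:-1, 2:] - 4.0 * u[1:-1, 1:-1])
    return out

# A small test grid: u(i, j) = i**2 + j**2, whose discrete Laplacian is 4.
i, j = np.meshgrid(np.arange(6.0), np.arange(6.0), indexing="ij")
u = i**2 + j**2
print(np.allclose(laplacian_loops(u), laplacian_vectorized(u)))   # True
</code></pre></div></div>
<p>Both functions leave the boundary cells at zero. Timing them on a larger grid, for example with the standard <code>timeit</code> module, is a quick way to reproduce the kind of comparison described above on your own machine.</p>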
<p>We can also parallelize. Parallel operations vary from embarrassingly parallel
to inscrutably parallel. One can do so on many devices (many nodes, MIC, GPU…)
and with many frameworks (PySpark, OpenMP, OpenCL, CUDA…). But one must inform the
compiler which loops to parallelize, etc.</p>
<p>One can also “roofline” one’s system with “shocdriver” or a similar tool to
benchmark the system. In particular, it shows what kind of performance
constraints are characteristic of your system.</p>
<p>Another option is SEJITS, a framework that Chick works on. It selectively
embeds just in time “specialization” (or, rather, optimization).</p>
<p>Tuning is another option. There’s something called OpenTuner. It will run your
program numerous times to find the configuration that minimizes its run time.</p>
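<p>The idea behind this kind of tuning can be sketched with the standard library alone. To be clear, this is not the API of any tuning framework: real tuners explore much larger configuration spaces with smarter search strategies. Here a hypothetical tunable parameter, <code>block</code>, is simply timed at each candidate value and the fastest one is kept.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import timeit

def blocked_sum(values, block):
    """Sum a list in chunks of `block` items (the tunable parameter)."""
    total = 0
    for start in range(0, len(values), block):
        total += sum(values[start:start + block])
    return total

def tune(candidates, values, repeats=3):
    """Time each candidate block size and return the fastest one."""
    timings = {}
    for block in candidates:
        runs = timeit.repeat(lambda: blocked_sum(values, block),
                             number=5, repeat=repeats)
        timings[block] = min(runs)     # best of several runs reduces noise
    best = min(timings, key=timings.get)
    return best, timings

values = list(range(20000))
best, timings = tune([1, 16, 256, 4096], values)
print("fastest block size:", best)
</code></pre></div></div>
<p>The winning value depends on the machine and on the input, which is exactly why the measurement is repeated instead of being guessed.</p>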
<p>Wait - there’s more hardware. One can build new hardware to solve your problem.
Hardware isn’t so hard anymore (maybe it should be called easyware.)</p>
<p>There’s an interesting “hardware construction language” that folks at Aspire
came up with. It’s called <a href="https://chisel.eecs.berkeley.edu/chisel-dac2012.pdf">Chisel</a>.</p>
No Meeting - Thanksgiving2015-11-25T00:00:00+00:00https://BIDS.github.io/dats/posts/thanksgiving<h2 id="attending">Attending</h2>
<p>Please don’t attend. The library is closed for Thanksgiving.</p>
scikit-learn - Ross Barnowski and Shannon McCurdy2015-11-18T00:00:00+00:00https://BIDS.github.io/dats/posts/scikit-learn-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="ross-barnowski">Ross Barnowski</h2>
<p>Ross is a Nuclear Engineering PhD student in Kai Vetter’s group.</p>
<h2 id="shannon-mccurdy">Shannon McCurdy</h2>
<p>Shannon is a postdoc in computational biology.</p>
<h2 id="discussion-scikit-learn">Discussion: scikit-learn</h2>
<p><strong>Ross</strong> walked us through a demo notebook which can be found
<a href="https://github.com/thehackerwithin/berkeley/tree/master/sklearn/sklearn_intro.ipynb">here</a>.
You can clone it from github.com/thehackerwithin/berkeley.</p>
<p><strong>Shannon</strong> walked us through some useful resources. The documentation for sklearn
seems to parallel a book called <a href="http://statweb.stanford.edu/~tibs/ElemStatLearn">The Elements of Statistical
Learning</a>, and Shannon recommends
this as a resource.</p>
<h3 id="linear-regression">Linear Regression</h3>
<p>If y is n×1 and x is n×p, we have an unknown coefficient matrix W, which is p×1.
The error term is then n×1. The assumption is that x and y are linearly
related. The fit, W, minimizes the vertical error. The least squares cost
function, which comes up in regression in this way, is a model for the error.</p>
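<p>Written out in the notation of these notes, with y, x, and W as defined in the paragraph before this one, the model and the least squares fit are as follows. The symbol \(\epsilon\) for the error term and the hat on the fitted W are introduced here and do not come from the talk:</p>
<p>\[ y = xW + \epsilon, \qquad \hat{W} = \arg\min_{W} \lVert y - xW \rVert_2^2 \]</p>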
<p>Note that in this example, when p>n, we enter a danger zone for validity of
this model. Shannon wanted us to note, in this context, scikit-learn doesn’t
necessarily warn you when this happens. So, don’t trust that scikit-learn will
always warn you if you aren’t using the models in the appropriate regime.</p>
<h3 id="shrinkage-models">Shrinkage Models</h3>
<p>A bunch of different shrinkage models are included in scikit-learn. One that
Shannon uses in her work is Lasso.</p>
<p>The idea, functionally, is that we add a penalty to the least squares cost
function. The penalty is related to the magnitude of each coefficient. That is,
if you are going to add some nonzero element in the matrix, it must contribute
well to the fit with y. This is a parsimony metric which enforces sparsity in
the solution vector. This helps with interpretability because it emphasizes
the most important coefficients.</p>
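<p>For Lasso in particular, the penalty is the sum of the absolute values of the coefficients. As a sketch, with y the n×1 response, x the n×p data matrix, and W the p×1 coefficient vector, and with \(\lambda \ge 0\) a penalty strength introduced here for illustration:</p>
<p>\[ \hat{W}_{\mathrm{lasso}} = \arg\min_{W} \; \lVert y - xW \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert W_j \rvert \]</p>
<p>Setting \(\lambda = 0\) recovers ordinary least squares, and larger values push more coefficients to exactly zero. In scikit-learn the penalty strength is the <code>alpha</code> parameter of the Lasso estimator, and the squared error term is additionally scaled by 1/(2n).</p>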
<h3 id="in-the-wild">In the Wild</h3>
<p>Shannon has encountered least squares and lasso in two different problems in
her work.</p>
<p>Example: In her research she looks into event times, where only a subset
(half) of the events are recorded. Using an exponential probability and an
indicator (whether or not an event was recorded), she can describe the
probability of an event happening. Given this, she can separate the probability
into a maximum likelihood problem which can be minimized (using exponential
regression) to determine the least squares solution. She can then reframe the
Newton-Raphson step as an ordinary least squares lasso problem. If you
didn’t follow this completely, check out <a href="http://statweb.stanford.edu/~tibs/lasso.html">Tibshirani’s website on the general
topic of lasso models</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<p>Finally, there will be a time for a couple of <strong>Lightning Talks</strong>, which are
5-10 minute blasts of information about a particular topic or question of
interest to the group. This topic can be anything useful, new, or interesting
to scientists who compute. It may be some new skill you have recently picked up
in your research, a productivity tool you have recently learned to love, a
quick demo of a useful library, or anything you feel we would enjoy learning.<br />
<strong>Note</strong> that the lightning talk time is a good way to bring a question to the
group. If you have a bug you need help with, here’s the place to ask many ears
about it at once.</p>
No Meeting - Veterans' Day2015-11-11T00:00:00+00:00https://BIDS.github.io/dats/posts/veterans-day<h2 id="attending">Attending</h2>
<p>Please don’t attend. The library will be closed on November 11th.</p>
scikit-image - Stefan van der Walt2015-11-04T00:00:00+00:00https://BIDS.github.io/dats/posts/scikit-image-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="stefan-van-der-walt">Stefan van der Walt</h2>
<h2 id="discussion-topic-description">Discussion: Topic Description</h2>
<h2 id="lightning-talks">Lightning Talks</h2>
Advanced Python - Sven Chilton, Matthias Bussonnier2015-10-28T00:00:00+00:00https://BIDS.github.io/dats/posts/adv-python-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="sven-chilton">Sven Chilton</h2>
<h2 id="matthias-bussonnier">Matthias Bussonnier</h2>
<p>Postdoc at BIDS, mostly working on Jupyter and IPython.</p>
<h2 id="discussion-topic-description">Discussion: Topic Description</h2>
<p>We’ll discuss a bit of advanced Python: context managers, dunder methods, and a lot of things that might not be a good idea to do in production but are fun to play with.</p>
<p>If time permits, a little bit of AST.</p>
<p><a href="https://gist.github.com/Carreau/b8ed0853ab93a1943319">Here</a> is the notebook I used for the various examples.</p>
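<p>For a quick taste of two of these topics, here is a small self-contained sketch. It is not taken from the notebook; the <code>Timer</code> and <code>Vector</code> classes are made up for illustration. A context manager is any object with <code>__enter__</code> and <code>__exit__</code> methods, and other dunder methods hook into operators and printing.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import time

class Timer:
    """Context manager that records how long its block took."""

    def __enter__(self):
        # Called when the "with" block starts; the return value is bound by "as".
        self.start = time.perf_counter()
        self.elapsed = None
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the block ends, even if it raised an exception.
        self.elapsed = time.perf_counter() - self.start
        return False   # do not swallow exceptions

    def __repr__(self):
        # Dunder methods also control how an object prints.
        return "Timer(elapsed={!r})".format(self.elapsed)

class Vector:
    """Tiny 2-D vector showing operator overloading with dunder methods."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):      # implements v + w
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):       # implements v == w
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Vector({}, {})".format(self.x, self.y)

with Timer() as t:
    total = Vector(1, 2) + Vector(3, 4)

print(total)        # Vector(4, 6)
print(t)
</code></pre></div></div>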
<h2 id="lightning-talks">Lightning Talks</h2>
GPUs and Parallelization - Biye Jiang, Aaron Culich2015-10-21T00:00:00+00:00https://BIDS.github.io/dats/posts/gpus-parallelization-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="biye-jiang">Biye Jiang</h2>
<p><a href="https://byeah.github.io/">Biye Jiang</a> is a PhD student at UC Berkeley in the
CS department working with John Canny.</p>
<h2 id="aaron-culich">Aaron Culich</h2>
<p>Aaron is a research computing architect at Berkeley.</p>
<h2 id="discussion-gpus-and-parallelization">Discussion: GPUs and Parallelization</h2>
<p>Today’s topic is about GPUs and parallelism.</p>
<h3 id="survey-of-needs-and-resources--aaron-culich">Survey of Needs and Resources – Aaron Culich</h3>
<p>Aaron referenced a presentation on this topic. It can be found
<a href="http://parlab.eecs.berkeley.edu/sites/all/parlab/files/BootCamp_Computational_Patterns_Demmel_final_12v2.pdf">here</a>.</p>
<p>Aaron started this presentation with a survey of what the attendees are
actually using.</p>
<ul>
<li>GPUs? 3 folks.</li>
<li>Other Parallelization? Lots of folks.</li>
</ul>
<h3 id="python-parallelism">Python Parallelism</h3>
<p>It was mentioned that, for some folks, Python is the language of choice. The
Python multiprocessing module was mentioned. This was the topic of a THW
session last year. The THW resources on this topic can be found
<a href="https://github.com/thehackerwithin/berkeley/blob/master/python_concurrency">here</a>.
That session was not on GPUs. However, the Python threading module can be used
in conjunction with PyCUDA, a Python module for GPUs.</p>
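<p>As a reminder of what the simplest form of Python parallelism looks like, here is a sketch that uses only the standard library. It is not from the THW session materials, and <code>simulate</code> is a made-up stand-in for an independent task. It uses a thread pool; in CPython, threads mainly help when tasks wait on I/O or on an external device, while CPU-bound pure Python code usually needs processes. Swapping in <code>ProcessPoolExecutor</code>, which has the same interface, gives process-based parallelism.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    """Stand-in for one independent task (hypothetical workload)."""
    value = seed
    for _ in range(1000):
        value = (value * 1103515245 + 12345) % 2147483648
    return value

seeds = list(range(8))

# Run the independent tasks on a pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(simulate, seeds))

# The results match a plain serial loop, in the same order.
serial = [simulate(s) for s in seeds]
print(parallel == serial)   # True
</code></pre></div></div>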
<h3 id="research-it--krishna-muriki">Research IT – Krishna Muriki</h3>
<p><a href="http://research-it.berkeley.edu">Research IT</a> is available as a resource for
individuals who would like to test their code on GPU resources. Krishna
Muriki noted that there is an institutional shared Linux cluster (Savio).
Within that cluster, there are 6 compute nodes with 4 Kepler GPUs each.
Those nodes are in testing, and BRC (Berkeley Research Computing) is interested in and open to new users.</p>
<h3 id="java-runtime-engine--oliver">Java runtime engine – Oliver</h3>
<p>Oliver at ESPM has a javascript modeling project for agent-based population
models. They are working to make their software scale from the desktop to
high-performance computing. The NOVA stack and
<a href="https://www.xsede.org/">XSEDE</a> resources
are core to their efforts.</p>
<h3 id="scala-demo--biye-jiang">Scala Demo – Biye Jiang</h3>
<p>Biye demonstrated the speed of GPUs by running the same matrix multiplication
on a GPU and on a CPU.</p>
<h3 id="gpu-discussion">GPU Discussion</h3>
<p>Biye shared some of the diagrams from <a href="http://on-demand.gputechconf.com/gtc/2014/presentations/S4811-extreme-machine-learning-with-gpus.pdf">this
presentation</a>.</p>
<p>He noted:</p>
<ul>
<li>GPUs give excellent speed: throughput is high,</li>
<li>but so is GPU memory latency.</li>
<li>If you want your GPU code to run quickly, optimize for throughput.</li>
<li>Always remember that GPU memory access is slower than computation.</li>
<li>Avoid moving data between the GPU and main memory.</li>
</ul>
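<p>The advice about data movement can be made concrete with a toy cost model. This
is only a back-of-the-envelope sketch: the link bandwidth and GPU rate below are
made-up round numbers, not measurements of any device, and the function names are
illustrative.</p>

```python
def matmul_cost(n, bytes_per_value=4, link_bytes_per_s=1e10, gpu_flops_per_s=1e12):
    """Estimate transfer and compute time for an n-by-n matrix product.

    Both rates are hypothetical round numbers chosen for illustration.
    """
    # Two input matrices go to the GPU and one result matrix comes back.
    transfer_s = 3 * n * n * bytes_per_value / link_bytes_per_s
    # A dense n-by-n product needs about 2 * n**3 floating-point operations.
    compute_s = 2 * n**3 / gpu_flops_per_s
    return transfer_s, compute_s


def transfer_fraction(n):
    """Fraction of the estimated total time spent moving data."""
    transfer_s, compute_s = matmul_cost(n)
    return transfer_s / (transfer_s + compute_s)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(transfer_fraction(n), 3))
```

<p>Under these assumed rates, small matrices spend most of their time in transfer,
while large ones amortize it; keeping data resident on the GPU between operations
avoids paying the transfer cost repeatedly.</p>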
<h3 id="gpu-bidmat-demo">GPU BIDMat demo</h3>
<p>Biye presented an IPython notebook to demonstrate how BIDMat works.</p>
<p>The IPython notebook demos are <a href="https://github.com/BIDData/BIDMach/blob/master/tutorials/">here</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<p>Finally, there will be a time for a couple of <strong>Lightning Talks</strong>, which are
5-10 minute blasts of information about a particular topic or question of
interest to the group. This topic can be anything useful, new, or interesting
to scientists who compute. It may be some new skill you have recently picked up
in your research, a productivity tool you have recently learned to love, a
quick demo of a useful library, or anything you feel we would enjoy learning.<br />
<strong>Note</strong> that the lightning talk time is a good way to bring a question to the
group. If you have a bug you need help with, here’s the place to ask many ears
about it at once.</p>
Webscraping - John Bohannon, Sven Chilton2015-10-14T00:00:00+00:00https://BIDS.github.io/dats/posts/webscraping-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="john-bohannon">John Bohannon</h2>
<p>Bio</p>
<h2 id="sven-chilton">Sven Chilton</h2>
<p>Bio</p>
Free-Form Hacking2015-10-07T00:00:00+00:00https://BIDS.github.io/dats/posts/free-hacking-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<p>Many folks will be absent, due to a BIDS-related event elsewhere. However, you
are welcome to gather, sit together, and get some work done in a collaborative
environment.</p>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: Today is unstructured. Simply gather, sit together, and get some work done.</li>
</ul>
Pandas - Sean Wahl & Sven Chilton2015-09-30T00:00:00+00:00https://BIDS.github.io/dats/posts/pandas-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="sean-wahl">Sean Wahl</h2>
<p>Bio</p>
<h2 id="sven-chilton">Sven Chilton</h2>
<p>Bio</p>
Spark and Hadoop - Zhao Zhang2015-09-23T00:00:00+00:00https://BIDS.github.io/dats/posts/spark-hadoop-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="zhao-zhang">Zhao Zhang</h2>
<p>Bio</p>
Visualization - John Naulty, Ross Barnowski, Biye Jiang, Jennifer Jones2015-09-16T00:00:00+00:00https://BIDS.github.io/dats/posts/visualization-demos-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="john-naulty">John Naulty</h2>
<p>Bio</p>
<h2 id="ross-barnowski">Ross Barnowski</h2>
<p>… likes computers</p>
<h3 id="pyqtgraph">pyqtgraph</h3>
<p>Install: <code class="highlighter-rouge">pip install pyqtgraph</code></p>
<p>Demo: <code class="highlighter-rouge">python -m pyqtgraph.examples</code></p>
<p><a href="http://www.pyqtgraph.org/">Description of pyqtgraph</a></p>
<p>My Take: pyqtgraph is less user-friendly than matplotlib (esp. the
documentation; the gallery contains far fewer examples and doesn’t do a good
job of covering all of the possible features and uses of pyqtgraph), but is very
feature-rich and more performance-oriented, despite still being pure python.
There are several scenarios in which pyqtgraph is definitely worth looking into:</p>
<ul>
<li><strong>The need for speed</strong>: pyqtgraph is in many cases <em>much</em> faster than
matplotlib (see demo). Also has built-in support for remote plot updating.</li>
<li><strong>Volumetric rendering</strong>: If you need to visualize in 3D, pyqtgraph has a lot
to offer. The other de-facto python 3D-visualization library is <code class="highlighter-rouge">mayavi</code> —
I would say pyqtgraph has a slightly steeper learning curve and is a little
less pretty, but again is much faster than mayavi. I don’t have enough
experience with <code class="highlighter-rouge">yt</code> to say how it compares.</li>
<li><strong>Building Qt Applications</strong>: If you’re using python-ized Qt (either PySide
or PyQt) to build a GUI, pyqtgraph integrates very nicely. It is built with
the same tools!</li>
<li><strong>Beyond Visualization</strong>: The author(s) of pyqtgraph had the goal of making
it a general science/engineering tool. There are a lot of built-in features
designed to aid in analyzing data visually and interactively. See the
Data Slicing and Image Analysis examples to get a feel for this.</li>
</ul>
<h2 id="jennifer-jones">Jennifer Jones</h2>
<p>This is my Bio</p>
<h2 id="biye-jiang"><a href="http://byeah.github.io/">Biye Jiang</a></h2>
<p><a href="http://byeah.github.io/">I</a> am a third-year CS PhD student at Cal, working with Prof. <a href="http://www.eecs.berkeley.edu/~jfc/">John Canny</a>
on topics like making machine learning easier to use. Check out our <a href="http://bid2.berkeley.edu/bid-data-project/">BIDMach</a> project.</p>
<p><a href="https://www.dropbox.com/s/c30gyw7p88rikkf/viz.ipynb?dl=0">Here</a> is the ipython notebook I will use in the talk.
This will be similar to our data science <a href="https://bcourses.berkeley.edu/courses/1377158/">class</a>.</p>
Advanced Git and GitHub - Ross Barnowski, Kyle Barbary, Katy Huff2015-09-09T00:00:00+00:00https://BIDS.github.io/dats/posts/advanced-git-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h2 id="ross-barnowski">Ross Barnowski</h2>
<p>Ross is a graduate student in Kai Vetter’s group in Nuclear Engineering. He has
long hair.</p>
<h2 id="kyle-barbary">Kyle Barbary</h2>
<p>Kyle is a cosmologist and BIDS data science fellow. Kyle likes bicycles.</p>
<h2 id="katy-huff">Katy Huff</h2>
<p>Katy is a nuclear engineer and BIDS data science fellow.</p>
<h2 id="discussion-advanced-git">Discussion: Advanced Git</h2>
<p>We’ll be talking about a bunch of cool git stuff. This will range from powerful
hacks everyone can use to awkward workarounds only a couple of people will ever
use.</p>
<h3 id="undoing-stuff">Undoing Stuff</h3>
<ul>
<li><code class="highlighter-rouge">git reset --hard</code> vs. <code class="highlighter-rouge">git reset --soft</code></li>
<li>revert, why to revert, how to revert</li>
<li>git stash and git stash pop</li>
<li>
<p>getting a specific file from another branch with checkout</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout <branch> -- <file>
</code></pre></div> </div>
</li>
</ul>
<h3 id="useful-configurations-and-stuff">Useful Configurations and Stuff</h3>
<ul>
<li>show your current branch in the terminal prompt</li>
<li>aliasing (very quick example with <code class="highlighter-rouge">git config --global alias.unstage 'reset HEAD --'</code>)</li>
<li>the <a href="https://github.com/github/hub">hub</a> project to make interacting with github a little nicer (follows aliases nicely)</li>
<li>
<p>Creating a template for git commit messages with <code class="highlighter-rouge">git config</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git config --global commit.template ~/.gitmessage.txt
</code></pre></div> </div>
</li>
<li>the mailmap, for normalizing the many possible commit names of your various contributors</li>
</ul>
<h3 id="dealing-with-branches-remotes-and-collaboration">Dealing with Branches, Remotes, and Collaboration</h3>
<ul>
<li>remotes</li>
<li>setting up SSH keys</li>
<li>the DAG</li>
<li><a href="http://nvie.com/posts/a-successful-git-branching-model/">git flow</a> for collaborating</li>
<li>git tagging</li>
</ul>
<h3 id="rebasing">Rebasing</h3>
<ul>
<li>rebasing</li>
</ul>
<h3 id="specialized-knowledge">Specialized Knowledge</h3>
<ul>
<li>cherry-picking a commit from one branch to another</li>
<li>detaching a single subdirectory and its history from a big repo to make it its own repo</li>
<li>the github api: futz with github from the command line</li>
</ul>
<h2 id="lightning-talks">Lightning Talks</h2>
<p>Additionally, there will be a time for a couple of <strong>Lightning Talks</strong>, which are
5-10 minute blasts of information about a particular topic or question of
interest to the group. This topic can be anything useful, new, or interesting
to scientists who compute. It may be some new skill you have recently picked up
in your research, a productivity tool you have recently learned to love, a
quick demo of a useful library, or anything you feel we would enjoy learning.<br />
<strong>Note</strong> that the lightning talk time is a good way to bring a question to the
group. If you have a bug you need help with, here’s the place to ask many ears
about it at once.</p>
<h2 id="hacky-hour">Hacky Hour</h2>
<p>Inspired by the hackers of
<a href="http://thehackerwithin.github.io/swinburne/">Australia</a>, we’re taking this
opportunity to try out a Hacky Hour. After the meeting is over, folks can stick
around to review one another’s code. This part of the meeting is meant to be
very casual, so feel free to pop open a beverage if you need to take the edge
off of the code reviews (byo).</p>
Introductory Git and GitHub - Harrison Dekker and John Naulty Jr.2015-09-02T00:00:00+00:00https://BIDS.github.io/dats/posts/git-intro-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. We hope you’ll join us!</li>
</ul>
<h2 id="meeting-info">Meeting Info</h2>
<ul>
<li>When: 4:00pm - 5:30pm</li>
<li>Where: <a href="https://bids.berkeley.edu">BIDS, Room 190 of Doe Library</a>.</li>
<li>Who: Anyone interested in software development best practices is welcome to come to our meetings.</li>
<li>How: A predetermined main topic (45 minutes) will be followed by impromptu lightning talks (5 minutes each)</li>
</ul>
<h3 id="harrison-dekker">Harrison Dekker</h3>
<p>Harrison Dekker is the director of the Data Lab, an essential student resource
on campus for data-related inquiry.</p>
<h3 id="john-naulty">John Naulty</h3>
<p>A Berkeley alum with experience in neuroscience and devices.</p>
<h2 id="introduction-to-git-and-github">Introduction to Git and GitHub</h2>
<p>Welcome to Git!</p>
<p>We will be using these resources:</p>
<ul>
<li><a href="https://try.github.io/levels/1/challenges/1">Try Git</a> is a live demo we will be going through first.</li>
<li><a href="https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf">Git Cheatsheet</a> is a useful reference.</li>
<li><a href="https://git-scm.com/download/">Download Git</a>. Training wheels are off; let’s get started!</li>
<li><a href="https://guides.github.com/introduction/flow/index.html">Git Workflow</a>. This is a model for a typical workflow using Git.</li>
</ul>
<p><strong>Challenge</strong></p>
<ul>
<li><a href="https://github.com/thehackerwithin/berkeley/blob/master/git/git-exercise.txt">Follow this link to test your newfound Git skills!</a></li>
<li><a href="https://github.com/thehackerwithin/berkeley/blob/master/git/git-solution.md">Solutions</a></li>
</ul>
<p>Other sources not covered today:</p>
<ul>
<li><a href="http://scottchacon.com/2011/08/31/github-flow.html">More Git Workflow</a> Because workflow is important.</li>
<li><a href="https://github.com/thehackerwithin/berkeley/tree/master/git/partI">Katy’s Great Tutorial</a> This is Part I of II. I highly recommend it.</li>
</ul>
<h2 id="lightning-talks">Lightning Talks</h2>
<p>Finally, there will be a time for a couple of <strong>Lightning Talks</strong>, which are
5-10 minute blasts of information about a particular topic or question of
interest to the group. This topic can be anything useful, new, or interesting
to scientists who compute. It may be some new skill you have recently picked up
in your research, a productivity tool you have recently learned to love, a
quick demo of a useful library, or anything you feel we would enjoy learning.<br />
<strong>Note</strong> that the lightning talk time is a good way to bring a question to the
group. If you have a bug you need help with, here’s the place to ask many ears
about it at once.</p>
<h3 id="aaron-culich--two-factor-auth">Aaron Culich : Two Factor Auth</h3>
<p>Two-factor authentication strengthens password-based login by combining the
password with a hardware device (like your phone).</p>
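<p>One common way a phone serves as the second factor is a time-based one-time
password (TOTP, RFC 6238). The talk notes do not say which scheme was discussed, so
the following is only a minimal sketch of that standard using the Python standard
library; the function names and the secret in the usage line are illustrative.</p>

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a shared secret."""
    # Sign the 8-byte big-endian counter with the shared secret.
    digest = hmac.new(secret, struct.pack("!Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = digest[-1] % 16
    # Take 4 bytes at that offset and drop the top (sign) bit.
    chunk = struct.unpack("!I", digest[offset:offset + 4])[0] % 2**31
    return str(chunk % 10**digits).zfill(digits)


def totp(secret, now=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)


if __name__ == "__main__":
    # Hypothetical shared secret; real secrets are provisioned by the service.
    print(totp(b"hypothetical-shared-secret"))
```

<p>The server and the phone share the secret once; afterwards each side derives the
same short-lived code independently, so a stolen password alone is not enough to
log in.</p>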
What Do You Want To Learn and What Can You Teach - Everyone2015-08-26T00:00:00+00:00https://BIDS.github.io/dats/posts/learn-and-teach-fall-2015<h2 id="attending">Attending</h2>
<ul>
<li>Anyone is welcome. I hope you’ll join us!</li>
</ul>
<p>If you can’t join us, but would like to request to learn or teach a topic
related to scientific computing, please fill out
<a href="https://docs.google.com/forms/d/1Kzb2EX-Tu-pdCOqXXjkp4zJWvVeWXukWlTKmM0i-CU8/viewform">this google form</a>.</p>
<h2 id="discussion-what-do-you-want-to-learn-and-what-can-you-teach">Discussion: What Do You Want To Learn and What Can You Teach</h2>
<p>Our first meeting of the year will be focused on introductions and building
this semester’s schedule of topics. To mold the upcoming schedule of topics to
your needs and desires, please attend. We will engage in a fun democratic
exercise in which we each offer and request knowledge. In this way, we’ll keep THW relevant by
weighing in on what topics are important to us as a community. To
request particular sessions, volunteer some useful knowledge, or just hang out,
please join us at 4:00pm in Room 190 of Doe Library.</p>
<h2 id="first-time-attendees">First Time Attendees</h2>
<p>We are very hopeful that many new faces will join us this semester. We would
especially love your input at this meeting. Your voice will help us to make The
Hacker Within as useful and peer-driven as possible.</p>
<p>More information on the how, when, where, and why of this meeting can be found
at:</p>
<ul>
<li><a href="http://thehackerwithin.github.io/berkeley/" title="The About Page">the THW@UCB about page</a></li>
<li>and <a href="http://bids.berkeley.edu/events/hacker-within">the BIDS event page for this meeting</a></li>
</ul>
<h2 id="results">Results</h2>
<ul>
<li>Many of you suggested many cool things to <a href="https://github.com/thehackerwithin/berkeley/blob/master/possible_topics/learn_and_teach.ipynb">learn and
teach.</a></li>
<li>Based on the popularity of those sessions, the tentative schedule for the
semester is
<a href="https://github.com/thehackerwithin/berkeley/blob/master/possible_topics/sched.csv">here.</a></li>
</ul>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="chris-paciorek---a-set-of-resources-online">Chris Paciorek - A set of resources online</h3>
<p>Chris pointed out an existing set of resources at <a href="">link</a></p>
<h3 id="aaron-culich---resources-on-berkeley-campus">Aaron Culich - Resources on Berkeley Campus</h3>
<p>Aaron shared some insider knowledge about resources on campus <a href="">presentation</a>.</p>
<h3 id="thomas-kluyver---new-cool-thing-in-jupyter">Thomas Kluyver - New Cool Thing In Jupyter</h3>
<p>Thomas showed us a cool new thing in Jupyter. You had to be there to see it.
Get excited: merging of cells.</p>
<h3 id="biye---a-new-course">Biye - A New Course</h3>
<p>Biye talked about a new course on campus: Introduction to Data Science
<a href="https://bcourses.berkeley.edu/courses/1267848/">link</a>.</p>
Technology For Teaching - Matthew Brett2015-05-13T00:00:00+00:00https://BIDS.github.io/dats/posts/tech-for-teaching<h2 id="attending">Attending</h2>
<p>More than 30 people attended!</p>
<h2 id="matthew-brett">Matthew Brett</h2>
<p>I (<a href="http://matthew.dynevor.org">Matthew</a>) am an aged sort-of post-doc working
at the UCB <a href="http://bic.berkeley.edu/">Brain Imaging Center</a>.</p>
<h2 id="how-to-use-how-not-to-use-the-ipython-notebook-for-teaching">How to use (how not to use) the IPython notebook for teaching</h2>
<p>I am teaching a course called <a href="http://practical-neuroimaging.github.io">practical
neuroimaging</a> at UCB.</p>
<p>The course is half-flipped, in that the students do 30 minutes of reading
before class, and spend about half of the 2 hour class time doing exercises.</p>
<p>Of course we make heavy use of the IPython notebook for the exercises, and
this has worked very well.</p>
<p>But - using IPython for tutorials and reading for the class has been much more
difficult because it does not yet fit well with static website builders like
Sphinx.</p>
<p>It is still hard to write a lot of complicated text or explanation in the
notebook because the web interface and cell structure make the environment
cumbersome compared to a good text editor.</p>
<p>Others seem to have had the same experience working with the IPython notebook
as an interactive code editor - see the very new <a href="http://blog.yhathq.com/posts/introducing-rodeo.html">rodeo project</a>.</p>
<p>Maybe, by sharing our experiences, we can help to work out some solution that
uses the IPython machinery, that is yet closer to perfection.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="jess-hamrick--nbgrader">Jess Hamrick : nbgrader</h3>
<p>Jess shared nbgrader, a tool for creating and grading assignments in IPython notebooks.</p>
<h3 id="stefan-van-der-walt--elegant-scipy-markdown-for-books-etc">Stéfan van der Walt : Elegant Scipy, Markdown for Books, etc.</h3>
<p>Stéfan and Juan Nunez-Iglesias are writing a book called “Elegant Scipy” to
collect and discuss elegant uses of and implementation within scientific
python. He shared some details about the book and showed how he is using
markdown as the native format to edit in, exportable to ipython notebooks and
html.</p>
<h3 id="sean-onullian--what-computers-cant-do-even-now-and-why">Sean ONullian : What Computers can’t do (even now) and why</h3>
<p>Sean gave some context for an upcoming conference.</p>
<h3 id="matthias-bussonnier--jupyter-sidecar">Matthias Bussonnier : Jupyter Sidecar</h3>
<p>A tool for viewing/rendering rich Jupyter kernel output in HTML.</p>
<p><a href="https://github.com/rgbkrk/jupyter-sidecar">jupyter sidecar</a></p>
<p>Also, thebe:</p>
<p><a href="https://github.com/oreillymedia/thebe">thebe</a>.</p>
Shiny - Karthik Ram2015-05-06T00:00:00+00:00https://BIDS.github.io/dats/posts/shiny<h2 id="attending">Attending</h2>
<p>About 20 folks!!</p>
<h2 id="karthik-ram">Karthik Ram</h2>
<p>Karthik is a BIDS data science fellow, programmer extraordinaire, and leader of ROpenSci.</p>
<h2 id="shiny">Shiny</h2>
<p><a href="http://shiny.rstudio.com/">Shiny</a> is an R-language package that creates web
applications to interact with analysis pipelines and visualizations.</p>
<p>GitHub repo with some example code: <a href="https://github.com/karthik/shiny">https://github.com/karthik/shiny</a><br />
The most up-to-date resource on Shiny: <a href="http://shiny.rstudio.com">http://shiny.rstudio.com/</a><br />
Also see some amazing cheatsheets here: <a href="http://www.rstudio.com/resources/cheatsheets">http://www.rstudio.com/resources/cheatsheets/</a></p>
<ul>
<li><code class="highlighter-rouge">server.R</code> holds the behind-the-scenes logic</li>
<li><code class="highlighter-rouge">ui.R</code> holds the interface</li>
</ul>
<p>Best way to learn: Try building an app.<br />
Best resource: the cheatsheets at rstudio.com/resources/cheatsheets</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">## ui.R</span><span class="w">
</span><span class="n">shinyUI</span><span class="p">(</span><span class="w"> </span><span class="n">fluidPage</span><span class="p">(</span><span class="w">
</span><span class="n">titlePanel</span><span class="p">(</span><span class="s2">"This is a shiny app"</span><span class="p">),</span><span class="w">
</span><span class="n">sidebarLayout</span><span class="p">(</span><span class="w">
</span><span class="n">sidebarPanel</span><span class="p">(</span><span class="w">
</span><span class="n">selectInput</span><span class="p">(</span><span class="s2">"x"</span><span class="p">,</span><span class="w"> </span><span class="s2">"x variable"</span><span class="p">,</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">iris</span><span class="p">)),</span><span class="w">
</span><span class="n">selectInput</span><span class="p">(</span><span class="s2">"y"</span><span class="p">,</span><span class="w"> </span><span class="s2">"y variable"</span><span class="p">,</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">iris</span><span class="p">),</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">iris</span><span class="p">)[[</span><span class="m">2</span><span class="p">]])</span><span class="w">
</span><span class="p">),</span><span class="w">
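</span><span class="n">mainPanel</span><span class="p">(</span><span class="n">plotOutput</span><span class="p">(</span><span class="s2">"gg"</span><span class="p">))</span><span class="w">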
</span><span class="p">)</span><span class="w">
</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">## server.R</span><span class="w">
</span><span class="c1">## any code that runs once on each server</span><span class="w">
</span><span class="c1">## put that code *before* the shinyServer() call</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">shinyServer</span><span class="p">(</span><span class="k">function</span><span class="p">(</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="n">output</span><span class="p">){</span><span class="w">
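</span><span class="n">output</span><span class="o">$</span><span class="n">gg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">renderPlot</span><span class="p">({</span><span class="w">
</span><span class="c1">## assumed completion: scatter plot of the two selected iris columns</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">iris</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">.data</span><span class="p">[[</span><span class="n">input</span><span class="o">$</span><span class="n">x</span><span class="p">]],</span><span class="w"> </span><span class="n">.data</span><span class="p">[[</span><span class="n">input</span><span class="o">$</span><span class="n">y</span><span class="p">]]))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_point</span><span class="p">()</span><span class="w">
</span><span class="p">})</span><span class="w">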
</span><span class="p">})</span><span class="w">
</span></code></pre></div></div>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="ryan-pavlovsky--radwatch-dosimeter">Ryan Pavlovsky : RadWatch Dosimeter</h3>
<p>Ryan showed off a small, cheap, touchscreen silicon PIN detector module (“radiation thermostat!”)
and the plotly interface that they have deployed!</p>
<h3 id="katy--survey">Katy : Survey!!</h3>
<p>Please fill this out:
<a href="https://goo.gl/AIymbR">https://goo.gl/AIymbR</a></p>
<h3 id="jeroem-ooms">Jeroen Ooms</h3>
<p>Jeroen presented “mongolite”, a MongoDB client for R:
<a href="http://cran.r-project.org/web/packages/mongolite/index.html">http://cran.r-project.org/web/packages/mongolite/index.html</a>.
He showed off some in-database aggregations, mapreducing, binning, and the like.</p>
Make - Chris Paciorek2015-04-29T00:00:00+00:00https://BIDS.github.io/dats/posts/make<h2 id="attending">Attending</h2>
<p>30 folks!</p>
<h2 id="chris-paciorek">Chris Paciorek</h2>
<p><a href="http://www.stat.berkeley.edu/~paciorek">Chris Paciorek</a> is the statistical computing consultant in the Department of Statistics at Berkeley, as well as being a researcher and lecturer in the department. His research focuses on statistical methods (often Bayesian methods) applied to environmental and public health applications. He teaches the department’s graduate-level statistical computing class, Stat 243.</p>
<h2 id="make">Make</h2>
<p>Make is a ubiquitous command line tool that can help to automate building
software and executing analysis pipelines.</p>
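<p>The core decision Make takes for every rule can be sketched in a few lines. This is only an illustration in Python, not how Make itself is implemented: the function name <code class="highlighter-rouge">needs_rebuild</code> and the file names are made up for the example, and real Make also handles chains of rules, phony targets, and much more.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # If any dependency is newer, the newest timestamp is not the target's.
    newest = max([target_time] + [os.path.getmtime(d) for d in dependencies])
    return newest != target_time


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "data.csv")    # hypothetical dependency
    target = os.path.join(workdir, "figure.png")  # hypothetical target
    for path in (source, target):
        with open(path, "w") as handle:
            handle.write("placeholder")
    os.utime(source, (1000, 1000))  # source last modified at t=1000
    os.utime(target, (2000, 2000))  # target built later, so it is up to date
    fresh_result = needs_rebuild(target, [source])
    os.utime(source, (3000, 3000))  # source edited after the target was built
    stale_result = needs_rebuild(target, [source])

print(fresh_result, stale_result)
</code></pre></div></div>
<p>In a makefile the same relationship is declared rather than computed by hand: the target is listed with its dependencies, and Make compares the timestamps for you.</p>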
<p>For the material for today, please clone this GitHub repository: <a href="https://github.com/berkeley-scf/make-thw-2015">https://github.com/berkeley-scf/make-thw-2015</a></p>
<p>The primary document is <a href="https://github.com/berkeley-scf/make-thw-2015/blob/master/workshop.ipynb">this IPython Notebook</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h2 id="kelly-rowland--cmake">Kelly Rowland : CMake</h2>
<p>CMake, by Kitware, is an open-source way to automate the configuration and
generation of makefiles for building software in a cross-platform way.</p>
<h2 id="jess-hamrick--scons">Jess Hamrick : SCons</h2>
<p>SCons is a replacement for Make. Interestingly, it was the result of a Software
Carpentry code competition a very, very long time ago.</p>
C++ and Object Orientation - Sven Chilton2015-04-22T00:00:00+00:00https://BIDS.github.io/dats/posts/c++-and-object-orientation<h2 id="attending">Attending</h2>
<p>About 20 folks.</p>
<h2 id="sven-chilton">Sven Chilton</h2>
<p>Dr. Sven Chilton is an alumnus of the Nuclear Engineering department.</p>
<h2 id="c-and-object-orientation">C++ and Object Orientation</h2>
<p>C++ is a compiled programming language that offers low-level control over
memory while supporting an object-oriented paradigm.</p>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/c_plus_plus" title="Code Examples">here</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h2 id="brian-hamlin--a-benchmarking-exercise">Brian Hamlin : A Benchmarking Exercise</h2>
<p>Brian talked about a benchmarking exercise between C++ and Java within the
world of maps. It seems like Java was able to hold its own.</p>
<h2 id="sean-onuallain-limits-of-current-genetics-work">Sean O’Nuallain : Limits of Current Genetics Work</h2>
<p>Sean talked about the limitations in the approaches of two large genetics
projects.</p>
Microcontrollers - Anders Priest2015-04-15T00:00:00+00:00https://BIDS.github.io/dats/posts/microcontrollers<h2 id="attending">Attending</h2>
<p>About 20 people.</p>
<h2 id="anders-priest">Anders Priest</h2>
<p>Anders is a graduate student in nuclear engineering at Berkeley.</p>
<h2 id="microcontrollers">Microcontrollers</h2>
<p>Circuit boards, Arduinos, Raspberry Pis, oh my! Microcontrollers and similar
digital devices enable you to sense and control the physical world using
nothing but your programming skills.</p>
<h3 id="notes">Notes</h3>
<p>Microcontrollers are small computers that range in size and scale. Some are more
sophisticated than others.</p>
<p>They are found in a variety of devices - cars, microwaves, remote controls, digital
clocks, etc. They are also used in industry and medicine.</p>
<p>The Arduino project provides its own IDE, which is fairly simple to use. The two necessary
functions are <code class="highlighter-rouge">setup()</code>, which runs once, and <code class="highlighter-rouge">loop()</code>, which runs repeatedly. Programs have to be written on a separate
computer, however, and then uploaded to the board.</p>
<p>The Raspberry Pi is somewhat more sophisticated and runs a stripped-down version
of Linux. You can do things like run Python scripts on the RPi.</p>
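<p>A Python script on the Pi can follow the same setup-then-loop shape as an Arduino sketch. The following is only an analogy: Arduino sketches are written in C++, and <code class="highlighter-rouge">fake_sensor</code> is a made-up stand-in for whatever real sensor read you would do.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import time


def setup():
    """Runs once: prepare whatever state the loop needs."""
    return {"readings": []}


def loop(state, read_sensor):
    """Runs repeatedly: take one reading and store it."""
    state["readings"].append(read_sensor())


def fake_sensor():
    # Hypothetical stand-in for a real sensor read (for example via GPIO).
    return 21.5


state = setup()
for _ in range(3):       # an Arduino would loop forever; stop after 3 here
    loop(state, fake_sensor)
    time.sleep(0.01)     # short pause between readings
print(state["readings"])
</code></pre></div></div>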
<p>Components to use with microcontrollers include:</p>
<ul>
<li>the usual analog suspects (wires, resistors, etc.)</li>
<li>sensors (accelerometers, thermistors, joysticks, etc.)</li>
<li>“shields” are devices to mount on Arduino microcontrollers (for ethernet, WiFi, etc.)</li>
</ul>
<p>The nuclear engineering department is working on a dosimeter network using
Raspberry Pi devices.</p>
<p>The Internet has a lot of great resources if you’re interested in working with and
learning about microcontrollers.</p>
<p>Today’s presentation was brought to us on a Raspberry Pi! Neat.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<p>None today.</p>
Julia - Kyle Barbary2015-04-08T00:00:00+00:00https://BIDS.github.io/dats/posts/julia<h2 id="attending">Attending</h2>
<p>About 25 people.</p>
<h2 id="kyle-barbary">Kyle Barbary</h2>
<p>Kyle is a postdoc in the Berkeley Center for Cosmological Physics and a BIDS fellow.
Like many people, he has a <a href="http://kbarbary.github.io/">website</a>.</p>
<h2 id="julia">Julia</h2>
<p><a href="http://julialang.org">Julia</a> is a high-level language (like Python) that emphasizes performance.
Slides and Jupyter notebooks from the talk can be found in
<a href="https://github.com/kbarbary/talks/tree/master/2015-thw-julia">this GitHub repository</a>.</p>
<p><em>If you’re living in the future and the link is broken, look in https://github.com/kbarbary/talks/</em></p>
<h3 id="notes">Notes</h3>
<p>Julia aims to solve the two-language problem, in which a high-level language is easy to
program in but relies on some other low-level language on the backend for speed. Julia,
by contrast, is largely written in Julia.</p>
<p>Fundamentally, Julia was created under the idea that dynamic languages don’t
need to be slow. Julia seeks to be as fast as C, as dynamic as Ruby, as useful as
Python, etc…</p>
<p>Julia is pretty fast. In some examples, very very fast. One of the cool things
it does is to compile a function just in time, the first time you call it. Later calls
reuse the compiled code, so they are faster than the first.</p>
<p>The syntax seems really similar to Python, with some differences and REPL features worth noting:</p>
<ul>
<li><code class="highlighter-rouge">;</code> takes you to the shell</li>
<li><code class="highlighter-rouge">?</code> takes you to help</li>
<li><code class="highlighter-rouge">backspace</code> to get out of the shell or help</li>
<li>supports Unicode, like Python 3</li>
<li>you can use <code class="highlighter-rouge">ipython notebook --profile julia</code> to start IJulia</li>
<li><code class="highlighter-rouge">typeof(var)</code> gives the type of the variable var</li>
<li>string interpolation is neat. You can use <code class="highlighter-rouge">$var</code> in a string and it will be
expanded.</li>
<li>Functions and loops use an explicit <code class="highlighter-rouge">end</code></li>
<li>Functions can be written like <code class="highlighter-rouge">f(x) = 2x^2 + 3x + 1</code></li>
<li>arrays can be either homogeneous or heterogeneous. If heterogeneous, the
array type is <code class="highlighter-rouge">Any</code>.</li>
<li>array type can be explicitly defined</li>
<li>one-based indexing</li>
<li>ranges are inclusive at both ends</li>
<li>the <code class="highlighter-rouge">code_native</code> function shows the native machine code generated for a function. Pretty
sweet.</li>
</ul>
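<p>For readers coming from Python, here is the Python side of a few of the contrasts in the list above, with the Julia spelling from the notes in the comments. It is a small sketch for orientation only; the variable names are made up.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>name = "Julia"
greeting = f"Hello, {name}"     # Julia: "Hello, $name"

values = [10, 20, 30]
first = values[0]               # Julia: values[1] (one-based indexing)

counted = list(range(1, 4))     # Julia: 1:3 includes both 1 and 3


def f(x):                       # Julia: f(x) = 2x^2 + 3x + 1
    return 2 * x**2 + 3 * x + 1


mixed = [1, "two", 3.0]         # Julia: such an array has element type Any
kind = type(mixed[1]).__name__  # Julia: typeof(var)
print(greeting, first, counted, f(2), kind)
</code></pre></div></div>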
<p>Packages are typically loaded with the <code class="highlighter-rouge">using</code> keyword, and the macros in
a package are called with an <code class="highlighter-rouge">@</code> prefix, as in <code class="highlighter-rouge">@func</code>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<p>No lightning talks.</p>
R - Rochelle Terman, Daniel Turek2015-04-01T00:00:00+00:00https://BIDS.github.io/dats/posts/r<h2 id="attending">Attending</h2>
<ul>
<li>about 25</li>
</ul>
<h2 id="rochelle-terman">Rochelle Terman</h2>
<p><a href="http://rochelleterman.com/">Rochelle</a> is a Ph.D. Candidate in Political Science at the University of California, Berkeley.</p>
<h2 id="daniel-turek">Daniel Turek</h2>
<p>Daniel Turek is a statistician and BIDS fellow.</p>
<h2 id="r">R</h2>
<p>R is a high-level programming language for statistical analysis.</p>
<p>Rochelle’s demonstration code and notes can be found in this <a href="https://github.com/rochelleterman/R-hacker-within">GitHub repo</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
Computer Architectures - Alex Chong2015-03-18T00:00:00+00:00https://BIDS.github.io/dats/posts/computer-architectures<h2 id="attending">Attending</h2>
<p>Lots of folks. Wasn’t able to count.</p>
<h2 id="alex-chong">Alex Chong</h2>
<p>Alex is a student at Berkeley.</p>
<h2 id="computer-architectures">Computer Architectures</h2>
<p>His talk about computer architecture can be found
<a href="http://thehackerwithin.github.com/berkeley/images/2015.03.18-architecture.pdf">here</a>.</p>
Testing - Rachel Slaybaugh2015-03-11T00:00:00+00:00https://BIDS.github.io/dats/posts/testing<h2 id="attending">Attending</h2>
<ul>
<li>30 or so folks</li>
</ul>
<h2 id="rachel-slaybaugh">Rachel Slaybaugh</h2>
<p>Rachel Slaybaugh is an Assistant Professor of Nuclear Engineering at the
University of California, Berkeley. At Berkeley, Prof. Slaybaugh’s research
program is based in computational methods and applied to existing and advanced
nuclear reactors, nuclear non-proliferation and security, and shielding
applications. She received a BS in Nuclear Engineering from Penn State in 2006
where she served as a licensed nuclear reactor operator. Dr. Slaybaugh went on
to the University of Wisconsin – Madison to earn an MS in 2008 and a PhD in
2011 in Nuclear Engineering and Engineering Physics along with a certificate in
Energy Analysis and Policy. For her PhD she researched acceleration methods for
massively parallel deterministic neutron transport codes. Dr. Slaybaugh then
worked with hybrid (deterministic-Monte Carlo) methods for shielding
applications at Bettis Laboratory while teaching at the University of Pittsburgh
as an adjunct faculty member. Throughout her career Dr. Slaybaugh has been
engaged in Software Carpentry education and training; she also contributes to
the open source project <a href="http://pyne.io">PyNE</a>. Prof. Slaybaugh was awarded the
2014 American Nuclear Society Young Member Excellence Award.</p>
<h2 id="testing">Testing</h2>
<p>Today’s presentation can be found <a href="http://rachelslaybaugh.github.io/berkeley/images/2015.03.11-presentation.pdf">here</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="kelly-rowland--sometimes-the-tests-are-wrong">Kelly Rowland : Sometimes the tests are wrong</h3>
<p>But, it’s ok. We don’t need to enter an infinite recursion of testing the tests.
Just keep in mind that sometimes tests need to be updated when the code’s
interface or behavior changes.</p>
<h3 id="katy-huff--travisci">Katy Huff : TravisCI</h3>
<p>Check out this continuous integration service. <a href="http://travis-ci.org">TravisCI</a> is free.</p>
<h3 id="brian-hamlin--more-travisci">Brian Hamlin : More TravisCI</h3>
<p>Brian gave an example of a <code class="highlighter-rouge">.travis.yml</code> file.</p>
Matplotlib and Seaborn - Caroline Sofiatti and Sean Wahl2015-03-04T00:00:00+00:00https://BIDS.github.io/dats/posts/matplotlib-and-seaborn<h2 id="attending">Attending</h2>
<p>At least 35 people attended!</p>
<h2 id="caroline-sofiatti">Caroline Sofiatti</h2>
<p>I’m a PhD Candidate in the physics department. I work for the Supernova Cosmology Group and our goal
is to unravel the mysteries of Dark Energy, one data point at a time!</p>
<h2 id="sean-wahl">Sean Wahl</h2>
<p>PhD Candidate in the Earth and planetary science department. I study planetary interiors
using first-principles material simulations. I use matplotlib for both routine plotting
needs as well as for published journal figures.</p>
<h2 id="matplotlib">Matplotlib</h2>
<p>Find the IPython notebook <a href="https://github.com/smwahl/thw_matplotlib_presentation" title="Matplotlib Demonstration">here</a>.</p>
<p>If you wish to follow along with the presentation you should have Python 2 installed with the following packages:</p>
<p>matplotlib, numpy, ipython, basemap (optional)</p>
<h2 id="seaborn">Seaborn</h2>
<p><a href="https://web.stanford.edu/~mwaskom/software/seaborn/index.html">Seaborn</a> is an awesome library for making beautiful and informative graphics in Python. Its mission
is to make visualization a central part of exploring and understanding data. Adding <code class="highlighter-rouge">import seaborn</code> to
your code will not only make your plots look amazing, it will also make your life easier!!!</p>
<p>Check out the IPython Notebook <a href="https://github.com/sofiatti/my_thw_presentation">here</a>.</p>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/topic" title="Code Examples">here</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="sean-onuallain--homoiconicity-in-programming-languages">Sean O’Nuallain : Homoiconicity in Programming Languages</h3>
<p><a href="https://www.questia.com/library/journal/1G1-382085554/symbolic-and-cognitive-theory-in-biology">See here for more.</a></p>
IPython - Omoju Miller2015-02-25T00:00:00+00:00https://BIDS.github.io/dats/posts/ipython<h2 id="attending">Attending</h2>
<p>I counted 35 people. These included, at least:</p>
<ul>
<li>Omoju</li>
<li>Kelly</li>
<li>Katy</li>
<li>John</li>
<li>Chris</li>
<li>Caroline</li>
<li>Min</li>
<li>Matthias</li>
<li>Thomas</li>
<li>Jess</li>
<li>Denia</li>
<li>Sven</li>
<li>Anders</li>
<li>Donny</li>
<li>Dan</li>
<li>Many others!</li>
<li>Add your name above if you aren’t on the list!</li>
</ul>
<h2 id="omoju-miller">Omoju Miller</h2>
<p><a href="http://omojumiller.com">Omoju Miller</a> is a PhD candidate at the University of
California at Berkeley researching artificial intelligence. She is also a
software technologist, start-up advisor, and educator.</p>
<h2 id="ipython">IPython</h2>
<p>IPython is an interactive interpreter for programming with Python (and now many
other languages).</p>
<h3 id="easy-peasy-lemon-squeezy">Easy, Peasy, Lemon Squeezy</h3>
<p>Omoju suggests that, to work, teach, or collaborate, development tools need to
be as easy as possible to install and use.</p>
<p>Things that she mentioned in this regard:</p>
<ul>
<li>IPython is easy to install with “pip.” Just type <code class="highlighter-rouge">pip install ipython</code> in the
terminal.</li>
<li><a href="http://wakari.io">Wakari.io</a></li>
<li><a href="http://nbviewer.ipython.org">NBViewer</a></li>
</ul>
<h3 id="the-ipython-notebook">The IPython Notebook</h3>
<p>To start up the ipython notebook, crack open your terminal and type:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ipython notebook
</code></pre></div></div>
<p>That starts up a server which serves IPython notebooks (usually at
localhost:8888 or similar). The command also automatically opens a
browser instance with a view of your directory. This will allow you to open up
any IPython notebooks in that directory. It also allows you (with a button) to
create a new notebook in that directory.</p>
<h4 id="the-oscars">The Oscars</h4>
<p>Omoju showed an example from <a href="http://nbviewer.ipython.org/github/ptwobrussell/Mining-the-Social-Web-2nd-Edition/tree/master/ipynb/">Mining the Social
Web</a>
about using the Twitter API. She demonstrated how she was able to use the
IPython notebook to access the Twitter firehose and filter out tweets
concerning the Academy Awards.</p>
<h4 id="latex-in-markdown-cells">LaTeX in Markdown cells</h4>
<p>She demonstrated also how to include LaTeX in a markdown cell. First, create a
markdown cell, then include math:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Courtesy of MathJax, you can include mathematical expressions both inline:
$e^{i\pi} + 1 = 0$ and displayed:
$$e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i$$
</code></pre></div></div>
<h4 id="using-ipython-notebooks-with-github">Using IPython Notebooks with GitHub</h4>
<p>A troublesome issue with IPython notebooks is the extra information (outputs, execution counts) that is
held in the JSON. That doesn’t version control as beautifully as plain text.</p>
<p>To avoid extra headaches, clear all output cells (Cell → All Output →
Clear) before you commit your IPython notebooks.</p>
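<p>The same clean-up can be scripted. Here is a minimal sketch using only the standard library; it assumes the version 4 notebook layout (a top-level <code class="highlighter-rouge">cells</code> list), which is not what every notebook from this era uses, and the function name <code class="highlighter-rouge">clear_outputs</code> is made up for the example.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json


def clear_outputs(notebook):
    """Remove outputs and execution counts from code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# A tiny in-memory notebook standing in for the contents of an .ipynb file.
example = {
    "nbformat": 4, "nbformat_minor": 0, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 7,
         "outputs": [{"output_type": "execute_result", "execution_count": 7,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
    ],
}
cleaned = clear_outputs(example)
print(json.dumps(cleaned, indent=1, sort_keys=True))
</code></pre></div></div>
<p>For a real file you would read it with <code class="highlighter-rouge">json.load</code>, pass the result through the function, and write it back with <code class="highlighter-rouge">json.dump</code> before committing.</p>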
<h4 id="magics-in-the-notebooks">Magics in the Notebooks</h4>
<p>Omoju describes these as Development Powertools.
<a href="http://ipython.org/ipython-doc/dev/interactive/tutorial.html">Magics</a> are special tools and
functions. They are preceded by one percent sign (line magics) or two (cell magics). Some examples:</p>
<ul>
<li>%timeit : times the execution of a single statement, such as a function call</li>
<li>%%timeit : times the execution of a whole cell</li>
<li>%%javascript : allows the use of javascript code in the notebook</li>
</ul>
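<p>Outside a notebook, the standard library’s <code class="highlighter-rouge">timeit</code> module gives a similar measurement to <code class="highlighter-rouge">%timeit</code>. A rough sketch follows; the function <code class="highlighter-rouge">squares</code> and the repeat count of 200 are arbitrary choices for the example, whereas the magic picks its repeat count automatically.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import timeit


def squares(n):
    """Toy workload: the first n square numbers."""
    return [i * i for i in range(n)]


# Run the call 200 times and report the average time per call.
runs = 200
total_seconds = timeit.timeit(lambda: squares(1000), number=runs)
per_call = total_seconds / runs
print(f"{per_call * 1e6:.1f} microseconds per call")
</code></pre></div></div>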
<h4 id="plotly-notebook-examples">Plotly Notebook Examples</h4>
<p>Plotly is a plotting tool. On its website, there are some fun examples in
the gallery of IPython notebooks.</p>
<h4 id="use-cases">Use Cases</h4>
<p>Omoju suggests using IPython notebooks for lots of stuff, including:</p>
<ul>
<li>Code Mentoring</li>
<li>Teaching</li>
<li>Data Analysis</li>
<li>Writing Books</li>
</ul>
<h3 id="learning-more-with-books">Learning more with Books</h3>
<p>Omoju recommends <a href="http://ipython-books.github.io/cookbook">The IPython Cookbook</a>.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="martin-magdinier--openrefine">Martin Magdinier : OpenRefine</h3>
<p><a href="http://openrefine.org/">OpenRefine</a> is a tool for helping to clean and process
data. You can do this with very limited data processing skills, but it is also
useful for more skilled analysts.</p>
<p>OpenRefine runs a Java-based server on your local machine and opens up a
browser instance to provide an interface for loading data, parsing it,
identifying near-duplicate categories, cleaning it up, and exploring it
somewhat with filters and views.</p>
<h3 id="brian-hamlin--geospatial-stuff">Brian Hamlin : Geospatial stuff!</h3>
<p>Geospatial data is everywhere. An example is found at
<a href="http://www.pism-docs.org">www.pism-docs.org</a></p>
<ul>
<li><strong>Does anyone know how to export NetCDF to text?</strong></li>
<li><strong>Does anyone who uses Pandas or GeoPandas know how to join-on-attribute and
get a GeoPandas object rather than a Pandas object?</strong></li>
</ul>
<p><a href="http://live.osgeo.org/en/index.html">OSGeo-Live</a></p>
<h3 id="daniel-wooten--prompt-magic">Daniel Wooten : Prompt Magic</h3>
<p>Dan Wooten uses SSH to get to computers all over the place. He likes to be able
to tell what computer he is on simply by the color of the prompt in his
terminal.</p>
<p>In the .bashrc file, you can export a modified PS1 variable to change the
content and color of your prompt.</p>
<p>BUT, the code necessary for specifying the right thing is pretty hideous. Let
someone else figure out the syntax for you with: <a href="http://www.funtoo.org/Prompt_Magic">PROMPT
MAGIC!!!</a></p>
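<p>To give a sense of what such a PS1 string looks like, here is a small sketch that picks a color from the hostname and prints an <code class="highlighter-rouge">export</code> line you could paste into a .bashrc. This is not Dan’s setup: the function name, the color list, and the hostname are all made up for the example.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import zlib

# ANSI color numbers: red, green, yellow, blue, magenta, cyan.
COLOR_CODES = [31, 32, 33, 34, 35, 36]


def prompt_for_host(hostname):
    """Build a bash PS1 string whose color is derived from the hostname."""
    # crc32 gives the same number on every run, unlike the built-in hash().
    checksum = zlib.crc32(hostname.encode("utf-8"))
    color = COLOR_CODES[checksum % len(COLOR_CODES)]
    # The bracket escapes mark non-printing characters for bash; the u, h
    # and w escapes expand to the user, host and working directory.
    return r"\[\e[" + str(color) + r"m\]\u@\h\[\e[0m\]:\w\$ "


# "cluster-login-1" is a hypothetical hostname.
line = "export PS1='" + prompt_for_host("cluster-login-1") + "'"
print(line)
</code></pre></div></div>
<p>Running this once per machine and pasting the printed line into that machine’s .bashrc gives each host its own stable prompt color.</p>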
Advanced Git - Dav Clark2015-02-18T00:00:00+00:00https://BIDS.github.io/dats/posts/git<h2 id="attending">Attending</h2>
<p>Over 40 people! Too many to count! We have arrived.</p>
<h2 id="dav-clark">Dav Clark</h2>
<p>Dav Clark is the director of <a href="http://bead.glass">Glass Bead Labs</a>, is employed by the <a href="http://dlab.berkeley.edu">D-Lab</a>, and is supported by <a href="http://bids.berkeley.edu">BIDS</a> and the <a href="http://www.nimh.nih.gov/">NIMH</a>. His mission is to provide inclusive access to Data Science training, with a particular focus on <em>social</em> scientists.</p>
<h2 id="advanced-git">Advanced Git</h2>
<p>You may be keeping track of your work with git already. Learn some minimal skills, including the answer to “what’s a pull request?” You’ll get more done, both by managing your own work efficiently, and by effectively soliciting and incorporating work from others.</p>
<p>Dav has created a repository at <a href="https://github.com/tech4measurement/tech4measurement.github.io">github.com/tech4measurement/tech4measurement.github.io</a>. This repository uses jekyll to make a website. In order to change a thing or two about the website, folks make pull requests.</p>
<p>We’ll be pretty interactive. Check out this <a href="https://github.com/dlab-berkeley/git-fundamentals">overview of resources</a>, pull requests welcome.</p>
<h3 id="lightning-talks">Lightning Talks</h3>
<h2 id="thomas-kluyver">Thomas Kluyver</h2>
<p>Thomas has created an exceptional little tool for teaching the shell!</p>
Text Editors - Everyone2015-02-11T00:00:00+00:00https://BIDS.github.io/dats/posts/text-editors<h2 id="attending">Attending</h2>
<ul>
<li>Donny</li>
<li>Mathias</li>
<li>Chris</li>
<li>Kyle</li>
<li>Sven</li>
<li>Cameron</li>
<li>Joey</li>
<li>Anders</li>
<li>Katy</li>
<li>Mathias</li>
<li>Caroline</li>
<li>Matthew</li>
<li>Sean</li>
<li>David</li>
<li>Edward</li>
<li>Others (didn’t catch your name! sorry!!)</li>
</ul>
<h2 id="everyone">Everyone</h2>
<p>This week will be a session full of lightning talks. All of the members of THW
are encouraged to bring a lightning talk introducing some aspect of their
favorite (or not their favorite) text editor.</p>
<h2 id="matthew-brett--why-invest-in-a-text-editor">Matthew Brett : Why Invest in a Text Editor?</h2>
<p>Use a single editor well, as advised in “The Pragmatic Programmer” (Andrew Hunt and David Thomas). Vim and Emacs are productive if you use them well.</p>
<p>What is the cost to a scientist of being a bad programmer?</p>
<p>Maybe the good motivators are: taking it on faith, by watching others, and increasing efficiency of thought.</p>
<p>Matthew wants to do a study!!! It’s going to be cool</p>
<h2 id="joey-curtis--atom">Joey Curtis : Atom</h2>
<p>Joey shared this <a href="http://www.sitepoint.com/sitepoint-smackdown-atom-vs-brackets-vs-light-table-vs-sublime-text/">text editor smackdown</a>
blogpost with us. He’s now going to show off a few things about
<a href="https://atom.io/">Atom</a>.</p>
<p>Atom is a lot like Sublime Text. Atom is GitHub’s text editor and it’s completely open source underneath, built on node.js.</p>
<p>On GitHub, there are tons of available packages to extend the program. There
are tons of papers, even, on how people prefer to look at code (colors,
appearance, eyestrain). The things that are successful are somewhat based on
Sublime Text, which Atom, in turn, resembles.</p>
<h2 id="katy-huff--vim-latex">Katy Huff : Vim-LaTeX</h2>
<p>Vi (vim) has a lot of plugins. Katy’s favorite way to discover plugins is
<a href="http://vimawesome.com">VimAwesome</a>. Her favorite way to then install most
of those plugins is something called
<a href="https://github.com/tpope/vim-pathogen">vim-pathogen</a>.</p>
<p>Among all of these plugins, the one that has made the most difference in the
life of Katy is <a href="http://vim-latex.sourceforge.net/">vim-latex</a>. She owes this
knowledge to the great and wonderful RedBeard (@mrterry).</p>
<h2 id="donny--ipython-notebook">Donny : IPython Notebook</h2>
<p>Check it out, you get a beautiful IPython prompt, you have the ability to edit
cells with markdown, get python documentation, quickly interact with plots and
whatnot.</p>
<p>Literate programming is the name of the game here. It’s a nice way to prototype
code.</p>
<h2 id="chris-paciorek--lyx">Chris Paciorek : LyX</h2>
<p>LyX is a WYSIWYG-style editor for LaTeX. You can do things like type “frac” and then
“space” and it shows up beautifully rendered. It avoids the intermediate step
of building the LaTeX file.</p>
<h2 id="sven-chilton--emacs">Sven Chilton : Emacs</h2>
<p>The default Emacs on Mac OS X isn’t the best. You should install a newer version.
Ctrl-x is the key feature: it is the prefix for many commands. “Ctrl-x
3” splits the window vertically, side by side. Lots of other things got shown off, such as opening a
file.</p>
<h2 id="anders-priest--vim">Anders Priest : Vim</h2>
<p>Anders uses vim mostly in insert mode, but has recently started beefing up his
vimrc. He went to vimdoc.sourceforge.net and learned more about all the
options.</p>
<ul>
<li>Colorschemes go in the colors folder.</li>
<li>You can use the mouse in insert mode if you “set mouse=i”</li>
<li>You can show the line numbers or not show them.</li>
<li>You can use mapping functions. Anders did this to make it so he can delete a
line even from insert mode.</li>
</ul>
<h2 id="cameron-bates--textmate">Cameron Bates : Textmate</h2>
<p>TextMate is a Mac-only text editor. It supports the all-powerful command+ and command- view
changers.</p>
<ul>
<li>It is one of the first of the standalone text editors.</li>
<li>There aren’t many changes anymore, as it’s quite mature and stable.</li>
<li>Cameron uses it mostly for editing large files.</li>
<li>It also can let you search within a folder, rather than just in one file.</li>
<li>Search and replace, therefore, is nice and safe.</li>
</ul>
<h2 id="kyle-barbary--emacs-line-wrapping">Kyle Barbary : Emacs line-wrapping</h2>
<p>If you hit alt-q it will reflow the text to make it wrap nicely.</p>
<h2 id="caroline-sofiatti--sublimetext">Caroline Sofiatti : Sublimetext</h2>
<p>It’s beautiful and a lot like Atom. Sublime has a beautiful rendering of the
whole file.</p>
Parallel Programming - Chris Paciorek2015-02-04T00:00:00+00:00https://BIDS.github.io/dats/posts/parallel<h2 id="attending">Attending</h2>
<ul>
<li>Chris Paciorek</li>
<li>Matias (IPython)</li>
<li>Min RK (IPython)</li>
<li>Josh Howland (NE)</li>
<li>Greg (neuro)</li>
<li>Andrew</li>
<li>Kelly Rowland (NE)</li>
<li>Sven Chilton (NE)</li>
<li>Anders Priest (NE)</li>
<li>David (econ)</li>
<li>Rachel Slaybaugh (NE)</li>
<li>Caroline Sofiatti (astro)</li>
<li>Alex (undergrad)</li>
<li>Katy Huff (NE)</li>
<li>Ryan Pavlovsky (NE)</li>
<li>Zhangpeng Guo (NE)</li>
<li>Denia Djokic (NE)</li>
<li>Tenzing Joshi (NE)</li>
<li>Xin Wang (NE)</li>
<li>Nicholas Adams (DLab)</li>
<li>Vic Gehman (physics)</li>
<li>others</li>
</ul>
<h2 id="chris-paciorek">Chris Paciorek</h2>
<p><a href="http://www.stat.berkeley.edu/~paciorek">Chris Paciorek</a> is the statistical computing consultant in the Department of Statistics at Berkeley, as well as being a researcher and lecturer in the department. His research focuses on statistical methods (often Bayesian methods) applied to environmental and public health applications. He teaches the department’s graduate-level statistical computing class, Stat 243.</p>
<h2 id="parallel-programming">Parallel Programming</h2>
<p>For the material for today, please clone this <a href="https://github.com/berkeley-scf/parallel-thw-2015">GitHub repository</a>:</p>
<p>https://github.com/berkeley-scf/parallel-thw-2015</p>
<p>The primary document is <a href="https://github.com/berkeley-scf/parallel-thw-2015/blob/master/parallel.pdf">here</a></p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h3 id="rachel-slaybaugh--totalview">Rachel Slaybaugh : TotalView</h3>
<p>A debugger that works reasonably well for distributed parallel tasks is <a href="https://computing.llnl.gov/tutorials/totalview/part3.html">TotalView</a>. The linked tutorial is hosted by Lawrence Livermore National Laboratory.</p>
The Shell and The Filesystem Hierarchy Standard - Katy Huff2015-01-28T00:00:00+00:00https://BIDS.github.io/dats/posts/fhs<h2 id="attending">Attending</h2>
<p>About 35 folks! No attendance was taken, though.</p>
<h2 id="katy-huff">Katy Huff</h2>
<p><a href="https://kathuff.github.io">Katy Huff</a> is a postdoc with NSSC and BIDS.</p>
<h2 id="the-shell">The Shell</h2>
<p>There was some interest in the shell. In particular, someone was interested
in ksh. So, let’s cover shells.</p>
<h3 id="various-shells">Various Shells</h3>
<p>Shells are command interpreters, and each one is also a small programming language. The flavors include:</p>
<ul>
<li>sh</li>
<li>csh</li>
<li>tcsh</li>
<li>zsh</li>
<li>ksh</li>
<li>bash</li>
</ul>
<p>What are the differences? Mostly syntax. For serious shell programming, they
vary mostly in the way they treat arrays, their order of operations, and the
way they treat variable scope.</p>
<h3 id="basics-in-the-shell">Basics in the Shell</h3>
<p>I’m actually going to cheat here and use the first chapter of <a href="http://physics.codes">my new
book</a> to cover the shell basics really quickly.</p>
<p>I know that’s not a very open source way to go about things, because you’d have
to buy the book to get this material later. Be cool. The same material is
<a href="http://software-carpentry.org/v5/novice/shell/index.html">covered beautifully by Software
Carpentry</a>.</p>
<h3 id="customizing-your-shell">Customizing Your Shell</h3>
<p>In any shell, there are files that can be used to customize its behavior. These
files hold shell commands that are run at the start of each shell session. For
bash these are usually:</p>
<ul>
<li>.bash_profile</li>
<li>.bashrc</li>
<li>.bash_aliases</li>
</ul>
<p>The profile file is called first and sources other files (such as bashrc and
aliases). Many people keep their bashrc files online. Let’s find some good ones
and browse them. I <a href="https://github.com/katyhuff/tools/tree/master/env">keep mine
online</a> so that I can get
back to work instantly if my laptop self-immolates. Let’s talk about some of
the things you can do to make your life easier with bashrc.</p>
<h2 id="the-filesystem">The Filesystem</h2>
<p>All of this is very exciting. The shell provides a nice transparent interface
to the filesystem. But, what’s the point of having an interface to the
filesystem?</p>
<p>Pretty much everything in a UNIX or Linux operating system is a file that you
can look at. Since you’re a human with skillz, this means that pretty much
everything in the operating system is something you can investigate,
manipulate, and control.</p>
<p>The only way to know the potential power of the filesystem is to understand the
filesystem hierarchy standard.</p>
<h3 id="the-filesystem-hierarchy-standard">The Filesystem Hierarchy Standard</h3>
<p>On a Linux machine, the placement of directories at the top level of the
filesystem is not just systematic, it is standardized. The standard provides a
place for each thing that might be needed on your filesystem.</p>
<p>I feel like this kind of skill should be used in only two ways:</p>
<ul>
<li>to be more efficient</li>
<li>to prank your friends</li>
</ul>
<p>Thankfully, the filesystem provides plenty of opportunities for both.</p>
<h4 id="bin">/bin</h4>
<p>Binary files are utilities like commands and programs. System level binary
files are held in bin.</p>
<h4 id="lib">/lib</h4>
<p>Libraries are compiled software with APIs that can be used by other source code
on your system. System level libraries are held in lib. UNIX machines don’t
have lib at the top level, but they do have it at lower levels. We’ll see this
when we address opt and usr.</p>
<h4 id="dev">/dev</h4>
<p>Even hardware has a filesystem representation. In dev, block and character
devices are linked to the operating system through file-like objects. Browse
dev… what devices do you see? Can you find your printer? What is zero? What is random?</p>
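<p>One way to poke at the last two questions: device files can be read like ordinary files. A minimal sketch, assuming a Linux or Mac machine where <code class="highlighter-rouge">/dev/zero</code> and <code class="highlighter-rouge">/dev/urandom</code> exist; the helper name <code class="highlighter-rouge">read_device</code> is made up.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def read_device(path, count):
    """Read count bytes from a device file, like any ordinary file."""
    with open(path, "rb") as device:
        return device.read(count)


zeros = read_device("/dev/zero", 8)     # an endless supply of zero bytes
noise = read_device("/dev/urandom", 8)  # an endless supply of random bytes
print(zeros.hex(), noise.hex())
</code></pre></div></div>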
<p>It used to be the case that you could redirect random numbers into the device file for
your speakers (try ‘cat /dev/random > /dev/dsp’). It isn’t true with
modern Linux, unfortunately. Now all audio moves through a program (on Linux,
for example, aplay) before it hits the device.</p>
<p>On Linux, try:</p>
<p><code class="highlighter-rouge">cat /dev/urandom | aplay</code></p>
<p>On Macs, try:</p>
<p><code class="highlighter-rouge">say the hacker within rocks</code></p>
<h4 id="proc">/proc</h4>
<p>The processes on your machine are represented in the filesystem by what appear to be files.
This isn’t true on a Mac, but on Linux it’s really cool.</p>
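<p>On Linux, each running process gets a directory under /proc named after its process ID, so listing processes is just listing a directory. A small sketch, with a made-up function name, that returns an empty list on machines without /proc:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os


def running_process_ids(proc_root="/proc"):
    """Return the numeric directory names under /proc, one per process."""
    if not os.path.isdir(proc_root):
        return []  # for example on a Mac, which has no /proc
    names = os.listdir(proc_root)
    return sorted(int(name) for name in names if name.isdigit())


pids = running_process_ids()
print(len(pids), "processes; this script is", os.getpid())
</code></pre></div></div>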
<h4 id="boot">/boot</h4>
<p>Macs don’t have this. Linux does. What do you think it holds?
Why should this be part of the filesystem?</p>
<h4 id="mnt">/mnt</h4>
<p>This is where things get mounted (CDs, USB drives, etc.). Note that a lot of
these will also be accessible via the device number of their port. Unlike the
port, though, you can unmount things that are mounted.</p>
<h4 id="opt">/opt</h4>
<p>When you want to install a library or a program, you might want to do it in
this optional space. This space reflects the top-level system hierarchy.</p>
<h4 id="usr">/usr</h4>
<p>An almost exactly equivalent space is here in usr.</p>
<h2 id="lightning-talks">Lightning Talks</h2>
<h2 id="ryan-pavlovsy--ssh-config-files">Ryan Pavlovsky : ssh config files!</h2>
What Do You Want To Learn and What Can You Teach - Everyone2015-01-21T00:00:00+00:00https://BIDS.github.io/dats/posts/learn-and-teach<h2 id="attending">Attending</h2>
<ul>
<li>Katy Huff</li>
<li>Rachel Slaybaugh</li>
<li>Rochelle Terman</li>
<li>Caroline Sofiatti</li>
<li>Denia Djokic</li>
<li>Britta Fiore</li>
<li>Chris Paciorek</li>
<li>Alex Chong</li>
<li>Greg Telian</li>
<li>Sean Wahl</li>
<li>Min RK</li>
<li>James Kendrick</li>
<li>Sven Chilton</li>
<li>Jose Buraschi</li>
<li>Andrew Greenop</li>
<li>Joey Curtis</li>
<li>Anders Priest</li>
<li>Daniel Turek</li>
<li>Karthik Ram</li>
<li>Tenzing Joshi</li>
<li>Kelly Rowland</li>
<li>Madicken Munk</li>
<li>Thomas Kluyver</li>
<li>Kyle Barbary</li>
<li>Daniel Wooten</li>
</ul>
<h2 id="discussion-what-do-you-want-to-learn-and-what-can-you-teach">Discussion: What Do You Want To Learn and What Can You Teach</h2>
<p>Our first meeting of the year will be focused on introductions and building
this semester’s schedule of topics. To mold the upcoming schedule of topics to
your needs and desires, please attend. We will engage in a fun democratic
exercise in which we each offer and request knowledge. In this way, we’ll keep THW relevant by
weighing in on what topics are important to us as a community. To
request particular sessions, volunteer some useful knowledge, or just hang out,
please join us at 4:00pm in Room 190 of Doe Library.</p>
<h2 id="first-time-attendees">First Time Attendees</h2>
<p>We are very hopeful that many new faces will join us this semester. We would
especially love your input at this meeting. Your voice will help us to make The
Hacker Within as useful and peer-driven as possible.</p>
<p>More information on the how, when, where, and why of this meeting can be found
at:</p>
<ul>
<li><a href="http://thehackerwithin.github.io/berkeley/about.html" title="The
About Page">the THW@UCB about page</a></li>
<li>and <a href="http://bids.berkeley.edu/events/hacker-within">the BIDS event page for this meeting</a></li>
</ul>
<h2 id="results">Results</h2>
<p>You can see the results in the master branch of this repository
<a href="https://github.com/thehackerwithin/berkeley/tree/master/possible_topics">here</a>
and you can see the logic behind the scheduling <a href="http://nbviewer.ipython.org/github/thehackerwithin/berkeley/blob/master/possible_topics/learn_and_teach.ipynb">here in this IPython
notebook</a>.</p>
Nuclear Data and Advanced Cython - Morgan White and Cameron Bates2014-12-03T00:00:00+00:00https://BIDS.github.io/dats/posts/nuclear-data-advanced-cython<h1 id="attending">Attending</h1>
<ul>
<li>Alejandra Jolodosky</li>
<li>Denia Djokic</li>
<li>Katy Huff</li>
<li>Cameron Bates</li>
<li>Kyle Barbary</li>
<li>Kelly Rowland</li>
<li>Rachel Slaybaugh</li>
<li>Marissa Zweig</li>
<li>Ryan Pavlovsky</li>
<li>Ross Barnowski</li>
<li>Andy Haefner</li>
<li>Aaron Culich</li>
<li>Krishna Muriki</li>
<li>Morgan White</li>
<li>? New Belgian student</li>
</ul>
<h1 id="nuclear-data-at-lanl---morgan-white">Nuclear Data at LANL - Morgan White</h1>
<p>We have a distinguished visitor for the last meeting of the semester. Morgan
will give us some thoughts on Nuclear Data at Los Alamos.</p>
<h2 id="morgan-white">Morgan White</h2>
<p>Morgan White joined the nuclear data team at LANL in X-division in 1998 as a
summer student and has been part of that team ever since. Recently, Morgan has
crossed from simulations to the dark side and begun working with the
experimental community to better understand and reduce the systematic errors in
the fundamental data necessary for such simulations.</p>
<h1 id="advanced-cython---cameron-bates">Advanced Cython - Cameron Bates</h1>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/cy_advanced" title="Code Examples">here</a>.</p>
<h2 id="cameron-bates">Cameron Bates</h2>
<p>Cameron is a PhD candidate in Nuclear Engineering who works as a graduate
student researcher on nuclear data experiment and simulation at Lawrence
Berkeley National Laboratory.</p>
ORIGEN and Open Source2014-11-19T00:00:00+00:00https://BIDS.github.io/dats/posts/origen-and-open-source<h1 id="attending">Attending</h1>
<ul>
<li>Max Fratoni</li>
<li>Katy Huff</li>
<li>Kelly Rowland</li>
<li>Alejandra Jolodosky</li>
<li>Sandra Bogetic</li>
<li>Madicken Munk</li>
<li>Dan Wooten</li>
<li>Tenzing Joshi</li>
<li>Andrey Mironyuk</li>
</ul>
<h1 id="discussion-origen---max-fratoni">Discussion: ORIGEN - Max Fratoni</h1>
<p>Max gave us an overview of ORIGEN, a depletion code.
Max’s presentation can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/origen" title="Presentation">here</a>.</p>
<h2 id="max-fratoni">Max Fratoni</h2>
<p>Max is a professor in the Department of Nuclear Engineering.</p>
<h2 id="origen">ORIGEN</h2>
<ul>
<li>ORIGEN solves the Bateman equations.</li>
<li>What you need for the zero-dimensional depletion equation to be accurate is
simple: accurate cross sections.</li>
</ul>
<p>ORIGEN-S is within the SCALE package and is maintained by the SCALE
maintainers, whereas ORIGEN2 is standalone. ORIGEN-ARP is a graphical interface
for ORIGEN-S. It’s possible to use 3 energy groups in ORIGEN-S, and the cross
sections are kept up to date.</p>
<ul>
<li>ORIGEN-S tracks depletion for 1946 isotopes.</li>
<li>HOWEVER, there are only about 300 isotopes in the ENDF database</li>
</ul>
<p>So, how do we run the code? Max went over the various data we need to input:</p>
<ul>
<li>material</li>
<li>data</li>
<li>depletion data
<ul>
<li>power depletion : need power and time</li>
<li>flux irradiation : need flux and time</li>
<li>decay : need time</li>
</ul>
</li>
</ul>
<p>The runs produce:</p>
<ul>
<li>activity</li>
<li>radiotoxicity</li>
<li>decay heat</li>
<li>absorption and fission rates</li>
<li>neutron emission</li>
<li>photon emission</li>
</ul>
<p>Every material you provide must belong to one of three groups:</p>
<ul>
<li>activation product (720)</li>
<li>actinide (130)</li>
<li>fission product (850)</li>
</ul>
<p>Of course these groups overlap.</p>
<p>You also have to provide information about every nuclide (decay constants,
decay heats, etc.). These <strong>decay data libraries</strong> are plaintext. ORIGEN comes
packaged with this information.</p>
<p>You also have to provide the <strong>cross section libraries</strong>. ORIGEN comes with
some of these. The cross section libraries have to be selected carefully.</p>
<p>The input files are <strong>TAPE</strong> files… because they used to actually be tapes.</p>
<h1 id="dicussion-open-source-contribution">Discussion: Open Source Contribution</h1>
<p>We intentionally misspelled everyone’s names and went through the
issue-pull-request-review-pull-close workflow seen in many open source
projects.</p>
Cython and the Python C/API - Ross Barnowski2014-11-05T00:00:00+00:00https://BIDS.github.io/dats/posts/cython-python-c-api<h1 id="attending">Attending</h1>
<ul>
<li>Ross Barnowski</li>
<li>Andy Haefner</li>
<li>Tenzing</li>
<li>Paul</li>
<li>Kelly Rowland</li>
<li>Daniel Wooten</li>
<li>Kyle Barbary</li>
<li>Cameron Bates</li>
<li>Aaron Culich</li>
<li>Katy Huff</li>
</ul>
<h1 id="discussion-extending-python-with-cython-and-the-capi">Discussion: Extending Python with Cython and the C/API</h1>
<h2 id="ross-barnowski">Ross Barnowski</h2>
<p>Ross Barnowski is a nuclear engineering PhD student in Kai Vetter’s research
group.</p>
<h2 id="cython-and-the-python-capi">Cython and the Python C/API</h2>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/extendingPython_CAPI-cython" title="Code Examples">here</a>.</p>
Jekyll - Katy Huff2014-10-29T00:00:00+00:00https://BIDS.github.io/dats/posts/jekyll<h1 id="attending">Attending</h1>
<ul>
<li>Sandra Bogetic</li>
<li>Alejandra Jolodosky</li>
<li>Staffan Qvist</li>
<li>Madicken Munk</li>
<li>Katy Huff</li>
<li>Jason Hou</li>
<li>Daniel Wooten</li>
<li>Ross Barnowski</li>
<li>Andy Haefner</li>
<li>Fatma Imamoglu</li>
<li>Rachel Slaybaugh</li>
<li>others…</li>
</ul>
<h1 id="katy-huff">Katy Huff</h1>
<p>Katy Huff is a postdoc with NSSC and BIDS.</p>
<h1 id="jekyll">Jekyll</h1>
<p>This very site is made with Jekyll. Jekyll is a Ruby-based, blog-aware, static
site generator.</p>
<h2 id="two-ways-to-host-your-jekyll-site-for-free-on-github">Two ways to host your Jekyll site for free on GitHub</h2>
<p>Everybody needs a website. Google yourself. What happens? Let’s get you a
website.</p>
<h2 id="usernamegithubcom-master-branch">username.github.com master branch</h2>
<p>Every time someone creates a user name on github, a special space on the
internet is reserved for them at theirusername.github.com (and .io, it’s a long
story).</p>
<p>If the user “lisemeitner” existed, then she could create a repository on github
called “lisemeitner.github.com” (or .io, it’s a long story). If that repository
has a master branch, then GitHub will try to <strong>render</strong> it with Jekyll and
<strong>serve</strong> it up to the internet at lisemeitner.github.io. Note that the Jekyll
plug-ins supported by GitHub are very minimal, so try not to rely on them.</p>
<p>If Lise doesn’t want to use Jekyll, that’s cool. Sites on GitHub can be plain
boring old html (like <a href="http://katyhuff.github.io">katyhuff.github.io</a>). To keep GitHub from trying to render
it as Jekyll, she has to add an empty file (.nojekyll) in her repository.
Additionally, an index.html file has to exist at the top level of her
repository, or else there will be nothing there.</p>
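<p>As a rough sketch, the two files could be created like this. The directory stands in
for a clone of Lise’s hypothetical repository, and the page content is a placeholder.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># stand-in for a clone of the lisemeitner.github.io repository
mkdir -p lisemeitner.github.io
cd lisemeitner.github.io
# an empty file named .nojekyll tells GitHub to skip the Jekyll rendering
touch .nojekyll
# the page that will be served at the top level of the site
echo "Lise Meitner's home page" > index.html
ls -A
</code></pre></div></div>
<p>After that, the files still have to be added, committed, and pushed to the master
branch before anything shows up on the internet.</p>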
<h2 id="gh-pages-branch">gh-pages branch</h2>
<p>If Lise also has a project called fission, she can have a website for it too.
That website can sit on the internet at lisemeitner.github.io/fission. All she
has to do is put either jekyll stuff or a static html page in the gh-pages
branch. The same rules apply as far as .nojekyll and plug-ins are concerned.</p>
<p>For an example, check out
<a href="http://katyhuff.github.io/cyder">katyhuff.github.io/cyder</a>.</p>
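<p>To try the gh-pages idea without touching a real project, here is a sketch that builds
a scratch repository. The repository name, the author identity, and the page content
are all made up for illustration, and the push is left as a comment because it needs a
GitHub remote.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># scratch stand-in for Lise's existing fission project
git init --quiet fission
cd fission
git config user.name "Lise Meitner"
git config user.email "lise@example.org"
echo "fission" > README
git add README
git commit --quiet -m "Project code"

# start a branch that shares no history with the project code
git checkout --quiet --orphan gh-pages
# the site branch does not need the project files
git rm -rf --quiet .
# plain html here, so tell GitHub to skip Jekyll
touch .nojekyll
echo "fission project page" > index.html
git add .nojekyll index.html
git commit --quiet -m "Add project page"
# git push origin gh-pages   (needs a GitHub remote)
</code></pre></div></div>
<p>The orphan branch keeps the website history separate from the code history, which is
a common choice but not a requirement.</p>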
<h2 id="how-does-the-thw-site-work">How does the THW site work?</h2>
<p>Please look at the readme. We’re gonna make some changes.</p>
<h2 id="whats-this-config-file">What’s this config file?</h2>
<p>It’s for configuring the site, silly! Let’s check it out.</p>
<h2 id="whats-all-this-stuff-at-the-top-of-the-posts">What’s all this stuff at the top of the posts?</h2>
<p>It’s YAML metadata. Let’s talk about it.</p>
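<p>For reference, a post file might start like the sketch below. The block between the
two lines of dashes is the YAML metadata (Jekyll calls it front matter). The file name,
the layout name, and the values are assumptions for illustration, not copied from the
THW site.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p _posts
# Jekyll reads everything between the two --- lines as YAML metadata
printf '%s\n' \
  '---' \
  'layout: post' \
  'title: Jekyll' \
  'date: 2014-10-29' \
  '---' \
  'The body of the post goes here.' > _posts/2014-10-29-jekyll.md
cat _posts/2014-10-29-jekyll.md
</code></pre></div></div>
<p>Everything after the second line of dashes is the post itself.</p>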
<h2 id="serving-it-up-locally">Serving it up locally</h2>
<p>So, rather than rely on github to render the jekyll and serve it up on the
internet, you can also render it locally and check it out on your localhost.
You’ll need to have ruby installed. Then:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gem install jekyll
</code></pre></div></div>
<p>Then, if you navigate to a directory containing a jekyll site, you can serve it
up:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jekyll serve
</code></pre></div></div>
<p>Now open a browser and navigate to the localhost url
<a href="http://localhost:4000">http://localhost:4000</a>.</p>
<h2 id="what-about-themes">What about themes?</h2>
<p>The THW page relies on an open source theme called left. We could swap that out
for another theme really easily. There are lots on the internets. Try <a href="http://jekyllthemes.org/">this
page.</a></p>
MocDown and Pyne Install - Phil Gorman and Kelly Rowland2014-10-22T00:00:00+00:00https://BIDS.github.io/dats/posts/mocdown-pyne<h1 id="attending">Attending</h1>
<ul>
<li>Phil Gorman</li>
<li>Phil</li>
<li>Xianlom Hou</li>
<li>Daniel Wooten</li>
<li>Alejandra Jolodosky</li>
<li>Madicken Munk</li>
<li>James Bevins</li>
<li>Kelly Rowland</li>
</ul>
<h1 id="discussion-pyne-install---kelly">Discussion: Pyne Install - Kelly</h1>
<p>Today’s THW went really well! Kelly did a “choose your own adventure” livebuild of PyNE on a guest account on her computer.</p>
<h1 id="discussion-mocdown-20---phil">Discussion: Mocdown 2.0 - Phil</h1>
<p>Phil continued his introduction to Mocdown.</p>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/topic" title="Code Examples">here</a>.</p>
MocDown and Python Threading - George Zhang, Phil Gorman, Ross Barnowski2014-10-15T00:00:00+00:00https://BIDS.github.io/dats/posts/mocdown-and-threading<h1 id="attending">Attending</h1>
<ul>
<li>George Zhang</li>
<li>Phil Gorman</li>
<li>Chick Markley</li>
<li>Max Fratoni</li>
<li>Aaron Culich</li>
<li>Xiao Fan</li>
<li>Kyle Barbary</li>
<li>Madicken Munk</li>
<li>Alejandra Jolodosky</li>
<li>Denia Djokic</li>
<li>Ross Barnowski</li>
<li>Katy Huff</li>
<li>Joey Curtis</li>
<li>Kelly Rowland</li>
<li>Andy Haefner</li>
<li>Caroline</li>
</ul>
<h2 id="discussion-mocdown">Discussion: MocDown</h2>
<h3 id="george-zhang-and-phil-gorman">George Zhang and Phil Gorman</h3>
<p>George and Phil are both PhD students in the Berkeley neutronics group.</p>
<h3 id="mocdown">MocDown</h3>
<p><a href="http://ucb-rdn.github.io/projects/mocdown/mocdown.html">MocDown</a> is a neutron
transport, transmutation, thermal fluids, and equilibrium search tool developed
here at Berkeley primarily by Jeffrey Seifried.</p>
<p>George and Phil covered :</p>
<ul>
<li>What does MocDown do?</li>
<li>What is going on in the input files?</li>
</ul>
<p>Code examples and documentation can be found at <a href="https://jeffseif.github.io/MocDown">the homepage</a>.</p>
<h2 id="discussion-threading-with-python">Discussion: Threading with Python</h2>
<h3 id="ross-barnowski">Ross Barnowski</h3>
<p>Ross Barnowski is a PhD student in Kai Vetter’s research group. His work
focuses on nuclear instrumentation, including a 3D gamma ray imaging cart
called the <a href="https://conference.scipy.org/scipy2014/schedule/presentation/1714/">Compact Compton Imager II</a>.</p>
<h3 id="threading-in-python">Threading in Python</h3>
<p>Ross gave a talk that covered the concept of concurrency as well as how to make
it happen in Python.</p>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/python_concurrency" title="Threading Code Examples">here</a>.</p>
<p>To see the ipython notebook in the notebook viewer try this link: <a href="http://nbviewer.ipython.org/github/thehackerwithin/berkeley/blob/master/python_concurrency/Concurrency%20in%20Python.ipynb">Concurrency
Notebook</a>.</p>
<h1 id="lightning-talks">Lightning Talks</h1>
<h2 id="kelly--test-your-code">Kelly : Test Your Code</h2>
<p>Kelly, after having dedicated a ton of time this summer to building tests for
the WARP code, now has a test suite for it. When her colleague, the main WARP
developer, made an update to the API, her tests caught it (by failing) and she
was alerted to the global effects of the change. Moral of the story: test your
code!</p>
<h2 id="aaron-culich--brc">Aaron Culich : BRC</h2>
<p>Aside: One of the places where tests break down is in concurrency, actually! Aaron
recommends a paper <a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf">“The Problem With Threads” by Edward
Lee</a>. He also
offers us some choice quotes:</p>
<blockquote>
<p>“…non-trivial multi-threaded programs are incomprehensible to humans.”</p>
</blockquote>
<p>and</p>
<blockquote>
<p>“Threads must be relegated to the engine room of computing, to be suffered
only by expert technology providers.”</p>
</blockquote>
<p>Aaron also passed out a little handout about BRC. He encourages folks to reach out
to him (as part of the Consulting and Community initiative). One of the ways
for him to help out is here with THW, where he wants to hear our needs and
feedback.</p>
<p>They’ve already benefitted from our feedback concerning Savio
<a href="https://github.com/thehackerwithin/berkeley/tree/master/brc">here</a>. Please
feel free to add more information to that file with a pull request.</p>
<ul>
<li>
<p>In response to the need for simpler Pledge setup documentation, they’ve
created better docs
<a href="https://github.com/ucberkeley/brc-draft-documentation/wiki/Logging-into-Savio">here</a>.</p>
</li>
<li>
<p>In response to the need for example run files, they’ve created a repository
<a href="https://github.com/ucberkeley/brc-draft-documentation">here</a>!</p>
</li>
</ul>
Numpy Vectorization and Python Logging - Andy Haefner and Dan Wooten2014-10-08T00:00:00+00:00https://BIDS.github.io/dats/posts/numpy-vectorization<h1 id="attending">Attending</h1>
<ul>
<li>Dan Wooten</li>
<li>Rachel Slaybaugh</li>
<li>Madicken Munk</li>
<li>Alejandra Jolodosky</li>
<li>Tenzing Joshi</li>
<li>John Ready</li>
<li>Joey Curtis</li>
<li>Staffan Qvist</li>
<li>Dan Wooten</li>
<li>Ross Barnowski</li>
<li>Andy Haefner</li>
<li>Laazar Zupich</li>
<li>Aaron Culich</li>
<li>Katy Huff</li>
</ul>
<h1 id="discussion-vectorization-with-numpy">Discussion: Vectorization with Numpy</h1>
<h2 id="andy-haefner">Andy Haefner</h2>
<p>Andy Haefner is a graduate student in Kai Vetter’s group.</p>
<h2 id="vectorization-with-numpy">Vectorization With Numpy</h2>
<p>A tutorial and code examples can be found
<a href="https://github.com/thehackerwithin/berkeley/tree/master/numpyVectorization">here</a>.</p>
<h1 id="discussion-the-python-logger-utility">Discussion: The Python Logger Utility</h1>
<h2 id="daniel-wooten">Daniel Wooten</h2>
<p>Daniel Wooten is a graduate student working for Max Fratoni.</p>
<h2 id="the-python-logger-utility">The Python Logger Utility</h2>
<p>Example code can be found
<a href="https://github.com/thehackerwithin/berkeley/tree/master/python_logger">here</a>.</p>
HPC Module Installation and Plotting Tools - Everyone!2014-10-01T00:00:00+00:00https://BIDS.github.io/dats/posts/brc-and-plotting<h1 id="attending">Attending</h1>
<ul>
<li>Ross Barnowski</li>
<li>Sandra Bogetic</li>
<li>Aaron Culich</li>
<li>Denia Djokic</li>
<li>Andy Haeffer</li>
<li>Jason Hou</li>
<li>Katy Huff</li>
<li>Alejandra Jolodosky</li>
<li>Madicken Munk</li>
<li>Kelly Rowland</li>
<li>Rachel Slaybaugh</li>
<li>Daniel Wooten</li>
<li>Andy Haefner</li>
<li>Ryan Pavlovsky</li>
<li>Cameron Bates</li>
<li>Ross Barnowski</li>
<li>Tenzing Joshi</li>
<li>Dav Clark</li>
</ul>
<h1 id="discussion-installing-modules-on-the-brc-savio-cluster">Discussion: Installing Modules on the BRC Savio Cluster</h1>
<h2 id="katy-huff">Katy Huff</h2>
<p><a href="http://katyhuff.github.io">Katy Huff</a> is a postdoctoral scholar with the
Nuclear Science and Security Consortium and is a fellow with the Berkeley
Institute for Data Science.</p>
<h2 id="module-installation-tips-and-tricks">Module Installation Tips and Tricks</h2>
<p>I spent some time last week installing MOOSE on the cluster. The dream was
this: MOOSE should be a module that anyone can use on the cluster if they
import it. There are a couple of catches to this.</p>
<ul>
<li>MOOSE’s dependencies can each be compiled with an array of flags. Should I
compile only debug versions, only non-debug versions, or both?</li>
<li>MOOSE has a bunch of associated libraries which do various physics. I would
also like to install those, but they have varying permissions.</li>
</ul>
<h3 id="logging-in">Logging in</h3>
<p>Setting up an easy login is a two-step process:</p>
<ul>
<li>install pledge</li>
<li>create aliases for the ssh commands</li>
</ul>
<h4 id="installing-pledge-somewhere">Installing Pledge Somewhere</h4>
<p>Pledge generates time-sensitive, one-time-use passwords for two-factor
authentication. That’s awesome. Many of you may have seen or used the
passcode-generating RSA tokens that are used to log into the national laboratory networks.
How many of you use Google two-factor authentication for your email, or something
similar? I do: <a href="https://www.google.com/landing/2step/">google2factor</a>.</p>
<p>This is annoying because it takes a long time to get to the final url with
which to install Pledge. But, you will eventually succeed. Use the username and
password given to you by Krishna at LBL.</p>
<ol>
<li>Install Pledge. The easiest way is likely to do this on your phone using whatever
app installation store is appropriate.</li>
<li>Go here (https://identity.lbl.gov/PledgeEnrollment/enroll.jsp), select HPCS
from the pulldown menu, and enter the user name/password that Krishna from
LBNL sent to you. This should provide an 8-digit profile ID.</li>
<li>Open Pledge and click the + button. It should ask for your profile ID (the
thing you just generated); enter it, and it should download your “Pledge profile.”
If you get an error, contact Phil Gorman for troubleshooting advice.</li>
<li>Make a PIN. The PIN is specific to that profile.</li>
<li>When you log into the Savio cluster you will use this app to generate a new
password every time.</li>
</ol>
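<p>With Pledge working, the second step is the alias. A minimal sketch, where the user
name and host name are placeholders (the real ones come with your account) and the
alias name is arbitrary:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># placeholder user and host; substitute the ones that came with your account
alias savio='ssh lise@login.example.edu'
# list the alias to confirm that it is defined
alias savio
</code></pre></div></div>
<p>Putting the alias line in ~/.bashrc keeps it around between sessions. You will still
be asked for the one-time password from Pledge each time you connect.</p>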
<h3 id="installing-dependencies">Installing Dependencies</h3>
<p>Typically, installation requires :</p>
<ul>
<li>getting your environment right</li>
<li>downloading the source code for the dependencies</li>
<li>following the instructions for each of those</li>
</ul>
<p>MOOSE relies on two main external dependencies:</p>
<ul>
<li>HYPRE</li>
<li>PETSc</li>
</ul>
<p>It also relies on one internal dependency, libMesh. LibMesh is independent of
MOOSE, but since MOOSE has added non-standard features to libMesh, they keep
their own flavor of libMesh in the MOOSE framework source code. Clear as mud?</p>
<p>Thankfully, MOOSE is a well-documented open source project. It walks through
the installation of dependencies as well as the framework.</p>
<h4 id="environment">Environment</h4>
<p>To deal with the environment, I edited ~/.bashrc so that it now looks like:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export CLUSTER_TEMP=`mktemp -d /tmp/cluster_temp.XXXXXX`
umask 0022
export GRP_DIR="/global/home/groups/ac_nuclear"
export PACKAGES_DIR="$GRP_DIR/MOOSE/moose-compilers"
</code></pre></div></div>
<p>That makes sure that the packages will be downloaded to the right place
(CLUSTER_TEMP), installed in the right place (PACKAGES_DIR), and linked to
the right place (GRP_DIR).</p>
<p>For this to take effect, the terminal needs to re-initialize itself with :</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>source ~/.bashrc
</code></pre></div></div>
<h4 id="downloading-the-dependency-source">Downloading the Dependency Source</h4>
<p>This can be done using curl.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl -L -O --insecure https://computation.llnl.gov/casc/hypre/download/hypre-2.8.0b.tar.gz
curl -L -O http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.4.3.tar.gz
</code></pre></div></div>
<h4 id="installing-hypre">Installing Hypre</h4>
<p>First, I went to the place where I want to install it.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> cd $GRP_DIR
</code></pre></div></div>
<p>Install Hypre according to the instructions. That went well, creating the
beginning of a module called moose-dev-gcc, so there’s nothing interesting to
share. The interesting stuff is when things go wrong.</p>
<h4 id="installing-petsc">Installing PETSc</h4>
<p>I started to configure PETSc. OOOPS: I had to stop installing PETSc and install
valgrind first.</p>
<p>Then I loaded the moose-dev-gcc module that has now been created, loaded valgrind,
and configured PETSc again:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>xxx=========================================================================xxx
Configure stage complete. Now build PETSc libraries with (legacy build):
make PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/stack_src/petsc-3.4.3 PETSC_ARCH=arch-linux2-c-debug all
or (experimental with python):
PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/stack_src/petsc-3.4.3 PETSC_ARCH=arch-linux2-c-debug ./config/builder.py
xxx=========================================================================xxx
</code></pre></div></div>
<p>Now what?</p>
<p>I read the docs, and chose the legacy build because the moose docs say:</p>
<blockquote>
<p>During the configure/build process, you will be prompted to enter the correct make commands. Because this can be different from system to system, I leave that task to the reader. However, I have received better results when following the non-experimental commands.</p>
</blockquote>
<p>So I ran <code class="highlighter-rouge">make PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/stack_src/petsc-3.4.3 PETSC_ARCH=arch-linux2-c-debug all</code>.</p>
<p>It worked !</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Completed building libraries
=========================================
making shared libraries in /global/home/groups/ac_nuclear/MOOSE/moose-compilers/stack_src/petsc-3.4.3/arch-linux2-c-debug/lib
building libpetsc.so
=========================================
Now to install the libraries do:
make PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/stack_src/petsc-3.4.3 PETSC_ARCH=arch-linux2-c-debug install
=========================================
</code></pre></div></div>
<p>So, I did that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[huff@ln001 petsc-3.4.3]$ make PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/stack_src/petsc-3.4.3 PETSC_ARCH=arch-linux2-c-debug install
*** Using PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/stack_src/petsc-3.4.3 PETSC_ARCH=arch-linux2-c-debug ***
*** Installing PETSc at prefix location: /global/home/groups/ac_nuclear/MOOSE/moose-compilers/petsc/petsc-3.4.3/gcc-opt ***
====================================
Install complete. It is useable with PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/petsc/petsc-3.4.3/gcc-opt [and no more PETSC_ARCH].
Now to check if the libraries are working do (in current directory):
make PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/petsc/petsc-3.4.3/gcc-opt test
====================================
[huff@ln001 petsc-3.4.3]$
</code></pre></div></div>
<p>So, I ran the tests:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/petsc/petsc-3.4.3/gcc-opt test
</code></pre></div></div>
<p>Here’s the output:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[huff@ln001 petsc-3.4.3]$ make PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/petsc/petsc-3.4.3/gcc-opt test
Running test examples to verify correct installation
Using PETSC_DIR=/global/home/groups/ac_nuclear/MOOSE/moose-compilers/petsc/petsc-3.4.3/gcc-opt and PETSC_ARCH=arch-linux2-c-debug
Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: ln001.brc
--------------------------------------------------------------------------
lid velocity = 0.0016, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes
See http://www.mcs.anl.gov/petsc/documentation/faq.html
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: ln001.brc
--------------------------------------------------------------------------
lid velocity = 0.0016, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
[ln001.brc:54921] 1 more process has sent help message help-mpi-btl-openib.txt / no active ports found
[ln001.brc:54921] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
egrep: /global/home/groups/ac_nuclear/MOOSE/moose-compilers/petsc/petsc-3.4.3/gcc-opt/arch-linux2-c-debug/include/petscconf.h: No such file or directory
Possible error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: ln001.brc
--------------------------------------------------------------------------
Number of SNES iterations = 4
Completed test examples
</code></pre></div></div>
<p>At first glance, maybe it passed, right? WRONG! It failed, yo. <em>The first rule
of programming is: Google the error.</em> Googling it sends us to
a discussion on the petsc-users mailing list, exactly what we want, <a href="http://lists.mcs.anl.gov/pipermail/petsc-users/2014-July/022325.html">right
here</a>.
Interestingly, the question comes from someone at LBL. Perhaps it’s even on the
same BRC system? In any case, Barry Smith, who leads the PETSc project, responded…</p>
<blockquote>
<p>Well it is running. It is just producing annoying warning messages. You need
to talk to your local MPI expert on that system for how to get rid of the
problem.
Barry</p>
</blockquote>
<p>Since I don’t have any idea who in BRC is the MPI guru willing to solve it, I
guess we just make a note of it and go on with our lives. So, moving on, I had
to clone moose.</p>
<ul>
<li>I hate using GitHub’s https protocol, so I set up my ssh keys for
the BRC cluster (<a href="https://help.github.com/articles/generating-ssh-keys">this is how that works</a>).</li>
<li>I have a fork of moose (which currently exactly parallels
moose development), so I cloned from that.</li>
<li>I also fetched the upstream idaholab/moose repo so that I can keep up to date.</li>
<li>MOOSE likes a clean history, so every time you pull, you have to rebase
(we all make choices…): <code class="highlighter-rouge">git pull --rebase upstream master</code></li>
</ul>
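<p>The fork-and-upstream setup from the list can be tried offline. In this sketch two
local bare repositories stand in for GitHub: upstream.git plays idaholab/moose and
fork.git plays a personal fork. The directory names, the identity, and the commit are
invented for illustration. With real repositories only the commands after the last
comment are needed, pointed at the GitHub URLs instead.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># offline stand-ins: upstream.git plays idaholab/moose, fork.git plays my fork
git init --quiet --bare upstream.git
git init --quiet --bare fork.git

# seed both with the same single commit on master
git init --quiet seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Lise Meitner"
git config user.email "lise@example.org"
echo "framework" > README
git add README
git commit --quiet -m "Initial commit"
git push --quiet ../upstream.git master
git push --quiet ../fork.git master
cd ..

# the workflow itself: clone the fork, add upstream, fetch, and rebase on pull
git clone --quiet --branch master fork.git moose
cd moose
git config user.name "Lise Meitner"
git config user.email "lise@example.org"
git remote add upstream ../upstream.git
git fetch --quiet upstream
git pull --quiet --rebase upstream master
git remote
</code></pre></div></div>
<p>The last command lists origin (the fork) and upstream, which is the arrangement that
lets you push to your own fork while staying current with the main project.</p>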
<h1 id="lightning-talks">Lightning Talks</h1>
<h2 id="ross-barnowski--pyqt">Ross Barnowski : PyQT</h2>
<p>So, pyqtgraph is good for volumetric rendering. There are a lot of example scripts,
so you can copy those. Additionally it is good for making fast video graphics
(better than matplotlib).</p>
<h2 id="alejandra--matlab-ternary-plots">Alejandra : MATLAB ternary plots</h2>
<p>Alejandra shared a ternary plotting thing.</p>
<h2 id="andrew-hefner--mayavi">Andy Haefner : Mayavi</h2>
<p>Mayavi uses vtk, which is pretty powerful, but through a python interface.
Additionally, it uses syntax that will be familiar to the matlab users.</p>
<p>There are various interesting features in Mayavi. Quiver, for example, is a
really basic function call that generates vector fields. See the
<a href="http://docs.enthought.com/mayavi/mayavi/">Mayavi documentation</a>.</p>
<h2 id="ryan-pavlovsky--dygraph">Ryan Pavlovsky : DyGraph</h2>
<p><a href="http://dygraphs.com/">Dygraphs</a> is a nice, lightweight, and interactive plotting library. So,
it’s great for websites, because you just drop in a single javascript file.</p>
<h2 id="katy-huff--yt-and-what-is-plotly">Katy Huff : yt (and what is plotly?)</h2>
<p>Katy likes and is impressed with yt. She is curious but nervous about
<a href="http://plot.ly">plotly</a>.</p>
<h2 id="dav-clark--bokeh">Dav Clark : Bokeh</h2>
<p>It’s architected to have a javascript frontend and is meant to be hooked into
generic data servers.</p>
<p>It has cool zooming capabilities in the gui and has neato features like linked
brushing so that two plots are linked and can be interacted with using a single
tool in one of the windows.</p>
PARCS and RadWatch (without the physics) - Sandra Bogetic and Ryan Pavlovsky2014-09-24T00:00:00+00:00https://BIDS.github.io/dats/posts/parcs-and-radwatch<h1 id="attending">Attending</h1>
<ul>
<li>Sandra Bogetic</li>
<li>Christian DiSanzo</li>
<li>Alejandra Jolodosky</li>
<li>James</li>
<li>Kelly Rowland</li>
<li>Jasmina Vujic</li>
<li>Rachel Slaybaugh</li>
<li>Massimiliano Fratoni</li>
<li>Katy Huff</li>
<li>Dan Wooten</li>
<li>Aaron Culich</li>
<li>Madicken Munk</li>
<li>Kedar Kolluri</li>
<li>James Bevins</li>
</ul>
<h1 id="discussion-parcs">Discussion: PARCS</h1>
<h2 id="sandra-bogetic">Sandra Bogetic</h2>
<p>Sandra Bogetic is a first year graduate student in the Nuclear Engineering
Department.</p>
<h2 id="parcs">PARCS</h2>
<p>PARCS is a powerful tool, but it seems to have struggles with version control,
and would strongly benefit from a more transparent and controlled release
procedure.</p>
<p>Since it is an NQA-1 code… it’s surprising that it is not under version
control.</p>
<h3 id="input-files">Input Files</h3>
<ul>
<li>Generation of cross sections can be done in various ways. These include
CASMO, HELIOS, and TRITON</li>
<li>The input for thermal hydraulic behavior can be entered into PARCS or coupled
using PATH, TRACE, and/or RELAP. For a PWR, you can do it in TH.</li>
<li>Depletion can be done by PARCS, but sometimes you don’t want to do it with
PARCS because perhaps you have done depletion in some other code (such as
SIMULATE). PARCS allows you to input this external data.</li>
<li>The input file formatting is in blocks.
<ul>
<li>CNTL, XSEC, GEOM, PARAM, TH, TRAN, etc.</li>
<li>Data can be repeated using an asterisk</li>
<li>Input ends with a <code class="highlighter-rouge">.</code></li>
<li>etc.</li>
</ul>
</li>
</ul>
<h3 id="options">Options</h3>
<p>There are many options that can or should be specified. These include the core
type, core power, simulation behavior concerning Xe and Sm (will you input the
values, do you want them to be at equilibrium, transient, etc.), control rod
banking positions, external thermal hydraulics linkages, print options, and
whether or not to conduct depletion.</p>
<p>Additionally, there is a tree variable for cross section definitions.</p>
<p>The geometry card of course is very important. The core compositions are all
defined for the assemblies, reflector, etc. Typical boundary conditions are
available.</p>
<h3 id="running-the-input">Running the input</h3>
<h3 id="examples">Examples</h3>
<p>Examples can be found in the presentation, but will not be shared online.</p>
<h1 id="discussion-radwatch-without-all-the-physics">Discussion: RadWatch (without all the physics)</h1>
<h2 id="ryan-pavlovsky">Ryan Pavlovsky</h2>
<p>Ryan is a graduate student in the Nuclear Engineering Department.</p>
<h2 id="linux-and-unix-tools-within-radwatch">Linux and Unix tools within RadWatch</h2>
<h3 id="the-stack">The stack</h3>
<p>Sensor input, python, smtp, datetime, python, cron, scp, ssh, ssh-agent, pytables,
matplotlib, scp, yes, drupal, jquery</p>
<h3 id="cron">Cron</h3>
<p>Cron is for scheduling jobs.</p>
<p><code class="highlighter-rouge">crontab -e</code> can be used to edit the cron file for your user. Don’t freak
out if it’s empty. Just use a template from your toplevel cron file or find a
template on the internet to fill out.</p>
<p>Ryan reminds us of the importance of the man page. If you need more help with
the crontab command, try <code class="highlighter-rouge">man crontab</code> in your terminal to figure out its
secrets. (Man pages are opened in a program called <code class="highlighter-rouge">less</code>. So, to get out of the
man page, type “q”.)</p>
<p>Note that your system may have a cron.allow file. That file, if it exists,
names the people allowed to create cron jobs.</p>
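<p>As a minimal sketch of what a crontab line looks like, the Python snippet below
assembles one entry from its five schedule fields (minute, hour, day of month,
month, day of week) followed by the command. The helper name
<code class="highlighter-rouge">crontab_line</code> and the script path are made up for illustration; they are
not taken from Ryan’s talk.</p>

```python
def crontab_line(command, minute="*", hour="*", day_of_month="*",
                 month="*", day_of_week="*"):
    """Return one crontab entry: five schedule fields, then the command."""
    fields = [minute, hour, day_of_month, month, day_of_week]
    return " ".join(fields + [command])


# Hypothetical job: run a sensor-logging script every 10 minutes.
entry = crontab_line("/usr/bin/python3 /home/user/log_sensor.py", minute="*/10")
print(entry)
```

<p>Pasting the printed line into the file opened by <code class="highlighter-rouge">crontab -e</code> is one way
such a job could be scheduled.</p>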
<h3 id="ssh">SSH</h3>
<p>Ryan points out that there are two sides to ssh (client and server). They
have their own configuration files!</p>
<p>Note the config file located in <code class="highlighter-rouge">/etc/ssh/ssh_config</code>, but also, note the one
in your home directory <code class="highlighter-rouge">~/.ssh/config</code> and the one for server daemon
configuration <code class="highlighter-rouge">/etc/ssh/sshd_config</code>. NOTE: on Mac OS X, there may not be an
additional ssh directory layer in etc. So, find those files at
<code class="highlighter-rouge">/etc/ssh_config</code> and <code class="highlighter-rouge">/etc/sshd_config</code>.</p>
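<p>If you are not sure which of these files your machine actually has, a small
Python sketch like the one below can check. The list of candidate paths is just
the set named in this section, and the function name
<code class="highlighter-rouge">existing_ssh_configs</code> is an illustrative choice, not part of any ssh tool.</p>

```python
from pathlib import Path

# Candidate locations named in the notes; which ones exist depends on the OS.
CANDIDATES = [
    Path.home() / ".ssh" / "config",   # per-user client settings
    Path("/etc/ssh/ssh_config"),       # system-wide client settings
    Path("/etc/ssh/sshd_config"),      # server daemon settings
    Path("/etc/ssh_config"),           # Mac OS X layout mentioned in the notes (client)
    Path("/etc/sshd_config"),          # Mac OS X layout mentioned in the notes (server)
]


def existing_ssh_configs(candidates=CANDIDATES):
    """Return the candidate config files that exist on this machine."""
    return [path for path in candidates if path.is_file()]


for path in existing_ssh_configs():
    print(path)
```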
<p><strong>Fun Fact:</strong> DSA has a stronger random number generator than RSA, but RSA is
used more widely. This is likely because RSA encryption is faster (and more
compressed?) than DSA.</p>
<p>Code examples from Ryan’s talk can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/cron_ssh">here in the master
branch</a>.</p>
When and Where Survey2014-09-18T00:00:00+00:00https://BIDS.github.io/dats/posts/survey<h1 id="space-and-time">Space and Time</h1>
<p>Space and time are complex, coupled problems for THW. We would like a place and
a time that works well for everyone. But, schedules differ, geography is an
issue, and space is hard to come by on this campus.</p>
<h2 id="space">Space</h2>
<p>There are three possible spaces.</p>
<h3 id="2150-shattuck-suite-230">2150 Shattuck, Suite 230</h3>
<p>This is the space we’ve been using for the last year. Many of you know it. It
has lovely light, a lot of chairs, and a nice big round table. Additionally, we
never get kicked out of it for other events, because our events take priority
here. The location may be a downside for those of you who drive or don’t like
walking from Etcheverry.</p>
<h3 id="190-doe">190 Doe</h3>
<p><a href="http://vcresearch.berkeley.edu/sites/default/files/bids_collage.jpg" title="bids space">This space</a> is brand spanking new, and yet, very old. The Berkeley Institute for
Data Science (driven by many of the same concerns that drive The Hacker Within)
has an open, computational-science-focused space that is just being finished
up. It’s in the historic Doe library, right at the entrance, so it’s very
central and convenient for people coming from any corner of campus. Also, it
looks like a beautiful startup space, it has a camera/screen portal with
the capability to broadcast our meeting to remote viewers, and Katy has the
keys. The construction crew is finishing up some of the A/C in the room this
week, but all the furniture is in and it’s ready for excellent events like
this. Go check it out if you don’t believe me.</p>
<h3 id="4101-etcheverry">4101 Etcheverry</h3>
<p>This conference room is lovely and keeps its guests surrounded by nuclear
engineering books. It’s a popular choice for many NE meetings and has a lot of
charm. It is large enough to seat our current meeting attendees, but probably
no more than that. So, consider the ideal size of THW, when you make this
decision. Many might prefer that THW stay the same size it is now. It is a very
convenient location for those of you in Etcheverry, but fairly inconvenient for
those of us who sit in the NSSC space.</p>
<h2 id="time">Time</h2>
<p>In general, we’d like to have the meeting in the afternoon. Three possible
start times have been suggested. These are 3:00, 3:30, and 4:00pm. The meeting
nominally lasts between 1.5 and 2 hours.</p>
<p>Consider your class schedule. Please be generous. It’s ok if you have to be 10
minutes late. This is Berkeley.</p>
<h2 id="exercise-your-rights">Exercise your rights</h2>
<p>We’re all equal here. Please exercise your rights by voting in this <a href="https://docs.google.com/forms/d/1RzUbbpUpNu7jDy166WaAXxwooGmapW15tcDChSNA5W0/viewform?usp=send_form" title="Form">online
poll</a> that I’ve set up. Between this poll and the availability of key
locations, we’ll find a place and time.</p>
Serpent and LaTeX - Alejandra Jolodosky and Katy Huff2014-09-17T00:00:00+00:00https://BIDS.github.io/dats/posts/serpent-and-latex<h1 id="attending">Attending</h1>
<ul>
<li>Kelly Rowland</li>
<li>Madicken Munk</li>
<li>Sandra Bogetic</li>
<li>Daniel Wooten</li>
<li>Alejandra Jolodosky</li>
<li>Jasmina Vujic</li>
<li>Massimiliano Fratoni</li>
<li>Aaron Culich</li>
<li>Ross Barnowski</li>
<li>Ryan Pavlovsky</li>
<li>Kedar Kolluri</li>
<li>Xin Wang</li>
<li>Katy Huff</li>
<li>James Bevins</li>
<li>Jessica Roche</li>
</ul>
<h1 id="discussion-serpent">Discussion: Serpent</h1>
<h2 id="alejandra-jolodosky">Alejandra Jolodosky</h2>
<p>Alejandra is a graduate student in the nuclear engineering department. She’ll
discuss the use of Serpent and how to find a bug when using it.</p>
<h2 id="serpent">Serpent</h2>
<p>Download the slides
<a href="https://github.com/thehackerwithin/berkeley/raw/master/serpent/serpent_tut.pdf">here.</a></p>
<h1 id="discussion-latex-markup-for-science">Discussion: LaTeX, markup for science</h1>
<h2 id="katy-huff">Katy Huff</h2>
<p>Katy Huff is a postdoctoral scholar in NSSC and BIDS.</p>
<h2 id="latex">LaTeX</h2>
<p>Notes and code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/LaTeX" title="Code Examples">here</a>.</p>
<h1 id="lightning-talks">Lightning Talks</h1>
<h2 id="kelly-rowland--today-i-learned-what-a-call-stack-is">Kelly Rowland : Today I learned what a call stack is.</h2>
<p>Everyone, check out the Wikipedia article on the call stack.</p>
<h2 id="ross-barnowski--did-you-know-latex--matplotlib--awesome">Ross Barnowski : Did you know LaTeX + Matplotlib = Awesome</h2>
<p>With dollar signs in plain text, matplotlib renders math on your plot, in the
title, on the axes, in the labels… wherever!</p>
CRAM and imagemagick - Dan Wooten and Madicken Munk2014-09-10T00:00:00+00:00https://BIDS.github.io/dats/posts/CRAM-and-imagemagick<h1 id="attending">Attending</h1>
<ul>
<li>Daniel Wooten</li>
<li>Kelly Rowland</li>
<li>Cameron Bates</li>
<li>Christian DiSanzo</li>
<li>George Zhang</li>
<li>Phil Gorman</li>
<li>Alejandra Jolodosky</li>
<li>Madicken Munk</li>
<li>Jasmina Vujic</li>
</ul>
<h1 id="discussion-cram">Discussion: CRAM</h1>
<h2 id="daniel-wooten">Daniel Wooten</h2>
<p>Dan Wooten is a second year graduate student in the Nuclear Engineering
Department.</p>
<h2 id="the-cram-method">The CRAM method</h2>
<p>Dan introduced the CRAM method.</p>
<h1 id="discussion-imagemagick">Discussion: Imagemagick</h1>
<h2 id="madicken-munk">Madicken Munk</h2>
<p>Madicken Munk is a fourth year graduate student in the Nuclear Engineering
Department.</p>
<h2 id="imagemagick">Imagemagick</h2>
<p>Madicken demonstrated the generation of gifs on the command line.</p>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/imagemagick" title="Code Examples">here</a>.</p>
<h1 id="discussion-of-next-week">Discussion of Next Week</h1>
<p>Proposed future talk(s) from Jasmina: Series of “how to” talks— how to find and
install your software, how to set up your environments, etc.</p>
<p>It was decided that next week’s nuclear talk should be a Serpent tutorial and
how to approach a bug, given by Alejandra Jolodosky.</p>
<p>The non-nuclear talk should be an introduction to LaTeX: how to interface
with TeX on your respective OS, how to format your paper, add images, and
manage citations. Katy can do this.</p>
Computational Nuclear Engineering Overview & Bash - Max Fratoni & Katy Huff2014-09-03T00:00:00+00:00https://BIDS.github.io/dats/posts/bash<h1 id="attending">Attending</h1>
<ul>
<li>Kelly Rowland</li>
<li>Katy Huff</li>
<li>Alejandra Jolodosky</li>
<li>Blake Huff</li>
<li>Jasmina Vujic</li>
<li>Sven</li>
<li>Daniel Wooten</li>
<li>Rachel Slaybaugh</li>
<li>Denia Djokic</li>
<li>Madicken Munk</li>
<li>Sandra Bogetic</li>
<li>Phil Gorman</li>
<li>Naman</li>
<li>Max Fratoni</li>
</ul>
<h1 id="computational-tools-for-nuclear-engineering-an-overview">Computational Tools for Nuclear Engineering, An Overview</h1>
<h2 id="speaker-intro-massimiliano-fratoni">Speaker Intro: Massimiliano Fratoni</h2>
<p><a href="http://www.nuc.berkeley.edu/people/massimiliano_fratoni" title="Max Fratoni">Max Fratoni</a> is a professor (forever freshman) in the nuclear engineering
department who specializes in computational neutronics methods, advanced
reactors, and accident tolerant fuels.</p>
<h2 id="discussion-computational-tools-for-nuclear-engineering">Discussion: Computational Tools for Nuclear Engineering</h2>
<p>Max would like to help define what to use when. The first question to ask is “What
is your problem like?” Ask whether it is:</p>
<ul>
<li>steady state or time dependent</li>
<li>over a short (reactivity excursion) or long (depletion) time frame</li>
</ul>
<p>The most generic types of tools are either</p>
<ul>
<li>stochastic (Monte Carlo)</li>
<li>or deterministic (many).</li>
</ul>
<p>The question, again, is “What are you trying to model?” If your simulation has
a common geometry and common materials, then deterministic tools are very
likely to be the answer. Deterministic codes make many simplifications, so
they are likely to be fast, but perhaps not as flexible.</p>
<p>If your geometry is unusual or you have unusual materials, stochastic models are
probably going to capture your problem the best. In general, you will choose
either MCNP or Serpent. So, when do you use MCNP and when do you use Serpent?
Serpent is very user friendly. But, even with the theory part in the Serpent
manual, it is very hard to be confident in your results, since there are so
many knobs that can be turned, but don’t actually have to be turned.</p>
<p>Serpent, for example, can combine points and make up its own energy grid. When
you do this, you can lose accuracy, in particular in the unresolved resonances.
This unified energy grid (which is set by default) will definitely bias some of
your isotopics.</p>
<p>That’s fine, but MCNP doesn’t do depletion in a reliable way.</p>
<p>There is also a suite of codes that are capable of transient solutions by
coupling with a Monte Carlo or deterministic code. These
are often specifically designed for a certain reactor. This includes PARCS, for
example.</p>
<p>Besides coupling with a Monte Carlo or deterministic code, depletion can be
handled, by and large, by ORIGEN. ORIGEN2 and ORIGEN-S are your options. The
results from ORIGEN are going to be only as good as your cross sections.</p>
<h3 id="future-topics">Future Topics</h3>
<ul>
<li>What is the difference between the exponential matrix method and the CRAM
method? (Daniel)</li>
<li>Mocdown (Phil)</li>
<li>PARCS (Sandra)</li>
<li>Serpent&PARCS (Sandra)</li>
<li>COMSOL (Madicken)</li>
<li>MONTEBURNS (Alejandra)</li>
<li>MOOSE (Katy)</li>
</ul>
<p>Madicken will show off COMSOL next week, and then Daniel will talk the week
after that.</p>
<h1 id="discussion-bash-and-unix--linux-environments">Discussion: Bash and Unix / Linux Environments</h1>
<h2 id="speaker-intro-katy-huff">Speaker Intro: Katy Huff</h2>
<p><a href="http://katyhuff.github.io" title="Katy Huff">Katy Huff</a> is an NSSC Postdoctoral Scholar and a Berkeley Institute for Data Science
Fellow.</p>
<h2 id="discussion-bash">Discussion: Bash</h2>
<p>Code examples can be found <a href="https://github.com/thehackerwithin/berkeley/blob/master/bash/tutorial.md" title="Tutorial Source">here</a>.</p>
<h1 id="lightning-talks">Lightning Talks</h1>
<h2 id="kelly-rowland">Kelly Rowland</h2>
Fall Kickoff 20142014-08-27T00:00:00+00:00https://BIDS.github.io/dats/posts/fall-kickoff<p>Wednesday at 4pm in 2150 Shattuck, Suite 230.</p>
<h1 id="attending">Attending</h1>
<ul>
<li>Kelly Rowland</li>
<li>Madicken Munk</li>
<li>Ross Barnowski</li>
<li>Daniel Wooten</li>
<li>Russell</li>
<li>Sven</li>
<li>Massimiliano Fratoni</li>
<li>Denia Djokic</li>
<li>Katy Huff</li>
</ul>
<h1 id="discussion-upcoming-topics">Discussion: Upcoming Topics</h1>
<p>The Berkeley chapter of the Hacker Within scientific computing group (formerly
known as the Berkeley NE computational methods group) will be kicking off the
fall 2014 semester on Wednesday, August 27th, from 4pm-6pm.</p>
<p>The goal of this meeting was to plan the rest of the semester’s meetings.
The time, frequency, and content of the upcoming semester’s meetings
were all up for discussion. In particular, we were able to brainstorm
a possible suite of software tools, resources, and practices to discuss this
upcoming semester. If you have an idea, but didn’t make it to the meeting, reply
on the UCB hackerwithin listhost.</p>
<h2 id="brainstorming-computational-topics">Brainstorming Computational Topics</h2>
<p>We thought of a number of cool things we’d like to talk about this semester.</p>
<ul>
<li>LaTeX
<ul>
<li>Resumes</li>
</ul>
</li>
<li>PyNE
<ul>
<li>Live build (Kelly)</li>
</ul>
</li>
<li>Bash</li>
<li>Ubuntu install and dual boot</li>
<li>Fun hacky things
<ul>
<li>Doxygen</li>
<li>Cmake</li>
</ul>
</li>
<li>Plotting
<ul>
<li>Matplotlib, yt</li>
<li>gnuplot</li>
<li>3D options</li>
<li>Rules: how to make a good plot</li>
<li>animations (imagemagick)</li>
</ul>
</li>
<li>Presentation rules
<ul>
<li>tools</li>
<li>rules</li>
<li>Pre-ANS</li>
</ul>
</li>
<li>Vectorized/Matrix computing: formulating your problem correctly
<ul>
<li>Andy’s diffusion example</li>
</ul>
</li>
<li>Web tools (scraping, python urllib, wget)</li>
<li>Extending Python
<ul>
<li>cython, C/API, boost.python</li>
</ul>
</li>
<li>Threading (multiprocessing, ZMQ)</li>
</ul>
<h2 id="bootcamp-series-ideas">Bootcamp Series Ideas</h2>
<p>Professor Fratoni has an excellent idea for embedding a seminar for neutronics
specific toolsets into this general computational seminar. The topics will
vary and will be the subject of discussion during the September 3rd meeting.</p>
<ul>
<li>Tools
<ul>
<li>Serpent</li>
<li>MCNP</li>
<li>ORIGEN</li>
<li>MOCUP</li>
<li>MOOSE</li>
</ul>
</li>
<li>Data/Methods</li>
<li>Tricks ‘n tips</li>
<li>Overviews & Comparison</li>
<li>Increase interactivity, project/tutorial focused</li>
<li>Group attendees by interest/skill-level</li>
</ul>
<h2 id="general-thoughts">General thoughts</h2>
<ul>
<li>Tutorial code should be posted on github prior to presentation</li>
<li>Reminders should be sent the day before the meeting</li>
<li>Also, the listhost reminders should go out to ne-grads for a while</li>
<li>Having the meetings in the Doe Library BIDS space seems feasible.</li>
</ul>
<h2 id="meeting-structure-ideas">Meeting Structure ideas</h2>
<ul>
<li>1st hour: nuclear tools seminar</li>
<li>1:00 - 1:45: computing skillz</li>
<li>1:45 - people burn out: lightning/hanging out</li>
</ul>
<h2 id="upcoming-talks">Upcoming Talks</h2>
<ol>
<li>Bash (Katy)</li>
<li>Latex/resumes (Katy-resumes), (Laurence, Rachel for general Latex?)</li>
<li>PyNE (Kelly)</li>
</ol>
<h1 id="ordering-of-talks---nuclear-series">Ordering of talks - Nuclear series</h1>
<ol>
<li>Max overview of what tools to use when</li>
<li>… figure out from Max’s talk</li>
</ol>
LaTeX - Laurence Lewis2014-04-29T00:00:00+00:00https://BIDS.github.io/dats/posts/LaTeX<h1 id="attending">Attending</h1>
<ul>
<li>Katy Huff</li>
<li>Ryan Bergmann</li>
<li>Professor Rachel Slaybaugh</li>
<li>Professor Max Fratoni</li>
<li>Kelly Rowland</li>
<li>Daniel Wooten</li>
<li>Joshua Howland</li>
<li>Madicken Munk</li>
<li>and others… I failed at taking attendance this time.</li>
</ul>
<h1 id="lesson-latex">Lesson: LaTeX</h1>
<p>You can find a lot of Laurence’s <a href="https://github.com/thehackerwithin/berkeley/blob/master/LaTeX" title="Examples">examples</a> in the master branch of our
repository.</p>
<h1 id="lightning-talk-rachel-on-drawing-katy-on-floatbarrier-and-max-on-easy-latex">Lightning Talk: Rachel on drawing, Katy on FloatBarrier and Max on Easy LaTeX</h1>
<p>Rachel shared her LaTeX homework assignments, Katy pointed out FloatBarrier,
the best command ever, and Max showed off a WYSIWYG latex editor called LyX.</p>
So You Have A Software2014-04-23T00:00:00+00:00https://BIDS.github.io/dats/posts/so-you-have-a-software<h1 id="attending">Attending</h1>
<ul>
<li>Katy Huff</li>
<li>Ryan Bergmann</li>
<li>Professor Rachel Slaybaugh</li>
<li>Joshua Howland</li>
<li>Anthony Scopatz</li>
<li>and many others… I failed at taking attendance this time.</li>
</ul>
<h1 id="talk">Talk</h1>
<p>This week Anthony Scopatz gave a talk on software architecture
patterns. Hint: the most important file in your project is the
license! <a href="https://docs.google.com/presentation/d/1QaGPtOq3MNg62l9e5lfVKs8xVBgMIwMK8BtFquvE58w/edit?usp=sharing">Find the slides here.</a></p>
Packaging and Distribution - Anthony Scopatz2014-04-22T00:00:00+00:00https://BIDS.github.io/dats/posts/packaging<h1 id="lesson-anthony-scopatz-so-you-have-a-software">Lesson: Anthony Scopatz “So You Have a Software”</h1>
<p>Anthony’s talk can be found <a href="https://docs.google.com/presentation/d/1QaGPtOq3MNg62l9e5lfVKs8xVBgMIwMK8BtFquvE58w/edit#slide=id.p">here</a></p>
Emailing with Python - Ross Barnowski2014-04-16T00:00:00+00:00https://BIDS.github.io/dats/posts/automating-emails<h1 id="attending">Attending</h1>
<ul>
<li>Ross Barnowski</li>
<li>Katy Huff</li>
<li>Ryan Bergmann</li>
<li>Professor Rachel Slaybaugh</li>
<li>Kelly Rowland</li>
<li>Daniel Wooten</li>
<li>Joshua Howland</li>
<li>Madicken Munk</li>
<li>…</li>
</ul>
<h1 id="lesson-emailing-with-python">Lesson: Emailing With Python</h1>
<p>Tutorial for sending email using python, smtplib, and the gmail smtp server.</p>
<p>Requires:</p>
<ul>
<li>python (2.6 or greater)</li>
<li>Python modules: smtplib, email, getpass, psutil (advanced example)</li>
</ul>
<p>Example scripts (<a href="https://github.com/thehackerwithin/berkeley/blob/master/python_email" title="Tutorial Source">examples</a>):</p>
<ul>
<li>
<p><code class="highlighter-rouge">smtp_simple.py</code>: Simplest example demonstrating the use of smtplib to send a
“Hello World” style message</p>
</li>
<li><code class="highlighter-rouge">smtp_mime.py</code>: A more complicated example demonstrating the use of several MIME
objects in the email module to construct a message out of
formatted text (html) with an image attachment.</li>
<li><code class="highlighter-rouge">simulation_example_</code> : This folder contains an example python script that
calls a simulation program (in this case, a plasma calculation from Prof.
Morse’s 281 class). The simulation is launched from the python script, and
psutil is used to do some rudimentary performance logging. When the calculation
finishes, the results, simulation output, and performance statistics are all
attached to an email and sent to the user.</li>
</ul>
<p>NOTE: The logging in this example is for demonstration only. This simple
logging is probably not the way you’d want to do it if you truly wanted to
track the performance of a running calculation. It may not work on all systems.</p>
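<p>For readers who just want the shape of the “Hello World” case, here is a minimal
sketch using only the standard library. It is not a copy of
<code class="highlighter-rouge">smtp_simple.py</code>: it assumes Python 3 (for <code class="highlighter-rouge">email.message.EmailMessage</code>),
the function names and addresses are placeholders, and the send step is left
commented out so that running the file only builds the message.</p>

```python
import getpass
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email; nothing is sent here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg, username):
    """Send msg through the gmail smtp server, prompting for the password."""
    password = getpass.getpass("Password for %s: " % username)
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(username, password)
        server.send_message(msg)


# Placeholder addresses; replace with real ones before sending.
message = build_message("me@example.com", "you@example.com",
                        "Hello World", "Sent from Python.")
print(message["Subject"])
# send_via_gmail(message, "me@example.com")  # uncomment to actually send
```

<p>Keeping message construction separate from sending makes it possible to inspect
the message without touching the network.</p>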
<h1 id="lightning-talk-rachel-on-pretty-images-and-madicken-on-slow-mcnp">Lightning Talk: Rachel on pretty images and Madicken on slow MCNP</h1>
<p>Rachel showed an excellent-looking, peacock colored image of the ratio of two
neutronics solutions.</p>
<p>Madicken discussed the behavior of MCNP when a single material is replaced by
a material which causes more neutron scattering. Result: MCNP slows down a whole
heck of a lot for such materials.</p>
Raspberry Pi Hacking - Ryan Pavlovsky2014-04-09T00:00:00+00:00https://BIDS.github.io/dats/posts/raspberry-pi<h1 id="attending">Attending</h1>
<ul>
<li>Ryan Pavlovsky</li>
<li>Katy Huff</li>
<li>Ryan Bergmann</li>
<li>Josh Howland</li>
<li>Prof. Rachel Slaybaugh</li>
<li>Kelly Rowland</li>
<li>Ross Barnowski</li>
<li>Tomi Akindele</li>
</ul>
<h1 id="lesson-raspberry-pi">Lesson: Raspberry Pi</h1>
<p>Ryan Pavlovsky, a student in Kai Vetter’s research group, gave an excellent
presentation about what he’s done with the raspberry pi.</p>
<p>Stuff that we discussed :</p>
<ul>
<li>How did you get this?</li>
<li>
<p>What are the peripherals that work with it?</p>
<ul>
<li>gpu/cpu</li>
<li>Broadcom video card</li>
<li>ARM processor, 700 MHz</li>
<li>512 MB memory</li>
<li>JTAG header?</li>
<li>USB/Ethernet</li>
<li>SD card additional memory</li>
<li>Raspbian operating system</li>
</ul>
</li>
<li>
<p>What example projects are cool?</p>
<ul>
<li>smart kegerator (monitors flow rates, temperatures, accounting, facial
detection)</li>
<li>Quake III</li>
<li>cluster of pis. built mpi on it. rack made of legos!</li>
</ul>
</li>
<li>
<p>Demos!</p>
<ul>
<li>pong, a ping sensor. Sends a ping, measures time to return.</li>
<li>ping, a program that acquires pong senses over time.</li>
<li>simon says, computer tells you what to do, based on ping</li>
<li>
<p>GEANT4</p>
<ul>
<li>4.10 C++ implementation</li>
<li>networked raspberry pi</li>
<li>edited ~/.bashrc for data</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>Code examples for the demo can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/raspi" title="Code Examples">here</a>.</p>
<h1 id="lightning-talks">Lightning Talks</h1>
<p>We talked, in an ad hoc fashion, about the Heartbleed OpenSSL bug.</p>
Testing Part II - Katy Huff2014-04-02T00:00:00+00:00https://BIDS.github.io/dats/posts/testingII<h1 id="lesson-introduction-to-testing">Lesson: Introduction to Testing</h1>
<p>Katy gave a very quick continuation of testing in the context of languages other
than python. She mostly did a tour through the Cyclus code and its tests,
written using the google test framework and built into an executable with CMake.</p>
<h1 id="lightning-talks">Lightning Talks</h1>
<p>Ross Barnowski gave a quick overview of how to mount remote drives on a linux or
unix platform.</p>
Testing - Katy Huff2014-03-19T00:00:00+00:00https://BIDS.github.io/dats/posts/testing<h1 id="lesson-introduction-to-testing">Lesson: Introduction to Testing</h1>
<p>Katy gave a very quick intro to testing using the python nosetests package. There is a simple example <a href="https://github.com/thehackerwithin/berkeley/tree/master/testing" title="Testing Example">here</a>.</p>
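<p>To give a flavor of what such tests look like, here is a small sketch that is
separate from the linked example. The <code class="highlighter-rouge">mean</code> function and the test names are
invented for illustration; the only convention relied on is that nose collects
functions whose names start with <code class="highlighter-rouge">test</code>.</p>

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean() needs at least one value")
    return sum(values) / float(len(values))


def test_mean_of_integers():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_of_single_value():
    assert mean([7]) == 7


def test_mean_of_empty_list_raises():
    try:
        mean([])
    except ValueError:
        return  # expected: an empty input is an error
    raise AssertionError("mean([]) should raise ValueError")
```

<p>Saved in a file with a name like <code class="highlighter-rouge">test_mean.py</code> (a hypothetical name), running
<code class="highlighter-rouge">nosetests</code> in that directory should pick up and run the three test functions.</p>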
IPython - Ross Barnowski2014-03-12T00:00:00+00:00https://BIDS.github.io/dats/posts/IPython<h1 id="lesson-introduction-to-ipython">Lesson: Introduction to IPython</h1>
<p>Ross gave an introduction to one of the best tools in the python development
suite: IPython.</p>
<p>His notes for this tutorial can be found <a href="https://github.com/thehackerwithin/berkeley/tree/master/IPython" title="here">on github</a>.</p>
Makefiles - Katy Huff2014-03-05T00:00:00+00:00https://BIDS.github.io/dats/posts/makefiles<h1 id="lesson-introduction-to-makefiles">Lesson: Introduction to Makefiles</h1>
<p>Katy gave a very quick intro to makefiles. This was based largely on Software Carpentry material, replicated <a href="https://github.com/thehackerwithin/berkeley/tree/master/make" title="make">here</a>.</p>
Self Documenting Code - Rachel Slaybaugh2014-02-26T00:00:00+00:00https://BIDS.github.io/dats/posts/documentation<h1 id="attending">Attending</h1>
<ul>
<li>Prof. Rachel Slaybaugh</li>
<li>Ryan Bergmann</li>
<li>Jankai (Jack) Yu</li>
<li>Dan Wooten</li>
<li>Sandra Bogetic</li>
<li>Christian DiSanzo</li>
<li>Josh Howland</li>
<li>Alex Chong</li>
<li>Kelly Rowland</li>
<li>Phil Gorman</li>
<li>Jason Hou</li>
</ul>
<h1 id="lesson-code-documentation">Lesson: Code Documentation</h1>
<p>Rachel gave a brief overview of a variety of documentation strategies, including how to write code comments that generate a useful API. Here is <a href="https://github.com/thehackerwithin/berkeley/tree/master/documentation/documentation.md" title="Rachel’s Tutorial">Rachel’s Tutorial</a>.</p>
<h1 id="lightning-talks">Lightning Talks</h1>
<h1 id="lesson-introduction-to-documentation">Lesson: Introduction to Documentation</h1>
<p>Professor Rachel Slaybaugh gave an introduction to documenting code. This covered:</p>
<ul>
<li>Code Comments</li>
<li>API Documentation</li>
<li>Auto-Documentation</li>
<li>Self-Documenting Code</li>
<li>Readmes</li>
<li>User Guides</li>
<li>Developer Guides</li>
</ul>
<p>You can find details about this topic from the meeting <a href="https://github.com/thehackerwithin/berkeley/blob/master/documentation/documentation.md" title="here">notes</a>.</p>
Intro to Git Part II - Katy Huff2014-02-19T00:00:00+00:00https://BIDS.github.io/dats/posts/git-part-2-meeting<h1 id="lesson-introduction-to-git-part-ii">Lesson: Introduction to Git Part II</h1>
<p>Katy gave the second half to version control using git: remotes. Here is <a href="https://github.com/thehackerwithin/berkeley/tree/master/git/partII" title="Katy's Tutorial">Katy’s Tutorial</a>.</p>
Intro to Git - Katy Huff2014-02-12T00:00:00+00:00https://BIDS.github.io/dats/posts/git-intro-meeting<h1 id="attending">Attending</h1>
<ul>
<li>Katy Huff</li>
<li>Ryan Bergmann</li>
<li>Jankai (Jack) YU</li>
<li>Dan Wooten</li>
<li>Sandra Bogetic</li>
<li>Christian DiSanzo</li>
<li>Madicken Munk</li>
<li>Josh Howland</li>
<li>Prof. Rachel Slaybaugh</li>
<li>Kelly Rowland</li>
<li>Phil Gorman</li>
<li>Alejandra Jolodosky</li>
<li>Jason Hou</li>
</ul>
<h1 id="lesson-introduction-to-git">Lesson: Introduction to Git</h1>
<p>Katy gave a very quick intro to version control using git. Here is <a href="https://github.com/thehackerwithin/berkeley/tree/master/git/partI" title="Katy's Tutorial">Katy’s Tutorial</a>.</p>
<h1 id="lightning-talks">Lightning Talks</h1>
<p>Rachel gave an introduction to the IPython Notebook, an excellent tool for
prototyping python code.</p>
GPUs and CUDA - Ryan Bergmann2014-02-05T00:00:00+00:00https://BIDS.github.io/dats/posts/ryan-on-gpus<h1 id="attending">Attending</h1>
<ul>
<li>Ryan Bergmann</li>
<li>Katy Huff</li>
<li>Jankai (Jack) YU</li>
<li>Dan Wooten</li>
<li>Prof. Max Fratoni</li>
<li>Sandra Bogetic</li>
<li>Christian DiSanzo</li>
<li>Josh Howland</li>
<li>Prof. Rachel Slaybaugh</li>
<li>Kelly Rowland</li>
<li>Nikola Radnovic</li>
</ul>
<h1 id="lesson-gpus-and-cuda">Lesson: GPUs and CUDA</h1>
<p>Ryan Bergmann covered various features of GPUs and CUDA. Here is <a href="https://github.com/sellitforcache/cuda_tut" title="Ryan's Tutorial">Ryan’s Tutorial</a>.</p>
<p>Things we learned include:</p>
<ul>
<li><strong>CUDA</strong> stands for Compute Unified Device Architecture.</li>
<li><strong>SIMD</strong> stands for Single Instruction Multiple Data.</li>
<li>GPUs are good for turning compute-bound problems into memory-bound ones.</li>
<li><strong>CUDA cores aren’t really cores</strong> there are multiple cores per CUDA core.</li>
<li>You have to use the SIMD lanes in order to get good performance out of a GPU system.</li>
<li><strong>Coalesced reading and writing</strong> means that your cores should be accessing
adjacent pieces of memory simultaneously.</li>
<li>The memory latency is higher for GPUs than CPUs, but the GPU hides this better
the more threads you’re running.</li>
<li>The <strong>host thread</strong> launches the GPU kernel</li>
<li>Threads are organized into blocks</li>
<li>Blocks are organized into grids</li>
<li>The grid is the kernel you have loaded.</li>
<li>We learned how to launch a kernel for</li>
</ul>
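<p>The thread/block/grid hierarchy and the idea of coalesced access can be sketched
without a GPU. The plain-Python snippet below is only an illustration, not code
from Ryan’s tutorial: it assumes a one-dimensional grid of one-dimensional
blocks and mimics the usual CUDA index expression
<code class="highlighter-rouge">blockIdx.x * blockDim.x + threadIdx.x</code>. The sizes are tiny, made-up values.</p>

```python
def global_index(block_idx, block_dim, thread_idx):
    """Flat index of one thread in a 1D grid of 1D blocks."""
    return block_idx * block_dim + thread_idx


BLOCK_DIM = 4   # threads per block (tiny, for illustration)
GRID_DIM = 3    # blocks in the grid

# Element of the data array that each thread in each block would work on.
layout = [[global_index(b, BLOCK_DIM, t) for t in range(BLOCK_DIM)]
          for b in range(GRID_DIM)]

for b, indices in enumerate(layout):
    print("block", b, "handles elements", indices)
```

<p>With this mapping, neighboring threads in a block touch neighboring array
elements, which is the access pattern that coalesced reading and writing
calls for.</p>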
<h1 id="lightning-talks">Lightning Talks</h1>
<ul>
<li>Katy gave a quick lightning talk on <a href="https://drive.google.com/file/d/0ByP1TmlNKprrcGdpaWJyeUZPb3c/edit?usp=sharing" title="Style Guides">style guides</a> for code.</li>
<li>Kelly gave a more in-depth lightning talk on Laser Doppler Vibrometry.</li>
</ul>
Bash Meeting - Katy Huff2014-01-22T00:00:00+00:00https://BIDS.github.io/dats/posts/bash-meeting<h1 id="attending">Attending</h1>
<ul>
<li>Katy Huff</li>
<li>Ryan Bergmann</li>
<li>Professor Rachel Slaybaugh</li>
<li>Kelly Rowland</li>
<li>Daniel Wooten</li>
<li>Christian Disanzo</li>
<li>Jiankai (Jack) Yu</li>
<li>Sandra Bogetic</li>
</ul>
<h1 id="lesson-bash">Lesson: Bash</h1>
<p>Katy will review various features of the powerhouse of programming, the *nix terminal. (Note that *nix is jargon intended to indicate both Linux and Unix operating systems.) You’ll find this lesson within our shared repository. Start with <a href="https://github.com/thehackerwithin/berkeley/blob/master/bash/tutorial.md" title="Tutorial Source">the tutorial</a>.</p>
First Meeting2014-01-15T00:00:00+00:00https://BIDS.github.io/dats/posts/first-meeting<p>This was a planning meeting. Katy Huff, Ryan Bergmann, and Professor Rachel
Slaybaugh attended. Together we discussed a possible suite of useful software
practices to discuss this semester.</p>
<h1 id="what-is-this">What is this?</h1>
<p>We discussed that part of the purpose of these meetings is to restart a
successful group that originated in Wisconsin, “The Hacker Within.” Ideally,
this meeting will facilitate sharing skills and best practices for
computational nuclear engineering applications. Last semester, we had a couple
such meetings. In spring semester I would like to share a number of skills for
scientific software development (testing, data management, version control,
literate programming, etc.) and to ask the rest of you to share the skills you
have as well. The goal will be to incorporate these practices into our
workflows. This would be a great venue for introducing new libraries, showing
off useful features of a neutronics code you’re using, or bringing up a
computational problem you’re having.</p>
<h1 id="what-can-be-expected">What can be expected?</h1>
<p>We decided to try meetings with an agenda structured thus:</p>
<ul>
<li>First, we will go around the room and attendees can introduce themselves.</li>
<li>The meeting will start with one 30-40 minute talk on a topic of import to
scientists who use software. Particular emphasis is likely to be paid to
topics useful to nuclear engineering researchers. To volunteer to give a
talk, mention it at a meeting, or <a href="mailto:katyhuff@gmail.com">email
Katy.</a></li>
<li>The talk will be followed by a short period for questions.</li>
<li>For up to 40 minutes, attendees will have the opportunity to give lightning
talks on short topics. These may share a small skill snippet, demonstrate a
computational issue you’re having with your research, or anything of
interest to the group. Sometimes, lightning talk topics will be requested
ahead of time on a theme (e.g., text editors). To give a lightning talk,
just show up and speak up when the time comes. If you like, letting Katy
know ahead of time is always welcome.</li>
<li>After the meeting, attendees can hang around in the space and hack together
on their research codes, if they like.</li>
</ul>
<h1 id="what-are-the-topics">What are the topics?</h1>
<p>The topics for the first part of the semester will focus on reproducibility:</p>
<ul>
<li>command line</li>
<li>gpus/cuda</li>
<li>version control</li>
<li>build systems</li>
<li>testing</li>
<li>self-documenting code</li>
<li>cloud computing (amazon ec2, etc.)</li>
<li>parallelism</li>
<li>profiling</li>
</ul>
<h1 id="lightning-talks">Lightning Talks</h1>
<p>A number of good topics were identified for lightning talks or talk series. If
you’re interested in talking about these or something else, just come prepared.
If the talk you want to give is in a series, consider banding together a group
of folks who would like to give the other parts of the series.</p>
<ul>
<li>debugging</li>
<li>libraries/linking</li>
<li>scripted plotting</li>
<li>exceptions</li>
<li>text editors</li>
<li>licensing and export control</li>
</ul>