Freshman Seminar 154
Center for Digital Humanities
Center for Statistics & Machine Learning
Spring 2019, Thursdays 1:30 – 4:20 PM
Come and explore the wide world of data. Learn to read datasets, first with examples drawn from historical and literary datasets and then by creating your own datasets with classmates. Learn the foundational skills of working with data by reflecting critically on those methods and tools through the lenses of race, class, gender, and power. By Dean’s Date you will join the community of data researchers.
Hearty acknowledgements are due to Jean Bauer.
Table of contents
- Course Policies
- Course Calendar
- Datasets, Collections & Repositories
- Resources for working with data
Course Policies
Please read these policies carefully. All information on this syllabus is subject to change. Any changes will be announced in advance.
Assignments
Weekly readings will explore many perspectives on data. Assignments will include short responses, exercises, reflections, as well as two projects working with data during the middle and end of the term. When applicable, all assignments are due by the start of class.
Readings
All course readings are available online and should be completed before the start of class.
Attendance
Mandatory and crucial. Attendance and active participation are essential components of the learning process. My class is structured to encourage your success through in-class discussions, exercises, and conversations with your peers. If you have difficulty speaking up in class, please talk to me.
Because absences are sometimes unavoidable, you will exchange email addresses with classmates. Keep these in case you need to communicate about missed work in class. I consider it your responsibility to find out from classmates (not me) what took place. Due to the importance of our in-class conversations and exercises, you may miss two classes during the semester, after which one point will be deducted from the participation grade for each missed meeting.
Class Contacts
Name:_______________________ E-mail: _____________________________
Name:_______________________ E-mail: _____________________________
Tardiness
Given the precious little time we have together, it is important that we make full use of every class period. This includes the beginning of class. Consistent tardiness will adversely impact your grade. Being tardy three times will register as one missed class.
Evaluation Policies
The assignments throughout the class will be a mix of collaborative efforts and individual assignments. I will provide feedback on every assignment. While completing any of your work, you are encouraged and welcome to visit office hours. I ask that you wait 24 hours after receiving a grade before discussing it with me. Because the major projects make up a large portion of the course evaluation, I will circulate a more detailed schedule and evaluation rubrics in class. Please note that every major step of the final assignment will require a critical reflection on how it fulfills the project designs, goals, and rubrics. To show that you have read this syllabus thoroughly, please email me with a GIF of a dinosaur to receive extra credit.
Extra Credit
If you attend a campus event or talk relevant to our class, you may submit a one-page reflection on how the experience adds to our class learning. Some of these opportunities will be announced in class. Reflections will count for one third of a letter grade on participation.
Readings
This course is built around the readings assigned for each class meeting. Please come to class prepared to talk about the readings, including the debates, materials, and audiences that each author engages. For more, see Paul Edwards’s handy guide, “How to Read a Book”: http://pne.people.si.umich.edu/PDF/howtoread.pdf
Plagiarism & Academic Honesty
It is your duty to be familiar with the Princeton University Principles of General Conduct and Regulations, along with the Rights, Rules, Responsibilities, 2017 edition www.princeton.edu/pub/rrr/. I quote: “The central purposes of a university are the pursuit of truth, the discovery of new knowledge through scholarship and research, the teaching and general development of students, and the transmission of knowledge and learning to society at large. Free inquiry and free expression within the academic community are indispensable to the achievement of these goals. The freedom to teach and to learn depends upon the creation of appropriate conditions and opportunities on the campus as a whole as well as in classrooms and lecture halls. All members of the academic community share the responsibility for securing and sustaining the general conditions conducive to this freedom.”
Assignments
- Participation 20%
- Weekly exercises 20%
- Proposals 10%
- In-class workshop 10%
- Reflections 10%
- Unessays 30%
- Total 100%
Some advice from past students
- “Manage your time, work on projects and assignments daily, even if only for a little while.”
- “Don’t BS your first attempts and peer projects. The projects that I got the most out of were the ones that I worked really hard on the first stab and my group dug in together. If you put in the effort in the beginning your life will be easier and you will get a better grade.”
- “Just make sure you plan ahead.”
- “Participate in class, otherwise you will be bored.”
- “Work hard on the project rough drafts so that you lighten your load and aren’t up all night the night before something is due.”
- “Don’t wait until the last minute.”
- “Begin the research process early.”
- “Do it.”
Class Session Leaders
Students will co-teach many of our course sessions. You will work in small groups to prepare and lead a class session. This work will count as part of participation grades.
Part of the responsibility of leading a class session will be gaining a greater familiarity with the materials, ideas, and debates around a topic. While I do not expect you to learn everything there is to know about a topic in a few weeks, I will expect that you have thought intensively and expansively about what we might know and discuss.
Timeline:
- 2 weeks prior: meet with me outside of class
- 1 week prior: send out a related reading to the rest of the class
- Tuesday before class: send me your class plans
- Lead your class session
- 1 week after: submit your debrief statement (paper or email)
Responsibilities:
- Find and share an additional reading that relates to the topic of the day.
- Find & share one data source, tool, or application that relates to the topic of the day.
- Take primary responsibility for facilitating class discussion.
- Bring a list of discussion questions (at least 5-6; email to me to print out)
Debrief statement:
After the class is over, please compose a written statement (2+ pages) that debriefs on your experiences and ideas. While you are welcome to engage any range of topics, please make sure to respond to these questions:
- Describe what each group member contributed to the preparation.
- What are the major takeaways from the class session?
- What did we leave out of the class session?
- What might we have done differently?
- Do you have any unanswered questions?
Course Calendar
*Note: URLs are provided, but most readings are easy to find by searching online for the author and title.
Week 1 – What is data?
- Introductions, policies, and planning
Week 2 – Categorization and Classification of Data, part 1
Readings
- Foucault, preface from The Order of Things (Course Site)
- Gitelman and Jackson, “Introduction,” “Raw Data” Is an Oxymoron (Course Site)
- Mimi Onuoha, “The Library of Missing Datasets” https://github.com/MimiOnuoha/missing-datasets
Week 3 – Categorization and Classification of Data, part 2
Readings
- Bowker and Star, excerpts from Sorting Things Out: Classification and Its Consequences
- Read the introduction and chapters 1-4
- Note: access is via PUL: https://catalog.princeton.edu/catalog/6245711
Week 4 – Data Curation
Readings
- Rawson & Muñoz, “Against Cleaning” http://curatingmenus.org/articles/against-cleaning
- Christopher Groskopf, “The Quartz Guide to Bad Data,” Quartz, December 15, 2015. http://qz.com/572338/the-quartz-guide-to-bad-data/
- Jessica Marie Johnson, “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads.” https://read.dukeupress.edu/social-text/article/36/4%20(137)/57/137032/Markup-BodiesBlack-Life-Studies-and-Slavery-Death
Lab: Tools for processing data (e.g., GitHub, OpenRefine, Breve, Excel, Regex)
Week 5 – Data Visualization
Readings
- Johanna Drucker, “Humanities Approaches to Graphical Display” http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html
- Giorgia Lupi, “Data Humanism, The Revolution will be Visualized.” http://giorgialupi.com/data-humanism-my-manifesto-for-a-new-data-wold/
Lab: Data visualization tools and Unessays
Class session leaders: ______________________________________________________
Week 6 – Data Translation
Readings
- Moritz Stefaner, “Data Cuisine” (watch the entire video) https://truth-and-beauty.net/appearances/talks/eyeo-2016
- Catherine D’Ignazio and Lauren F. Klein, “Feminist Data Visualization.” www.kanarinka.com/wp-content/uploads/2015/07/IEEE_Feminist_Data_Visualization.pdf
Lab: Unessays (cont’d)
Week 7 – Networks
Readings
- Healy, “Using Metadata to find Paul Revere” https://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/
- Marten Düring, “From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources.” The Programming Historian. https://programminghistorian.org/en/lessons/creating-network-diagrams-from-historical-sources
Lab: Social network analysis with Palladio, Gephi, Cytoscape, or others.
Class session leaders: ______________________________________________________
*Midterm projects are due by the start of class.*
Week 8 – Maps
Readings
- McConchie, “Anatomy of a Web Map” http://maptime.io/anatomy-of-a-web-map/ (warning: this page is very data-heavy)
- Jer Thorp, “In the Map Room.” https://medium.com/@blprnt/in-the-map-room-cd6b06bf2139
Before class, please browse these mapping projects (links embedded):
- The Oak of Jerusalem: Flight, Refuge, and Reconnaissance in the Great Dismal Swamp
- The Princeton & Slavery Project
- Hallowed Grounds: Race, Slavery and the University of Alabama
Lab: Google Maps and Google Earth tutorial, among others.
Class session leaders: ______________________________________________________
Week 9 – Designing with Data
Readings
- Chimero, excerpts from The Shape of Design (Course site)
Lab: Tactics for data & design practices
Class session leaders: ______________________________________________________
Week 10 – Studio workshop
- No reading. We will complete an intensive in-class lab assignment and may be joined by other people from around campus.
Week 11 – Data, Lately
- Rather than pre-assign readings, together we will create a reading list for this day based on the most recent news about data and its consequences.
Week 12 – Data in the World
- In-Class Presentations and closing conversations
All final projects must be submitted by email on Dean’s Date at midnight. Each group member is required to submit the additional reflection essay. We may be joined by guests from around campus.
Datasets, Collections & Repositories
- Australian GLAM (Galleries, Libraries, Archives, Museums) datasets
- Awesome OpenAccess Data Projects
- Data Collections and Datasets
- A collection of museum, gallery, library, archive, archaeology and assorted sources for machine-readable data
- Data Is Plural Newsletter Archive
- The Magazine of Early American Datasets (MEAD)
- Library of Congress Labs – experimental tools, art, applications, and visualizations
- African American Digital Projects – http://bit.ly/Black-DH-List
- http://data.gov
- http://datarefuge.org
- Bureau of Labor Statistics https://www.bls.gov/
- NYC Open Data (many other cities offer similar portals)
- Open Data Philly
- UN Data
- Google Public Data Directory
- ProPublica DataStore
- CDC Data & Statistics
- Awesome Public Datasets
- Makeover Monday Data Challenges – Datasets
- The European Backpackers Index 2018
- Viz for Social Good (browse around for links to various datasets)
- DataHub Core Datasets
- Yelp Dataset Challenge
- Modern Data Catalog
- Pitchfork Reviews 1999-2019 (warning: not yet in csv)
- Stanford Large Network Dataset Collection
- Various Golden Globe, Nobel, and Oscar Award Datasets
- Bigfoot Field Researchers – Comprehensive Sightings Database
- Kaggle CSV Datasets for competitions
- Newspaper presidential endorsements, 1980-present
- Weapons confiscated at airports by TSA in 2015
- Data from the Pentagon’s surplus-equipment-to-local-law-enforcement program
- Network Repository. An Interactive Scientific Data Repository
- Inside AirBnB Data
- Grand Comics Database
- BuzzFeedNews Data
- The Marvel Universe Social Network
- ICIJ Offshore Leaks Database
- Russian Ads on Facebook for US Elections
- Student Loan Debt Per Graduate by School by State 2017
Resources for working with data
Guides and references
- Data Viz Catalogue
- Tutorials from Storybench – focused on journalism
- ProPublica Data Institute 2017
- Alberto Cairo’s video series on data viz
- ProPublica’s Nerd Guides — on working with data
- https://tabula.technology/
- How to Make Data-Driven Visual Essays (part 1) & (part 2)
Approaching & Refining Data
- OpenRefine is the standard tool to use if your spreadsheet has more than 30-40 rows.
- See this handy OpenRefine tutorial – Data Prep
- WTFcsv
- Breve
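To make "refining" concrete, here is a minimal Python sketch of one common cleanup idea: grouping spellings that probably refer to the same thing by reducing each to a normalized key. It is loosely modeled on the key-collision clustering found in tools like OpenRefine, but it is a simplified illustration and not that tool's implementation. The function names and the sample values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a messy string to a normalized key (simplified sketch)."""
    cleaned = value.strip().lower()
    # Drop ASCII punctuation so "University, Princeton" loses its comma.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort unique words so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


# Hypothetical sample column with inconsistent spellings.
names = [
    "Princeton University",
    "princeton university ",
    "University, Princeton",
    "Rutgers University",
]
clusters = cluster(names)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

A cluster is only a suggestion. Deciding whether two spellings really name the same thing is a curatorial judgment, which is the kind of choice the Week 4 readings ask you to examine.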
Creating Graphs & Charts
- RAW by Density – nicely designed graphs
- Datawrapper – flexible graphs with labels
- Flourish – flexible graphs, easy to embed as html
- Chartbuilder — simple charts
- Chartblocks — simple charts
Tools for building unessays
- Princeton cPanel websites
- Adobe Spark
- ESRI StoryMaps
- Infogram
- GitHub Pages (Suggested theme: Minimal Mistakes)
Timelines
Maps
These tools generate maps with points (note: we find static maps often reach more people)
- Palladio
- Google My Maps (see tutorial here: Intro to Google Maps)
- Tableau Public (See also this tutorial to build a map and use filters)
- Flowmap.blue
- Kepler.gl – maps tool created at Uber
- LeafletJS (a little more advanced but super popular)
- MapBox (open source platform)
- MapTime (tons of resources & helpful tutorials)
- ArcGIS / QGIS (advanced)
- HistoryPin
- Collection of “strange maps”
Digital Stories / Interactive Narratives
- Twine (see this tutorial for Twine using data)
Story Maps
(interactive maps that move to a sequence of locations)
Text Analysis
- Voyant (one of the best documented tools: http://docs.voyant-tools.org/)
Social Network Analysis
- Palladio – basic network graphs for quick prototypes
- Gephi – network graphs (note: desktop software with a learning curve)
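Network tools generally expect an edge list: one row per connection. As a rough sketch, the Python below turns a table of group memberships into weighted person-to-person edges, in the spirit of the Week 7 reading on finding Paul Revere through metadata. The group names, people, and function names are invented for illustration. The Source/Target/Weight headers follow a common edge-list convention, so check your tool's import documentation before relying on them.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical membership data: group name -> list of members.
memberships = {
    "Reading Circle": ["Ada", "Ben", "Cleo"],
    "Debate Club": ["Ben", "Cleo"],
    "Garden Society": ["Ada", "Dev"],
}


def co_membership_edges(groups):
    """Count how many groups each pair of people shares."""
    edges = Counter()
    for members in groups.values():
        # Sorting keeps each pair in a consistent (a, b) order.
        for a, b in combinations(sorted(set(members)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct people each person is connected to."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def to_csv(edges):
    """Serialize edges as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buffer.getvalue()


edges = co_membership_edges(memberships)
degrees = degree(edges)
print(to_csv(edges))
```

Note what this encoding assumes: sharing a group counts as a tie, and every group counts equally. Both are modeling decisions worth questioning before you draw the graph.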
Geocoding
Geocoding means converting a list of place names into latitude/longitude coordinates.
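As a minimal sketch of the idea, the Python below looks up place names in a small hand-made gazetteer (a table of names and coordinates). This is an assumption chosen for illustration: real geocoding usually relies on an online service or a much larger gazetteer. The coordinates are approximate, and the function and variable names are hypothetical.

```python
# Hypothetical mini-gazetteer: normalized place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "princeton, nj": (40.3573, -74.6672),
    "philadelphia, pa": (39.9526, -75.1652),
    "new york, ny": (40.7128, -74.0060),
}


def geocode(place, gazetteer=GAZETTEER):
    """Return (lat, lon) for a place name, or None if it is not in the table."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(place.lower().split())
    return gazetteer.get(key)


def geocode_all(places, gazetteer=GAZETTEER):
    """Split a list of names into matched coordinates and unmatched names."""
    matched, unmatched = {}, []
    for place in places:
        coords = geocode(place, gazetteer)
        if coords is None:
            unmatched.append(place)
        else:
            matched[place] = coords
    return matched, unmatched


matched, unmatched = geocode_all(["Princeton, NJ", "  new york,   ny", "Atlantis"])
print(matched)
print("Needs manual review:", unmatched)
```

Keeping the unmatched names in their own list matters: historical, misspelled, or ambiguous place names usually need a human decision rather than a silent guess.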
Data Collection / Scraping