Syllabus

DATA 101, Summer 2018
Kris Shaffer, Ph.D., University of Mary Washington.

Welcome to the gateway course for the data science minor! This course serves two purposes: 1) to give you an overview of what data science entails and a taste of what it can achieve, and 2) to help you build some skills with the tools and environments that data scientists actually use in practice. You’ll see what kinds of problems data science can solve, and get your hands dirty with some real data and state-of-the-art tools. By the end of this course, you should know whether data science is a field you want to explore further (and whether the data science minor might be of interest to you).

General course information

Course title: Introduction to Data Science
Course number: DATA 101
Semester: Spring 2018
Meeting time: TR 11:00am-12:15pm
Meeting location: HCC 327
Instructor: Kris Shaffer, Ph.D.
Office: HCC 410
Office hours: in-person TR 10:00-11:00am, HCC 410, or video chat by appt.
Course website: data101.pushpullfork.com
Online assignment submissions: Canvas

Contacting the instructor

The best way to get in touch with me during the course is via email: kshaffer@umw.edu. Please note that I have limited access to my UMW email on evenings and weekends, so be patient if you email me after hours. Otherwise, I usually respond within a day.

Course description

  • Provide an introduction to data science. Cover a wide range of topics with the goal of providing an overview of the use of data in different fields.
  • Provide hands-on practice with basic tools and methods of data analysis.
  • Prepare students to use data in their field of study and in their work.

Course objectives

  • To learn what the new field of data science has to offer, and what the professionals who engage in it do every day. 
  • To become proficient in using the Python programming environment for data analysis. Students will leave this course with a level of comfort and confidence that enables them to shift their focus from the tool to the application in attacking future problems. 
  • To learn rudimentary programming concepts: variables, branching, functions, loops. (This is similar to what you’d learn in CPSC 110, but with a focus on data analysis more than algorithmic construction.)
  • To encounter, and tame, data “in the wild.” You will understand and be able to recognize the different ways data is stored, organized, and transmitted. You’ll also learn the sad but challenging truth that data normally doesn’t arrive in exactly the form you need it to be in: data scientists usually have to do some work on the front end to “wrangle” it into submission. 
  • To learn basic data exploration techniques: how to textually and graphically inspect a data set to get first impressions, summarize it, understand the scope of what it contains, identify ranges and outliers, identify its possible relationships to other data sets and how it might be merged, form possible hypotheses for more formal modeling, etc. 
  • To get an overview of what the “machine learning” part of data science is all about: making automated predictions and inferences about Big Data.

Required materials

A computer

Make sure you have a computer you can access whenever needed, not a borrowed one. This is essential for an online course in general, and a class on data science specifically. Over the course of the semester, you will need to download and install some free and/or open source software to complete various assignments, and much of the software we will use is not available on UMW lab machines, so make sure you have the necessary access/permissions to do this on a computer to which you have daily access. 

The internet

All readings/videos/activities will be available online. All assigned work will be submitted online. Reliable broadband internet access is a must. If you do not have access to reliable broadband off-campus, plan on doing a fair bit of your course work on the campus network.

Textbooks

  • DataCamp: Data Analyst with Python career track
  • Anaconda Distribution (Python 3.6 version) installed on a computer to which you have administrative access

By joining the class group on DataCamp (see your UMW email for invitation), you will have free access to all DataCamp courses in the Data Analyst career track (and many others!).

Credit and assessment

This course is about growing in your ability to think critically about digital technology and engage it deliberately. That will look different for each person, and to the extent that it can be measured, it rarely involves reproducing existing knowledge or jumping through well-worn academic hoops. The most important and interesting aspects of learning are things that are difficult to assess fairly and reductively (i.e., with a single letter). As a result, heavy emphasis on grades tends to undermine alternative perspectives and stifle creativity — the exact opposite of what a liberal education should do.

And yet, I still have to assign final grades. So in light of the individualized nature of our work, we will use self-assessment to determine final grades.

Self-assessments

Approximately every three weeks (explicit due dates are posted in Canvas), you will submit a self-assessment document to Canvas (a template is provided there), along with all coursework completed since the previous self-assessment. In that self-assessment, write a response to the following questions (no more than about one paragraph in total):

  • What did you do this week? (Reference all assigned readings and class activities, as well as anything else you think relevant to the course.)
  • What did you learn this week?
  • What difficulties/challenges did you conquer?
  • Who helped you?
  • Whom did you help?
  • What can I as an instructor do to better support you and your work?

Then for each learning objective listed for the course (see below), assign yourself one of the following assessments: objective met, significant progress made, or no significant progress made. Also provide a one-to-three-sentence explanation for your assessment for each objective, referencing specific work you did and results accomplished. 

Finally, assign yourself the letter grade you think is appropriate for your work so far in the course. Take into account all objectives attempted/met since beginning the course.

I will generally respond to your self-assessment within two class days, with feedback on your work, your self-assessment, and your letter-grade estimation. I will also let you know whether or not I "accept" your self-assessment for each learning objective, based on the case that you made. If you disagree with my decision, you have two class days to respond either with a better argument in favor of your assessment, or with additional/revised work, if appropriate. (Another round of revisions/negotiations may take place if necessary, but in my experience it rarely is.)

The goal of this self-assessment process is threefold: to help you 1) grow in your knowledge and skills as a data scientist, 2) stay on track for the grade you want at the end of the course, and 3) grow in your ability to assess your own progress as a data scientist. While self-assessment is an important skill in general, it is of the utmost importance for any of you considering a career in a field like data science. As a data scientist, you will spend much of your time on independent, self-directed work, and the specific skills, techniques, languages, and tasks of a professional data scientist are changing rapidly. The ability to set goals, progress towards them, and assess that progress without a textbook, professor, or even a boss leading you is as important as the technical skillset, and so it is an important part of this course, as well.

Final grades

The departmental grading scale is as follows:

A 92–100%
A– 89–91%
B+ 87–88%
B 82–86%
B– 79–81%
C+ 77–78%
C 72–76%
C– 69–71%
D+ 67–68%
D 60–66%
F 0–59%

Your final course grade will be determined by the percentage of learning objectives met by the end of the course. (Learning objectives still assessed as "significant progress made" at the end of the course will count as half-value.)
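As a rough illustration of the arithmetic described above, here is a minimal Python sketch. The function names, the data layout, and the choice to treat the lower bound of each range as the cutoff (so that a fractional percentage such as 91.5% falls into the lower grade) are assumptions made for this example, not official course policy.

```python
# Illustrative sketch only: names and rounding behavior are assumptions.

# Value of each self-assessment level; "significant progress" counts half.
WEIGHTS = {
    "objective met": 1.0,
    "significant progress made": 0.5,
    "no significant progress made": 0.0,
}

# Lower bound of each letter grade on the departmental scale.
SCALE = [
    (92, "A"), (89, "A-"), (87, "B+"), (82, "B"), (79, "B-"),
    (77, "C+"), (72, "C"), (69, "C-"), (67, "D+"), (60, "D"), (0, "F"),
]


def final_percentage(assessments):
    """Return the percentage of objectives met, counting progress as half."""
    total = sum(WEIGHTS[a] for a in assessments)
    return 100 * total / len(assessments)


def letter_grade(percentage):
    """Map a percentage to a letter using the lower bound of each range."""
    for cutoff, letter in SCALE:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical student: 12 objectives met, 2 in progress, 1 not yet started.
example = (
    ["objective met"] * 12
    + ["significant progress made"] * 2
    + ["no significant progress made"] * 1
)
pct = final_percentage(example)
print(round(pct, 1), letter_grade(pct))
```

In this hypothetical case, 12 full objectives plus two half-value objectives give 13 out of 15.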

Learning outcomes

Following are the learning outcomes for the entire course. Some will apply over multiple units, but all are to be completed by the end of the semester.

Data analysis techniques

  • Data wrangling: identify and convert relevant information into a format that can be used in analysis.
  • Data tidying: convert unstructured or hierarchically structured data into a "tidy" format (two-dimensional table with one observation per row and one measurement/variable per column).
  • Process data accurately in a variety of standard data structure types (lists, dictionaries, data frames, etc.) in Python/Pandas.
  • Calculate the probability of basic events in real-life data.
  • Calculate descriptive statistics on data from a variety of disciplines.
  • Perform basic classification and clustering of datasets.
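To make the "tidy" format mentioned above more concrete, here is a small sketch in plain Python. The data set is made up for illustration, and the column names (city, year, avg_temp_c) are assumptions chosen for this example.

```python
# Made-up hierarchical data: average temperatures nested by city, then year.
nested = {
    "Fredericksburg": {2016: 14.1, 2017: 14.6},
    "Richmond": {2016: 15.0, 2017: 15.4},
}


def tidy(nested_data):
    """Flatten nested data into rows: one observation per row,
    one variable per key (column)."""
    rows = []
    for city, by_year in nested_data.items():
        for year, temp in by_year.items():
            rows.append({"city": city, "year": year, "avg_temp_c": temp})
    return rows


tidy_rows = tidy(nested)
for row in tidy_rows:
    print(row)
```

Each dictionary in the result is one observation (a city in a given year), and each key is one variable. A list of dictionaries like this can be passed directly to pandas.DataFrame to get a two-dimensional table.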

Data communication/storytelling

  • Use data visualization effectively to support an argument.
  • Present data analysis results in both written and oral communication, in a manner accessible to educated non-experts.
  • Explain how data is used in a wide range of fields, including those that are traditionally part of the liberal arts.

Critical evaluation of data analysis results

  • Identify the hidden assumptions of an existing research project that involves data analysis.
  • Calculate and interpret the accuracy of multiple different analysis techniques.

General Education

  • Demonstrate an ability to interpret quantitative/symbolic information.
  • Convert unstructured data into various mathematical/machine-readable formats.
  • Apply analytical techniques or rules to solve specific problems in a variety of contexts.
  • Express how analytical techniques or rules are used to address real-world problems across multiple disciplines.

Schedule and topics

See the course schedule for a detailed list of topics, assignments, and due dates.

About this syllabus

This syllabus is a summary of course objectives and content and a reminder of some relevant university policies, not a contract. All information in this syllabus (except for the "General course information") is subject to change, with sufficient advance notice provided by the instructor.

In the spirit of collaboration at the center of this course, changes to this syllabus may be considered during the semester if proposed by a group of students, and approved by general consensus of the students and the instructor.


Student support resources

Accommodations/Disability Resources

The Office of Disability Resources (ODR) has been designated by the University as the primary office to guide, counsel, and assist students with disabilities. If you receive services through the Office of Disability Resources and require accommodations for this class, please make an appointment with me as soon as possible to discuss your approved accommodation needs. Bring your accommodation letter with you to the appointment. I will hold any information you share with me in strictest confidence unless you give me permission to do otherwise.

If you have not made contact with the Office of Disability Resources and have reasonable accommodation needs, I will be happy to help you contact them. The office will require appropriate documentation of a disability.

  • Phone: 540-654-1266
  • Website: http://academics.umw.edu/disability
  • Office Location: Lee Hall, Room 401

UMW Writing Center

The UMW Writing Center offers assistance on all types of writing projects: reports, papers, cover letters and resumes, research projects, and citations. The Writing Center can also help you prepare for in-class essay exams and for standardized tests that include essays, such as the Praxis I writing exam.

If you are an online, commuter or Stafford Campus student, you can schedule online or face-to-face appointments. Please ensure you are choosing the appropriate appointment type and date.

  • Phone: 540-654-5653
  • Website: https://universityofmarywashington.fullslate.com
  • Office Location: Hurley Convergence Center (HCC), Room 430

UMW Libraries

UMW Libraries have both a physical and an online presence. The physical locations are the Stafford Campus Library on UMW’s Stafford campus and the Simpson Library on the Fredericksburg campus. Both libraries are open to UMW students, and librarians are available to assist you via phone, email, chat message, or face-to-face.

UMW Libraries offers online databases, research guides, and e-books that are accessible off-campus by using your network ID and password. An online interlibrary loan service is also available so that students can request articles and books not available in the collections of UMW Libraries.

  • Website: http://libraries.umw.edu/
  • Research Guides: http://libguides.umw.edu/
  • Stafford Campus Library: 540-286-8025, stafflib@umw.edu
  • Simpson Library: 540-654-1148, refdesk@umw.edu
  • Hours: http://libraries.umw.edu/hours-and-directions/

Help Desk/Computer Problems

If you are having difficulties with Canvas or connecting to online University resources, seek assistance from the Help Desk:

  • Call 540‐654‐2255 or leave a voicemail
  • Send an email message to: helpdesk@umw.edu
  • Submit your problem via online form: http://technology.umw.edu/helpdesk/submit-a-service-request/
  • Website (with operating hours): https://technology.umw.edu/helpdesk/

Digital Knowledge Center (DKC)

The Digital Knowledge Center (DKC) provides UMW students with peer tutoring on digital projects and assignments. Any student at the University can take advantage of the Center’s services by scheduling an appointment to work one-on-one or in a group with a student tutor; when a tutor is available, the Center also provides walk-in assistance. Tutorials can cover a wide range of topics related to common digital systems, technologies, new media, and tools used in courses at UMW; the Center also provides training to students interested in learning how to use the Advanced Media Production Studio (HCC 115). DKC tutors adhere to the UMW Honor Code in all tutorials; they are available to provide guidance and advice, but they cannot create, produce, or edit work on a student’s behalf.

  • Website: http://dkc.umw.edu/
  • Email: info@dkc.umw.edu
  • Phone: 540-654-5815
