Assignment 1

Frequently Asked Questions - FAQ

  1. How do I find the number of unique words a singer used for Part I?

    Hover over the singer's profile picture on the scatter plot, and the number will show up. To select a singer by name, use the dropdown list in the top-left corner and choose the singer you are interested in.

  2. Does each of us need to collect at least 10 data points for Part II?

    No. The group as a whole needs at least 10 data points, but more than two people must take part in collecting them.

  3. What kind of plots should we use for our data points?

    Look back at the data visualization lecture slides and think about what kinds of variables you're working with. For example, if you have two or more categories and want to show the count for each, a bar plot may be the best way to visualize your data.
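The bar plot idea can be sketched in Python as follows. This is only an illustration, not a required approach: the `responses` list and the `plot_counts` helper are made up for this example, and the plotting step assumes you have matplotlib installed.

```python
from collections import Counter

# Hypothetical data: one categorical value per collected data point.
responses = ["bus", "car", "bike", "bus", "walk",
             "bus", "car", "bike", "bus", "walk"]

# Count how many times each category appears.
counts = Counter(responses)


def plot_counts(counts, title, xlabel, ylabel="Count"):
    """Draw a bar plot of category counts and return the figure.

    Hypothetical helper for this sketch; requires matplotlib.
    """
    # Imported here so the counting step above works without matplotlib.
    import matplotlib.pyplot as plt

    categories = sorted(counts)
    fig, ax = plt.subplots()
    ax.bar(categories, [counts[c] for c in categories])
    ax.set_title(title)    # clear title
    ax.set_xlabel(xlabel)  # labeled x-axis
    ax.set_ylabel(ylabel)  # labeled y-axis
    return fig
```

To draw and save the plot, you could call `fig = plot_counts(counts, "How students get to campus", "Transport mode")` and then `fig.savefig("counts.png")`.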

  4. For the scatterplots, do we need to show a best fit line or any other expectations?

    No. For the scatterplots on this assignment, make sure each one has a clear title and labeled axes, and that the data points are spread evenly across the plot. We only want to see the trend in the data points, so no lines or additional modeling are required. Feel free to add them if they help you.

  5. Are there any resources to research token analysis?

    We won’t go into token analysis much in this class, but NLP (natural language processing) is a unique form of data analysis that involves categorizing words or phrases. If you would like to learn more about natural language processing, this video provides a great breakdown of token analysis with a Python Jupyter Notebook demo.
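As a rough illustration of what token analysis involves, here is a minimal Python sketch that splits text into word tokens and counts the unique ones. The `tokenize` and `unique_word_count` functions and the sample text are made up for this example. The tokenization rule (lowercase letters and apostrophes only) is a simplifying assumption and may not match how the Part I scatter plot counts unique words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption for this sketch: a token is a run of letters or apostrophes.
    """
    return re.findall(r"[a-z']+", text.lower())


def unique_word_count(text):
    """Return the number of distinct tokens in the text."""
    return len(set(tokenize(text)))


# Made-up sample text, not taken from any real song.
sample = "La la la, sing it loud, sing it proud"

tokens = tokenize(sample)          # every word, in order, lowercased
frequencies = Counter(tokens)      # how often each word appears
n_unique = unique_word_count(sample)
```

Here `frequencies.most_common(3)` would list the three most frequent words, which is a common first step before any further analysis.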