Statistical Graphics Assignment
In this assignment you must use graphical methods to answer a series of questions; no statistical analysis is
required. All graphics should be completed using ggplot or mapping tools, where relevant, and where any
manipulations are needed, tidyverse commands must be used. Use all data where available, even if some
cells are missing.
Your assignment must be written in rmarkdown, and both the PDF and the .Rmd file must be submitted
to the correct, separate Assignment 2 links. Be sure to complete this work by yourself. Do not collaborate
with others. Submission of the assessment to Turnitin implies that you have conducted the assessment by
yourself in accordance with academic integrity.
To receive marks for the rmarkdown file that you submit, the file should be able to be compiled to produce
your submitted PDF. The rmarkdown file should contain annotations that describe what the commands
included in your file are doing. If you are having trouble compiling your rmarkdown file you can consult
Petra, Houying or Christina for help.
This assignment is worth a total of 80 marks: 70 marks for the graphical analysis and interpretation (the
marks for each question are noted beside the question) and 10 marks for the rmarkdown file.
Part 1 [24 marks]
The dataset, scoop_a_poop.csv, contains data from the Scoop a Poop project, a citizen science project
coordinated by scientists from Macquarie University, in collaboration with Taronga Zoo and the University of
The data set has the following columns:

Question 1 [4 marks]
Use a suitable graph to show how many different species had samples of poop collected from them. State
which species’ poop was most abundant.
Question 2 [4 marks]
Use a suitable graphic to display, for each species, the proportion of poop samples that tested positive. Which species
has the highest proportion of samples testing positive for antibiotic resistance?
The following command may be helpful for answering the next few questions:
poop$collection_date <- as.Date(poop$collection_date, "%d/%m/%Y")
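The same conversion can also be written in tidyverse style. The sketch below is only an illustration on a small made-up data frame: poop_demo and the year column are hypothetical names chosen for this example, and only collection_date is taken from the assignment. It assumes the dates are stored as day/month/year text, as in the command above.

```r
library(dplyr)

# Made-up stand-in for the real data (hypothetical values, for illustration only).
# One date is missing to show that missing cells are carried through as NA.
poop_demo <- tibble(collection_date = c("03/02/2019", "15/11/2020", NA))

poop_demo <- poop_demo %>%
  mutate(
    # Parse day/month/year text into Date values; missing text stays NA
    collection_date = as.Date(collection_date, "%d/%m/%Y"),
    # Pull out the calendar year as an integer (assumed column name: year)
    year = as.integer(format(collection_date, "%Y"))
  )
```

Rows with a missing date are kept rather than dropped, so any later filtering of missing values remains an explicit choice.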
Question 3 [4 marks]
Use a suitable graphic to show the number of negative and positive samples by year. Interpret your plot.
Question 4 [6 marks]
On a map of Australia, plot the locations of the poop samples, using different colours for the negative
and positive samples. Provide the map created. Describe three things that can be observed in your map that
relate to the samples.
Question 5 [6 marks]
Focus the map from Question 4 on the State or Territory with the most observations. Determine the best
approach for adding in information about the species. Describe your approach and include the graphic
together with a description of what the graphic shows.
Part 2 [46 marks]
Download the colony.csv data set from the Assignment 2 area and import it into R. This data set shows
data on bee colonies from 47 states of the USA with readings taken at multiple times over a sequence of 7
years. The variables in this data set are described as follows:

These data are available at https://usda.library.cornell.edu/concern/publications/rn301137d?locale=en
Question 1 [22 marks]
Thoroughly explore the data using a variety of at least 5 graphics. Describe in detail what each of the plots
shows. Note that more credit will be given to more complex graphics.
Question 2 [24 marks]
Clearly define three research questions related to the bee colonies, then analyse them graphically and
report your findings in detail. Higher marks will be awarded to well-motivated questions and insightful

