Palmer Penguins

Comparison of Palmer Penguin Species using R

Introduction

This is a practice data analysis that compares the three Palmer Penguin species and their physical features using R.

The first step is to set up the environment for the analysis by loading the tidyverse and palmerpenguins packages.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.7      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.0 
## ✔ readr   2.1.2      ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(palmerpenguins)

Summary of Data

Now let’s get an overview of the ‘penguins’ dataset using the str() and head() functions.

str() shows each column’s name and data type along with its first few values, and head() prints the first six rows.

str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
head(penguins)
## # A tibble: 6 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
## # … with 1 more variable: year <int>
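The output above already shows missing values (row 4, for instance, is NA in every measurement column). As a small addition that is not part of the original analysis, the sketch below counts the NAs in each column and the number of observations per species. The variable names `na_counts` and `species_counts` are illustrative only.

```r
library(dplyr)
library(palmerpenguins)

# Count missing values (NA) in each column of the dataset
na_counts <- colSums(is.na(penguins))
na_counts

# Count how many observations were recorded for each species
species_counts <- penguins %>% count(species)
species_counts
```

These counts help explain why later steps call drop_na() and why ggplot2 warns about removed rows.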

Exploring the data

Now that we have an overview of the dataset, we can start looking for insights into the features of each penguin species.

Let’s determine the maximum, minimum and average flipper length and body mass for each species of penguin.

Let’s start with flipper length:

# Drop rows with any missing value, then summarize flipper length per species
flipperlength_summary <- penguins %>%
  group_by(species) %>%
  drop_na() %>%
  summarize(max = max(flipper_length_mm), min = min(flipper_length_mm), avg = mean(flipper_length_mm))
flipperlength_summary
## # A tibble: 3 × 4
##   species     max   min   avg
##   <fct>     <int> <int> <dbl>
## 1 Adelie      210   172  190.
## 2 Chinstrap   212   178  196.
## 3 Gentoo      231   203  217.
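Note that drop_na() with no arguments removes every row that has an NA in any column, including sex, so the summary above is based on complete rows only. The sketch below (an addition, with illustrative variable names) compares that row count with the count obtained when only flipper_length_mm is checked, and shows an alternative summary that uses na.rm = TRUE instead. Its averages may differ slightly from the table above because more rows are kept.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Rows left when an NA in ANY column causes removal
rows_all <- nrow(drop_na(penguins))
# Rows left when only flipper_length_mm is checked for NA
rows_flipper <- nrow(drop_na(penguins, flipper_length_mm))
c(all_columns = rows_all, flipper_only = rows_flipper)

# Alternative: ignore NAs only in the column being summarized
flipper_na_rm <- penguins %>%
  group_by(species) %>%
  summarize(max = max(flipper_length_mm, na.rm = TRUE),
            min = min(flipper_length_mm, na.rm = TRUE),
            avg = mean(flipper_length_mm, na.rm = TRUE))
flipper_na_rm
```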

Next, let’s analyze body mass:

# Drop rows with any missing value, then summarize body mass per species
bodymass_summary <- penguins %>%
  group_by(species) %>%
  drop_na() %>%
  summarize(max = max(body_mass_g), min = min(body_mass_g), avg = mean(body_mass_g))
bodymass_summary
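As an optional alternative that is not part of the original analysis, both summaries can be produced in a single pipeline with dplyr’s across(), which applies the same list of functions to several columns. This is a sketch assuming dplyr 1.0 or later. The `.names` pattern and the variable name `feature_summary` are illustrative choices, and drop_na() is applied to all columns to mirror the summaries above.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Apply max, min and mean to both features for each species in one step.
# Resulting columns are named like flipper_length_mm_max, body_mass_g_avg, ...
feature_summary <- penguins %>%
  drop_na() %>%
  group_by(species) %>%
  summarize(across(c(flipper_length_mm, body_mass_g),
                   list(max = max, min = min, avg = mean),
                   .names = "{.col}_{.fn}"))
feature_summary
```

Keeping both features in one table makes it easier to compare species side by side, at the cost of longer column names.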

Visualize the data

Now that we have summarized the flipper length and body mass of each species, let’s visualize and compare the two features using scatter plots.

First, let’s create a plot that compares flipper length and body mass across the entire dataset, regardless of species.

ggplot(data = penguins) + geom_point(mapping = aes(x = body_mass_g, y = flipper_length_mm))
## Warning: Removed 2 rows containing missing values (geom_point).

This shows a relationship between body mass and flipper length: heavier penguins tend to have longer flippers. Let’s check the trend by adding a smoothed trend line.

ggplot(data = penguins, mapping = aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

The smoothed (loess) line confirms a relationship between the two features, although it is not strictly linear. Now let’s compare how flipper length and body mass are related within each species.

ggplot(data = penguins) + geom_point(mapping = aes(x = body_mass_g, y = flipper_length_mm, color = species))
## Warning: Removed 2 rows containing missing values (geom_point).

From this plot we can see that the Gentoo species clearly stands out from the other two species, with longer flippers and higher body mass.

Let’s look at each species separately in its own panel:

ggplot(data = penguins) + geom_point(mapping = aes(x = body_mass_g, y = flipper_length_mm, color = species)) + facet_wrap(~species)
## Warning: Removed 2 rows containing missing values (geom_point).
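The plots suggest a positive association within each species as well as overall. One way to put a number on it, added here as a sketch and not part of the original analysis, is the Pearson correlation coefficient, computed overall and per species on rows where both measurements are present. Keep in mind that Pearson’s r only measures linear association, and the loess curve above was not strictly linear. The variable names are illustrative.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Keep only rows where both measurements are present
complete_rows <- penguins %>% drop_na(body_mass_g, flipper_length_mm)

# Overall Pearson correlation between body mass and flipper length
overall_r <- cor(complete_rows$body_mass_g, complete_rows$flipper_length_mm)
overall_r

# Correlation within each species, with the number of rows used
species_r <- complete_rows %>%
  group_by(species) %>%
  summarize(r = cor(body_mass_g, flipper_length_mm), n = n())
species_r
```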

Conclusion

From this analysis of the provided data sample, we can infer that Gentoo is the largest of the three species, with the longest flippers and the highest body mass, while Adelie appears to be the smallest, with the shortest flippers on average (although Adelie and Chinstrap overlap considerably in both features). More data and further analysis are needed to gain more insight into this relationship and to determine whether there is a direct relationship between these features of the penguin species.