dplyr Tutorial with examples

The dplyr package, written by Hadley Wickham, is used for data manipulation. This tutorial uses R's built-in "mtcars" dataset for all examples.

Installation of dplyr package

Use install.packages("package_name") as shown below.

> install.packages("dplyr")
Installing package into ‘C:/Users/pinnapav/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
also installing the dependency ‘rlang’

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.3/rlang_0.1.2.zip'
Content type 'application/zip' length 465839 bytes (454 KB)
downloaded 454 KB

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.3/dplyr_0.7.4.zip'
Content type 'application/zip' length 2886653 bytes (2.8 MB)
downloaded 2.8 MB

package ‘rlang’ successfully unpacked and MD5 sums checked
package ‘dplyr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\pinnapav\AppData\Local\Temp\RtmpqWyJWy\downloaded_packages

Load the dplyr package:

library(dplyr)

Load the mtcars dataset:

data(mtcars)

Section 1: Subset Observations (Rows) using dplyr package

1. filter: Extracts rows that meet logical criteria.

Examples:
head(filter(mtcars,disp>220))

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
2 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
3 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
4 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
5 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
6 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3

filter(mtcars,disp>220,hp<120)

   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
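Comma-separated conditions in filter() are combined with AND. The sketch below, which assumes dplyr is installed, checks this against the explicit `&` form and shows an OR condition written with `|`. The variable names are illustrative only.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
both_comma <- filter(mtcars, disp > 220, hp < 120)

# The same filter written with an explicit &
both_amp <- filter(mtcars, disp > 220 & hp < 120)

# Both forms keep the same rows
all(both_comma$mpg == both_amp$mpg)

# Use | for OR: keep rows where at least one condition holds
either <- filter(mtcars, disp > 220 | hp < 120)
nrow(either)
```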

2. distinct: Removes duplicate rows

distinct(mtcars)

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
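Every row of mtcars is already unique, so distinct(mtcars) returns all 32 rows. distinct() is more informative when restricted to a few columns. The sketch below assumes dplyr is installed, and the choice of cyl and gear is only for illustration.

```r
library(dplyr)

# Unique combinations of cyl and gear; all other columns are dropped
combos <- distinct(mtcars, cyl, gear)
combos

# .keep_all = TRUE keeps every column, using the first row found
# for each cyl/gear combination
first_rows <- distinct(mtcars, cyl, gear, .keep_all = TRUE)
head(first_rows)
```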

3. sample_frac: Randomly selects a fraction of rows.

sample_frac(mtcars,0.5)

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1

sample_frac(mtcars,0.1)

               mpg cyl  disp  hp drat   wt qsec vs am gear carb
Ferrari Dino  19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
Porsche 914-2 26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
Merc 230      22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2

4. sample_n: Randomly selects n rows.

sample_n(mtcars,5)

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

sample_n(mtcars,3)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Chrysler Imperial 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Ferrari Dino      19.7   6  145 175 3.62 2.770 15.50  0  1    5    6
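Because sample_frac() and sample_n() draw rows at random, each run returns different rows. One way to make a draw repeatable is to call base R's set.seed() first. This is a minimal sketch, assuming dplyr is installed. The seed value 42 is arbitrary.

```r
library(dplyr)

set.seed(42)                        # arbitrary seed, for illustration only
first_draw <- sample_n(mtcars, 5)

set.seed(42)                        # same seed, so the same rows are drawn
second_draw <- sample_n(mtcars, 5)

# TRUE when both draws picked the same rows in the same order
identical(first_draw, second_draw)

# sample_frac() takes a proportion: half of the 32 rows is 16 rows
half <- sample_frac(mtcars, 0.5)
nrow(half)
```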

5. slice: Selects rows by position.

slice(mtcars,1:11)

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4

6. top_n: Selects the top n entries ranked by a variable. The result is not sorted, and ties can return more than n rows.

top_n(mtcars,5,mpg)

   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
2 30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
3 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
4 27.3   4 79.0  66 4.08 1.935 18.90  1  1    4    1
5 30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
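top_n() keeps the rows with the largest values but leaves them in their original order. To get a ranked list, one option is to pass the result to arrange(). A minimal sketch, assuming dplyr is installed:

```r
library(dplyr)

# The five highest-mpg rows, in their original order
top5 <- top_n(mtcars, 5, mpg)

# Sort those rows from highest to lowest mpg
top5_sorted <- arrange(top5, desc(mpg))
top5_sorted$mpg
```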

7. arrange: Orders rows by the values of a column (low to high).

head(arrange(mtcars,mpg))

   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
4 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
5 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
6 15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

head(arrange(mtcars,desc(mpg))) # Orders rows by the values of a column in descending order (high to low)

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
2 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
3 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
4 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
5 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
6 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
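arrange() also accepts several columns, using each later column to break ties in the earlier ones. The sketch below assumes dplyr is installed. It sorts by cyl ascending and then by mpg descending within each cyl value. The column choice is illustrative.

```r
library(dplyr)

# Sort by cyl (low to high); within each cyl, sort by mpg (high to low)
sorted <- arrange(mtcars, cyl, desc(mpg))
head(select(sorted, cyl, mpg))
```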

8. data_frame: Combines vectors into a data frame. Recent releases deprecate data_frame() in favour of tibble().

data_frame(k=6:10,l=11:15)

   k  l
1  6 11
2  7 12
3  8 13
4  9 14
5 10 15

9. rename: Renames the columns of a data frame.

rename(mtcars,HP=hp) # syntax is new_name = old_name
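rename() takes arguments of the form new_name = old_name, and the old name must already exist in the data frame. A minimal sketch, assuming dplyr is installed, that renames the hp column to HP and inspects the resulting column names:

```r
library(dplyr)

# new_name = old_name: the existing column hp becomes HP
renamed <- rename(mtcars, HP = hp)
names(renamed)

# The original data frame is not modified
names(mtcars)
```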

Section 2: Subset Variables (Columns) using dplyr package

select: Selects columns by name or helper function.
Helper functions for select:
contains: Selects columns whose names contain a character string
ends_with: Selects columns whose names end with a character string
everything: Selects every column
matches: Selects columns whose names match a regular expression
num_range: Selects columns named with a prefix followed by a number in a range, such as x1, x2, x3
one_of: Selects columns whose names are in a group of names
starts_with: Selects columns whose names start with a character string

Examples:

select(mtcars,hp,mpg)

                   hp  mpg
Mazda RX4         110 21.0
Mazda RX4 Wag     110 21.0
Datsun 710         93 22.8
Hornet 4 Drive    110 21.4
Hornet Sportabout 175 18.7
Valiant           105 18.1
Duster 360        245 14.3

head(select(mtcars,contains("p")))

                   mpg disp  hp
Mazda RX4         21.0  160 110
Mazda RX4 Wag     21.0  160 110
Datsun 710        22.8  108  93
Hornet 4 Drive    21.4  258 110
Hornet Sportabout 18.7  360 175
Valiant           18.1  225 105
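The other select helpers work the same way as contains(). The sketch below assumes dplyr is installed and applies a few of them to mtcars. The patterns are chosen only to match mtcars column names, and the variable names are illustrative.

```r
library(dplyr)

# Columns whose names start with "d": disp and drat
d_cols <- names(select(mtcars, starts_with("d")))

# Columns whose names end with "t": drat and wt
t_cols <- names(select(mtcars, ends_with("t")))

# Columns whose names match the regular expression "^c": cyl and carb
c_cols <- names(select(mtcars, matches("^c")))

# Move carb to the front and keep every other column after it
carb_first <- names(select(mtcars, carb, everything()))

d_cols
t_cols
c_cols
carb_first
```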

Section 3: Summarize data using dplyr package

summarise: Summarises data into a single row of values.
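As a minimal sketch of summarise() on mtcars, assuming dplyr is installed: the output column names avg_mpg, max_hp and n_cars are illustrative choices. The second call uses group_by(), which this tutorial does not otherwise cover, to produce one summary row per cyl value.

```r
library(dplyr)

# Collapse the whole data frame into a single row of summary values
overall <- summarise(mtcars,
                     avg_mpg = mean(mpg),   # mean miles per gallon
                     max_hp  = max(hp),     # highest horsepower
                     n_cars  = n())         # number of rows
overall

# With group_by(), summarise() returns one row per group
by_cyl <- summarise(group_by(mtcars, cyl),
                    avg_mpg = mean(mpg),
                    n_cars  = n())
by_cyl
```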
