routlier
routlier is a package for finding outliers in a dataset. Its basic functions flag values that lie more than a chosen number of standard deviations from the mean of their column. The number of outliers in the dataset is reported, and each outlying value is replaced with the word ‘Outlier’ in the returned dataset.
library(routlier)
routlier::routlier_simple(data = detroit, sd = 1)
## You have 66 outliers in your dataset
## FTP UEMP MAN LIC GR CLEAR WM NMAN GOV
## 1 260.35 Outlier Outlier Outlier Outlier 93.4 Outlier Outlier Outlier
## 2 269.8 7 Outlier Outlier Outlier 88.5 Outlier Outlier Outlier
## 3 272.04 5.2 Outlier Outlier Outlier Outlier Outlier Outlier Outlier
## 4 272.96 4.3 535.8 222.1 Outlier 92 500457 591 150.3
## 5 272.51 3.5 576 301.92 297.65 91 482418 626.1 164.3
## 6 261.34 Outlier 601.7 391.22 367.62 87.4 465029 659.8 179.5
## 7 268.89 4.1 577.3 665.56 616.54 88.3 448267 686.2 187.5
## 8 295.99 3.9 596.9 Outlier Outlier 86.1 432109 699.6 195.4
## 9 319.87 3.6 Outlier 837.6 786.23 79 416533 729.9 210.3
## 10 341.43 7.1 569.3 794.9 713.77 73.9 401518 757.8 Outlier
## 11 Outlier Outlier 548.8 817.74 750.43 Outlier Outlier 755.3 Outlier
## 12 Outlier 7.7 563.4 583.17 Outlier Outlier Outlier Outlier Outlier
## 13 Outlier 6.3 Outlier 709.59 666.5 Outlier Outlier Outlier Outlier
## HE WE HOM ACC ASR
## 1 Outlier Outlier Outlier Outlier 306.18
## 2 3.09 134.02 8.9 Outlier 315.16
## 3 3.23 141.68 Outlier 45.31 277.53
## 4 3.33 147.98 8.89 49.51 Outlier
## 5 3.46 159.85 13.07 Outlier Outlier
## 6 3.6 157.19 14.57 Outlier Outlier
## 7 3.73 155.29 21.36 50.62 286.11
## 8 Outlier 131.75 28.03 51.47 291.59
## 9 4.25 178.74 31.49 49.16 320.39
## 10 4.47 178.3 37.39 45.8 323.03
## 11 Outlier 209.54 Outlier 44.54 357.38
## 12 Outlier Outlier Outlier Outlier Outlier
## 13 Outlier Outlier Outlier 44.17 Outlier
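The standard-deviation rule described above can be sketched in base R. This is a minimal illustration and not routlier's actual source: the helper name `flag_sd_outliers` is hypothetical, the strict `>` comparison and the use of the sample standard deviation are assumptions, and the input is the FTP column of the detroit data copied from the table printed under routlier_formattable in this README.

```r
# FTP column of the detroit data, copied from the table printed in this README
ftp <- c(260.35, 269.80, 272.04, 272.96, 272.51, 261.34, 268.89,
         295.99, 319.87, 341.43, 356.59, 376.69, 390.19)

# Hypothetical helper (not part of routlier): flag values that lie more than
# k sample standard deviations away from the mean of the vector
flag_sd_outliers <- function(x, k = 1) {
  is_outlier <- abs(x - mean(x)) > k * sd(x)
  flagged <- as.character(x)          # values become text so they can share
  flagged[is_outlier] <- "Outlier"    # a column with the word "Outlier"
  list(outliers = sum(is_outlier), values = flagged)
}

res <- flag_sd_outliers(ftp, k = 1)
res$outliers   # number of flagged values
res$values     # the vector with flagged values replaced by "Outlier"
```

With `k = 1` this sketch flags the last three FTP values, which agrees with rows 11 to 13 of the FTP column in the output above.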
routlier:: Example with both quantitative and qualitative data
Here we use the student dataset that is included in the routlier package. This dataset contains both quantitative and qualitative data. With the sd argument set to 2, the function reports 274 outliers.
routlier_simple(data = student, sd = 2)
## You have 274 outliers in your dataset
## age Medu Fedu traveltime studytime failures famrel freetime
## 1 18 4 4 2 2 0 4 3
## 2 17 1 1 1 2 0 5 3
## 3 15 1 1 1 2 Outlier 4 3
## 4 15 4 2 1 3 0 3 2
## 5 16 3 3 1 2 0 4 3
## 6 16 4 3 1 2 0 5 4
## 7 16 2 2 1 2 0 4 4
## 8 17 4 4 2 2 0 4 Outlier
## 9 15 3 2 1 2 0 4 2
## 10 15 3 4 1 2 0 5 5
## 11 15 4 4 1 2 0 3 3
## 12 15 2 1 Outlier 3 0 5 2
## 13 15 4 4 1 1 0 4 3
## 14 15 4 3 2 2 0 5 4
## 15 15 2 2 1 3 0 4 5
...
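How routlier treats the qualitative columns internally is not shown in this README. As a rough sketch of one plausible approach, the base R code below applies the standard-deviation rule to numeric columns only and leaves every other column untouched. The helper `flag_numeric_outliers` and the `toy` data frame are hypothetical and are not taken from the package or from the student dataset.

```r
# Hypothetical toy data: one qualitative column and one quantitative column
toy <- data.frame(
  school = rep(c("GP", "MS"), 5),
  score  = c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not routlier's code): apply the
# k-standard-deviation rule to numeric columns and skip everything else
flag_numeric_outliers <- function(df, k = 2) {
  out <- df
  n_flagged <- 0L
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.numeric(x)) next                      # qualitative column: skip
    is_out <- !is.na(x) &
      abs(x - mean(x, na.rm = TRUE)) > k * sd(x, na.rm = TRUE)
    n_flagged <- n_flagged + sum(is_out)
    x_chr <- as.character(x)
    x_chr[is_out] <- "Outlier"
    out[[col]] <- x_chr
  }
  list(outliers = n_flagged, data = out)
}

res <- flag_numeric_outliers(toy, k = 2)
res$outliers   # count of flagged values across the numeric columns
res$data       # school is unchanged; the score of 50 becomes "Outlier"
```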
routlier_dt_sd
Using routlier_dt_sd() we can find the Outliers in a dataset and have them highlighted with a green background to make them easier to find. The function returns an interactive DT datatable, which makes it much easier to filter and sort the data.
routlier_dt_sd(data = detroit, sd = 1)
## You have 66 outliers in your dataset
routlier_rh_sd
Using routlier_rh_sd() we can find the Outliers in a dataset and have them highlighted with a green background to make them easier to find.
routlier_formattable
Using routlier_formattable() we can find the Outliers in a dataset and have the Outliers highlighted in red and the non-outlier values highlighted in green.
routlier_formattable(data = detroit, sd = 2)
## You have 4 outliers in your dataset
## [1] "WM" "WE" "UEMP" "NMAN" "MAN" "LIC" "HOM" "HE" "GR"
## [10] "GOV" "FTP" "CLEAR" "ASR" "ACC"
## [1] "WM" "WE" "UEMP" "NMAN" "MAN" "LIC" "HOM" "HE" "GR"
## [10] "GOV" "FTP" "CLEAR" "ASR" "ACC"
WM | WE | UEMP | NMAN | MAN | LIC | HOM | HE | GR | GOV | FTP | CLEAR | ASR | ACC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
558724 | 117.18 | 11.0 | 538.1 | 455.5 | 178.50 | 8.60 | 2.98 | 215.98 | 133.9 | 260.35 | 93.4 | 306.18 | 39.17 |
538584 | 134.02 | 7.0 | 547.6 | 480.2 | 156.41 | 8.90 | 3.09 | 180.48 | 137.6 | 269.80 | 88.5 | 315.16 | 40.27 |
519171 | 141.68 | 5.2 | 562.8 | 506.1 | 198.02 | 8.52 | 3.23 | 209.57 | 143.6 | 272.04 | 94.4 | 277.53 | 45.31 |
500457 | 147.98 | 4.3 | 591.0 | 535.8 | 222.10 | 8.89 | 3.33 | 231.67 | 150.3 | 272.96 | 92.0 | 234.07 | 49.51 |
482418 | 159.85 | 3.5 | 626.1 | 576.0 | 301.92 | 13.07 | 3.46 | 297.65 | 164.3 | 272.51 | 91.0 | 230.84 | 55.05 |
465029 | 157.19 | 3.2 | 659.8 | 601.7 | 391.22 | 14.57 | 3.60 | 367.62 | 179.5 | 261.34 | 87.4 | 217.99 | 53.90 |
448267 | 155.29 | 4.1 | 686.2 | 577.3 | 665.56 | 21.36 | 3.73 | 616.54 | 187.5 | 268.89 | 88.3 | 286.11 | 50.62 |
432109 | 131.75 | 3.9 | 699.6 | 596.9 | 1131.21 | 28.03 | 2.91 | 1029.75 | 195.4 | 295.99 | 86.1 | 291.59 | 51.47 |
416533 | 178.74 | 3.6 | 729.9 | 613.5 | 837.60 | 31.49 | 4.25 | 786.23 | 210.3 | 319.87 | 79.0 | 320.39 | 49.16 |
401518 | 178.30 | 7.1 | 757.8 | 569.3 | 794.90 | 37.39 | 4.47 | 713.77 | 223.8 | 341.43 | 73.9 | 323.03 | 45.80 |
387046 | 209.54 | 8.4 | 755.3 | 548.8 | 817.74 | 46.26 | 5.04 | 750.43 | 227.7 | 356.59 | 63.4 | 357.38 | 44.54 |
373095 | 240.05 | 7.7 | 787.0 | 563.4 | 583.17 | 47.24 | 5.47 | 1027.38 | 230.9 | 376.69 | 62.5 | 422.07 | 41.03 |
359647 | 258.05 | 6.3 | 819.8 | 609.3 | 709.59 | 52.33 | 5.76 | 666.50 | 230.2 | 390.19 | 58.9 | 473.01 | 44.17 |
routlier_mad
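Using routlier_mad() we can find the Outliers in a dataset based on the median absolute deviation (MAD) instead of the standard deviation. The MAD argument sets how many MADs away from the column median a value must lie to count as an outlier. The Outliers are highlighted in red and the non-outlier values are highlighted in green.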
routlier_mad(data = detroit, MAD = 3)
## [1] "The MAD for column 1 is from: 329.046758 : 216.873242 and the overall MAD range is: 112.173516"
## [1] "The MAD for column 2 is from: 12.76126 : -2.36126 and the overall MAD range is: 15.12252"
## [1] "The MAD for column 3 is from: 713.40872 : 425.19128 and the overall MAD range is: 288.217440000001"
## [1] "The MAD for column 4 is from: 1714.823754 : -548.483754 and the overall MAD range is: 2263.307508"
## [1] "The MAD for column 5 is from: 2034.898942 : -801.818942 and the overall MAD range is: 2836.717884"
## [1] "The MAD for column 6 is from: 114.0868 : 60.7132 and the overall MAD range is: 53.3736"
## [1] "The MAD for column 7 is from: 680397.682 : 216136.318 and the overall MAD range is: 464261.364"
## [1] "The MAD for column 8 is from: 1004.66248 : 367.73752 and the overall MAD range is: 636.924959999999"
## [1] "The MAD for column 9 is from: 352.95816 : 22.0418400000001 and the overall MAD range is: 330.91632"
## [1] "The MAD for column 10 is from: 6.357636 : 0.842363999999999 and the overall MAD range is: 5.515272"
## [1] "The MAD for column 11 is from: 253.04009 : 61.3399099999999 and the overall MAD range is: 191.70018"
## [1] "The MAD for column 12 is from: 76.824066 : -34.104066 and the overall MAD range is: 110.928132"
## [1] "The MAD for column 13 is from: 67.016006 : 24.583994 and the overall MAD range is: 42.432012"
## [1] "The MAD for column 14 is from: 433.60947 : 178.75053 and the overall MAD range is: 254.85894"
## [1] "You have a total of 7 Outliers in your dataset"
## $outliers
## [1] 7
##
## $outlier_table
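The bounds printed for column 1 above are consistent with the column median plus or minus three times the scaled MAD that base R's `mad()` returns (which multiplies by 1.4826 by default). The sketch below reproduces them for the FTP column, using values copied from the detroit table printed in this README. The helper name `mad_bounds` is hypothetical, and this is an illustration of the rule rather than routlier's actual source.

```r
# FTP column of the detroit data, copied from the table printed in this README
ftp <- c(260.35, 269.80, 272.04, 272.96, 272.51, 261.34, 268.89,
         295.99, 319.87, 341.43, 356.59, 376.69, 390.19)

# Hypothetical helper (not part of routlier): bounds at k scaled MADs
# on either side of the median
mad_bounds <- function(x, k = 3) {
  centre <- median(x)
  spread <- mad(x)   # stats::mad() applies the 1.4826 scaling constant by default
  c(lower = centre - k * spread, upper = centre + k * spread)
}

b <- mad_bounds(ftp, k = 3)
b   # lower is about 216.873242, upper is about 329.046758

# Count the FTP values that fall outside the bounds
n_out <- sum(ftp < b["lower"] | ftp > b["upper"])
n_out
```

Under this rule four FTP values (341.43 and above) fall outside the bounds.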
routlier_quantile
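Using routlier_quantile() we can find the Outliers in a dataset and have the Outliers highlighted in red and the non-outlier values highlighted in green. We can look at either mild outliers (outlier_type = "M") or extreme outliers (outlier_type = "E").
This approach uses Tukey's method: bounds are placed a multiple of the interquartile range (IQR) below the lower quartile and above the upper quartile, and values outside those bounds count as outliers. In the output below, the mild bounds sit 1.5 times the IQR beyond each quartile.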
routlier_quantile(data = mtcars[1:10, ], type = 7, outlier_type = "M")
## [1] "The IQR for column 1 is from: 27.8875 : 13.3875 and the overall IQR range is: 14.5"
## [1] "The IQR for column 2 is from: 8.25 : 2.25 and the overall IQR range is: 6"
## [1] "The IQR for column 3 is from: 399.3375 : 0.437499999999943 and the overall IQR range is: 398.9"
## [1] "The IQR for column 4 is from: 153.125 : 64.125 and the overall IQR range is: 89"
## [1] "The IQR for column 5 is from: 5.0025 : 2.0625 and the overall IQR range is: 2.94"
## [1] "The IQR for column 6 is from: 4.184375 : 2.199375 and the overall IQR range is: 1.985"
## [1] "The IQR for column 7 is from: 24.12 : 12.76 and the overall IQR range is: 11.36"
## [1] "The IQR for column 8 is from: 2.5 : -1.5 and the overall IQR range is: 4"
## [1] "The IQR for column 9 is from: 1.875 : -1.125 and the overall IQR range is: 3"
## [1] "The IQR for column 10 is from: 5.5 : 1.5 and the overall IQR range is: 4"
## [1] "The IQR for column 11 is from: 8.125 : -2.875 and the overall IQR range is: 11"
## [1] "You have a total of 3 Outliers in your dataset"
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
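The bounds printed above can be reproduced with Tukey's fences in base R. The sketch below is an illustration under stated assumptions, not routlier's actual source: the helper name `tukey_fences` is hypothetical, the multiplier `k = 1.5` is what matches the mild ("M") bounds printed above, and `k = 3` for extreme outliers is the usual Tukey convention rather than something this README confirms for outlier_type = "E".

```r
# Hypothetical helper (not part of routlier): Tukey fences placed k times
# the interquartile range below Q1 and above Q3
tukey_fences <- function(x, k = 1.5, type = 7) {
  q <- quantile(x, probs = c(0.25, 0.75), type = type, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

cars <- mtcars[1:10, ]

# One column of fences per variable; rows are named "lower" and "upper"
fences <- sapply(cars, tukey_fences)
fences[, "mpg"]   # lower 13.3875, upper 27.8875, as printed for column 1

# Count values outside the fences across all columns
n_out <- sum(sapply(names(cars), function(col) {
  x <- cars[[col]]
  sum(x < fences["lower", col] | x > fences["upper", col])
}))
n_out
```

For these ten rows the sketch finds three values outside the fences, all in the hp column (62, 175 and 245), which matches the total reported above.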