Title: Compute Scagnostics on Pairs of Numeric Variables in a Data Set
Description: Computes a range of scatterplot diagnostics (scagnostics) on pairs of numerical variables in a data set, including the graph-based scagnostics described by Leland Wilkinson and Graham Wills (2008) <doi:10.1198/106186008X320465> and the association-based scagnostics described by Katrin Grimm (2016, ISBN:978-3-8439-3092-5). Summary and plotting functions are provided.
Authors: Harriet Mason [aut, cre], Stuart Lee [aut], Ursula Laa [aut], Dianne Cook [aut]
Maintainer: Harriet Mason <[email protected]>
License: GPL-3
Version: 2.0.2
Built: 2024-11-12 06:13:10 UTC
Source: https://github.com/numbats/cassowaryr
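As a quick orientation before the individual help topics, here is a minimal sketch of a typical workflow. It assumes the package is installed under the name cassowaryr (per the Source field above); calc_scags_wide(), top_scags() and the pk data set used here are all documented later in this manual.

library(cassowaryr)
library(dplyr)

# Compute a few scagnostics on every pair of numeric variables
data(pk)
scags_data <- calc_scags_wide(pk[, 2:5], scags = c("outlying", "monotonic", "convex"))

# Which pair of variables scores highest on each scagnostic?
top_scags(scags_data)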
All variables and pairs of variables have the same summary statistics but are very different data, as can be seen by visualisation.
A tibble with 44 observations and 3 variables:
set: label of the data set; each set has 11 observations
x: variable for horizontal axis
y: variable for vertical axis
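As a small illustration, the scagnostics in this package can be computed for each Anscombe set by grouping on the set column; a minimal sketch, assuming the set, x and y columns described above:

require(dplyr)
# One row of scagnostic scores per Anscombe set
anscombe_tidy %>%
  group_by(set) %>%
  summarise(calc_scags(x, y, scags = c("monotonic", "outlying")))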
Compute selected scagnostics on subsets
calc_scags(
  x,
  y,
  scags = c("outlying", "stringy", "striated", "striated2", "clumpy",
            "clumpy2", "sparse", "skewed", "convex", "skinny", "monotonic",
            "splines", "dcor"),
  out.rm = TRUE,
  euclid = FALSE
)
x: numeric vector
y: numeric vector
scags: collection of strings matching names of scagnostics to calculate: outlying, stringy, striated, striated2, striped, clumpy, clumpy2, sparse, skewed, convex, skinny, monotonic, splines, dcor
out.rm: logical; if TRUE, outliers are removed before calculating the non-outlying measures
euclid: logical; if TRUE, Euclidean distances are used
A data frame that gives the single plot's scagnostic scores.
calc_scags_wide
# Calculate selected scagnostics on a single pair
calc_scags(anscombe$x1, anscombe$y1, scags = c("monotonic", "outlying"))

# Compute on long form data, or subsets
# defined by a categorical variable
require(dplyr)
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(calc_scags(x, y, scags = c("monotonic", "outlying", "convex")))
Compute scagnostics on all possible scatter plots for the given data
calc_scags_wide(
  all_data,
  scags = c("outlying", "stringy", "striated", "striated2", "clumpy",
            "clumpy2", "sparse", "skewed", "convex", "skinny", "monotonic",
            "splines", "dcor"),
  out.rm = TRUE,
  euclid = FALSE
)
all_data: tibble of multivariate data on which to compute scagnostics
scags: collection of strings matching names of scagnostics to calculate: outlying, stringy, striated, striated2, striped, clumpy, clumpy2, sparse, skewed, convex, skinny, monotonic, splines, dcor
out.rm: logical; if TRUE, outliers are removed before calculating the non-outlying measures
euclid: logical; if TRUE, Euclidean distances are used
A data frame that gives the data's scagnostic scores for each possible variable combination.
calc_scags
# Calculate selected scagnostics
data(pk)
calc_scags_wide(pk[, 2:5], scags = c("outlying", "monotonic"))
From the datasauRus package, a modern update of Anscombe: all plots have the same x and y means, variances and correlation, but look very different visually.
All variables and pairs of variables have the same summary statistics but are very different data, as can be seen by visualisation.
A tibble with 1,846 observations and 3 variables:
dataset: label of the data set
x: variable for horizontal axis
y: variable for vertical axis
A tibble with 142 observations and 26 variables: a pair of x and y variables for each of the away, bullseye, circle, dino, dots, h_lines, high_lines, slant_down, slant_up, star, v_lines, wide_lines and x_shape data sets.
This function will draw the alphahull for a scatterplot.
draw_alphahull(x, y, alpha = 0.5, clr = "black", fill = FALSE, out.rm = TRUE)
x: numeric vector
y: numeric vector
alpha: transparency value of points
clr: optional colour of points and lines, default black
fill: whether to fill the polygon
out.rm: option to return the alpha hull with outliers removed
A "gg" object that draws the plot's alpha hull, computed via alphahull::ahull().
require(dplyr)
require(ggplot2)
require(alphahull)
data("features")
nl <- features %>% filter(feature == "clusters")
draw_alphahull(nl$x, nl$y)
This function will draw the Convex Hull for a scatterplot.
draw_convexhull(x, y, alpha = 0.5, clr = "black", fill = FALSE, out.rm = TRUE)
x: numeric vector
y: numeric vector
alpha: transparency value of points
clr: optional colour of points and lines, default black
fill: whether to fill the polygon
out.rm: option to return the convex hull with outliers removed
A "gg" object that draws the plot's convex hull.
require(dplyr)
require(ggplot2)
data("features")
nl <- features %>% filter(feature == "clusters")
draw_convexhull(nl$x, nl$y, fill = TRUE, out.rm = FALSE)
This function will draw the MST for a scatterplot.
draw_mst(x, y, alpha = 0.5, out.rm = TRUE)
x: numeric vector
y: numeric vector
alpha: the alpha value used to build the graph object; larger values allow points further apart to be connected
out.rm: option to return the MST with outliers removed
A "gg" object that draws the plot's MST.
require(dplyr)
require(ggplot2)
data("features")
nl <- features %>% filter(feature == "nonlinear2")
draw_mst(nl$x, nl$y)
Simulated data with common features that might be seen in 2D data. Variables are feature, x, and y.
A tibble with 1,013 observations and 3 variables, covering 15 different patterns:
feature: label of the data set (the pattern)
x: variable for horizontal axis
y: variable for vertical axis
There are 7 variables (x1-x7) and 2,100 observations. Variables x4 and x7 contain the numbat; the rest are noise. Group A contains the numbat, and group B is all noise.
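A hedged sketch of how such a data set could be scanned with the functions in this package. The object name numbat and the grouping column name group used below are hypothetical, chosen only for illustration and not confirmed by this manual:

require(dplyr)
# Hypothetical names: "numbat" for the data set, "group" for the A/B label
numbat %>%
  group_by(group) %>%
  summarise(calc_scags(x4, x7, scags = c("outlying", "clumpy2")))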
Biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.
A tibble with 195 observations and 24 variables:
name: ASCII subject name and recording number
MDVP:Fo(Hz): Average vocal fundamental frequency
MDVP:Fhi(Hz): Maximum vocal fundamental frequency
MDVP:Flo(Hz): Minimum vocal fundamental frequency
MDVP:Jitter, MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP: Several measures of variation in fundamental frequency
MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA: Several measures of variation in amplitude
NHR, HNR: Two measures of the ratio of noise to tonal components in the voice
status: Health status of the subject; one for Parkinson's disease, zero for healthy
RPDE, D2: Two nonlinear dynamical complexity measures
DFA: Signal fractal scaling exponent
spread1, spread2, PPE: Three nonlinear measures of fundamental frequency variation
The data are available from the UCI Machine Learning Repository in ASCII CSV format. Each row of the CSV file is an instance corresponding to one voice recording. There are around six recordings per patient, and the patient's name is identified in the first column.
The data were originally analysed in: Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering.
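A short usage sketch with the functions documented in this manual; the column selection pk[, 2:5] simply picks a few of the numeric voice measures, matching the examples elsewhere in this manual:

require(dplyr)
data(pk)
# Scagnostics on every pair of the selected voice measures,
# then the highest-scoring scagnostic for each pair
calc_scags_wide(pk[, 2:5], scags = c("outlying", "monotonic")) %>%
  top_pairs()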
Compute clumpy scagnostic measure using MST
sc_clumpy(x, y)

## Default S3 method:
sc_clumpy(x, y)

## S3 method for class 'scree'
sc_clumpy(x, y = NULL)

## S3 method for class 'igraph'
sc_clumpy(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's clumpy score.
require(ggplot2)
require(dplyr)
ggplot(features, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~feature, ncol = 5, scales = "free")
features %>%
  group_by(feature) %>%
  summarise(clumpy = sc_clumpy(x, y))
sc_clumpy(datasaurus_dozen_wide$away_x, datasaurus_dozen_wide$away_y)
Compute robust clumpy scagnostic measure using MST
sc_clumpy_r(x, y)

## Default S3 method:
sc_clumpy_r(x, y)

## S3 method for class 'scree'
sc_clumpy_r(x, y = NULL)

## S3 method for class 'igraph'
sc_clumpy_r(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's robust clumpy score.
require(ggplot2)
require(dplyr)
ggplot(features, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~feature, ncol = 5, scales = "free")
features %>%
  group_by(feature) %>%
  summarise(clumpy = sc_clumpy_r(x, y))
sc_clumpy_r(datasaurus_dozen_wide$away_x, datasaurus_dozen_wide$away_y)
Compute adjusted clumpy measure using MST
sc_clumpy2(x, y)

## Default S3 method:
sc_clumpy2(x, y)

## S3 method for class 'scree'
sc_clumpy2(x, y = NULL)

## S3 method for class 'igraph'
sc_clumpy2(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's clumpy2 score.
require(ggplot2)
require(dplyr)
ggplot(features, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~feature, ncol = 5, scales = "free")
features %>%
  group_by(feature) %>%
  summarise(clumpy = sc_clumpy2(x, y))
sc_clumpy2(datasaurus_dozen_wide$away_x, datasaurus_dozen_wide$away_y)
Compute convex scagnostic measure
sc_convex(x, y)

## Default S3 method:
sc_convex(x, y)

## S3 method for class 'scree'
sc_convex(x, y = NULL)

## S3 method for class 'list'
sc_convex(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's convex score.
require(ggplot2)
require(dplyr)
ggplot(features, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~feature, ncol = 5, scales = "free")
features %>%
  group_by(feature) %>%
  summarise(convex = sc_convex(x, y))
sc_convex(datasaurus_dozen_wide$away_x, datasaurus_dozen_wide$away_y)
(Taken from tourr package) Computes the distance correlation based index on 2D projections of the data.
sc_dcor(x, y)
x: numeric vector
y: numeric vector
A "numeric" object that gives the plot's dcor score.
require(ggplot2)
require(tidyr)
require(dplyr)
data(anscombe)
anscombe_tidy <- anscombe %>%
  pivot_longer(cols = everything(),
               names_to = c(".value", "set"),
               names_pattern = "(.)(.)")
ggplot(anscombe_tidy, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~set, ncol=2, scales = "free")
sc_dcor(anscombe$x1, anscombe$y1)
sc_dcor(anscombe$x2, anscombe$y2)
sc_dcor(anscombe$x3, anscombe$y3)
sc_dcor(anscombe$x4, anscombe$y4)
Measure of Spearman correlation. A base R comparison follows the examples below.
sc_monotonic(x, y)
x: numeric vector
y: numeric vector
A "numeric" object that gives the plot's monotonic score.
require(ggplot2)
require(tidyr)
require(dplyr)
data(anscombe)
anscombe_tidy <- anscombe %>%
  pivot_longer(cols = everything(),
               names_to = c(".value", "set"),
               names_pattern = "(.)(.)")
ggplot(anscombe_tidy, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~set, ncol=2, scales = "free")
sc_monotonic(anscombe$x1, anscombe$y1)
sc_monotonic(anscombe$x2, anscombe$y2)
sc_monotonic(anscombe$x3, anscombe$y3)
sc_monotonic(anscombe$x4, anscombe$y4)
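For comparison, the plain Spearman rank correlation can be computed in base R; whether sc_monotonic() squares or otherwise transforms this value is not stated in this manual:

# Base R Spearman rank correlation on the same pairs used above
cor(anscombe$x1, anscombe$y1, method = "spearman")
cor(anscombe$x4, anscombe$y4, method = "spearman")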
Compute outlying scagnostic measure using MST
sc_outlying(x, y)

## Default S3 method:
sc_outlying(x, y)

## S3 method for class 'scree'
sc_outlying(x, y = NULL)

## S3 method for class 'igraph'
sc_outlying(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's outlying score.
require(ggplot2)
require(tidyr)
require(dplyr)
ggplot(datasaurus_dozen, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~dataset, ncol=3, scales = "free")
sc_outlying(datasaurus_dozen_wide$dino_x, datasaurus_dozen_wide$dino_y)
sc_outlying(datasaurus_dozen_wide$dots_x, datasaurus_dozen_wide$dots_y)
sc_outlying(datasaurus_dozen_wide$h_lines_x, datasaurus_dozen_wide$h_lines_y)
Compute skewed scagnostic measure using MST
sc_skewed(x, y)

## Default S3 method:
sc_skewed(x, y)

## S3 method for class 'scree'
sc_skewed(x, y = NULL)

## S3 method for class 'igraph'
sc_skewed(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's skewed score.
require(ggplot2)
require(tidyr)
require(dplyr)
data(anscombe_tidy)
ggplot(datasaurus_dozen, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~dataset, ncol=3, scales = "free")
sc_skewed(datasaurus_dozen_wide$dots_x, datasaurus_dozen_wide$dots_y)
sc_skewed(datasaurus_dozen_wide$h_lines_x, datasaurus_dozen_wide$h_lines_y)
sc_skewed(datasaurus_dozen_wide$x_shape_x, datasaurus_dozen_wide$x_shape_y)
Compute skinny scagnostic measure
sc_skinny(x, y)

## Default S3 method:
sc_skinny(x, y)

## S3 method for class 'scree'
sc_skinny(x, y = NULL)

## S3 method for class 'list'
sc_skinny(x, y = NULL)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's skinny score.
require(ggplot2)
require(dplyr)
ggplot(features, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~feature, ncol = 5, scales = "free")
features %>%
  group_by(feature) %>%
  summarise(skinny = sc_skinny(x, y))
sc_skinny(datasaurus_dozen_wide$away_x, datasaurus_dozen_wide$away_y)
Compute sparse scagnostic measure using MST
sc_sparse(x, y)

## Default S3 method:
sc_sparse(x, y)

## S3 method for class 'scree'
sc_sparse(x, y = NULL)

## S3 method for class 'igraph'
sc_sparse(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's sparse score.
require(ggplot2)
require(tidyr)
require(dplyr)
ggplot(datasaurus_dozen, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~dataset, ncol=3, scales = "free")
sc_sparse(datasaurus_dozen_wide$away_x, datasaurus_dozen_wide$away_y)
sc_sparse(datasaurus_dozen_wide$circle_x, datasaurus_dozen_wide$circle_y)
sc_sparse(datasaurus_dozen_wide$dino_x, datasaurus_dozen_wide$dino_y)
Compute adjusted sparse measure using the alpha hull
sc_sparse2(x, y)

## Default S3 method:
sc_sparse2(x, y)

## S3 method for class 'scree'
sc_sparse2(x, y = NULL)

## S3 method for class 'list'
sc_sparse2(x, y = NULL)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's sparse2 score.
require(ggplot2)
require(tidyr)
require(dplyr)
data(anscombe_tidy)
ggplot(anscombe_tidy, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~set, ncol=2, scales = "free")
sc_sparse2(anscombe$x1, anscombe$y1)
(Taken from the tourr git repo.) Compares the variance in residuals of a fitted spline model to the overall variance to find functional dependence in 2D projections of the data; an illustrative sketch of this idea follows the examples below.
sc_splines(x, y)
x: numeric vector
y: numeric vector
A "numeric" object that gives the plot's spines score.
require(ggplot2)
require(tidyr)
require(dplyr)
data(anscombe)
anscombe_tidy <- anscombe %>%
  pivot_longer(cols = everything(),
               names_to = c(".value", "set"),
               names_pattern = "(.)(.)")
ggplot(anscombe_tidy, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~set, ncol=2, scales = "free")
sc_splines(anscombe$x1, anscombe$y1)
sc_splines(anscombe$x2, anscombe$y2)
sc_splines(anscombe$x3, anscombe$y3)
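To make the description above concrete, here is an illustrative sketch of the idea of comparing spline residual variance to the overall variance. The helper splines_idea() is hypothetical, for illustration only, and is not the package's (or tourr's) exact implementation:

# Illustrative only: strong functional dependence leaves little residual
# variance after fitting a smoothing spline, so the index approaches 1
splines_idea <- function(x, y) {
  fit <- smooth.spline(x, y)                 # fit y ~ s(x)
  resid_var <- var(y - predict(fit, x)$y)    # variance of spline residuals
  1 - resid_var / var(y)                     # compare to overall variance
}
splines_idea(anscombe$x1, anscombe$y1)       # clear dependence: well above 0
set.seed(1)
splines_idea(runif(100), runif(100))         # independent noise: close to 0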
Compute striated scagnostic measure using MST
sc_striated(x, y)

## Default S3 method:
sc_striated(x, y)

## S3 method for class 'scree'
sc_striated(x, y = NULL)

## S3 method for class 'igraph'
sc_striated(x, y)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's striated score.
require(ggplot2)
require(dplyr)
data(anscombe_tidy)
ggplot(anscombe_tidy, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~set, ncol=2, scales = "free")
sc_striated(anscombe$x1, anscombe$y1)
sc_striated(anscombe$x2, anscombe$y2)
Compute angle adjusted striated measure using MST
sc_striated2(x, y)

## Default S3 method:
sc_striated2(x, y)

## S3 method for class 'scree'
sc_striated2(x, y = NULL)

## S3 method for class 'igraph'
sc_striated2(x, y)
x: numeric vector of x values, or an MST object
y: numeric vector of y values, or a scree object
A "numeric" object that gives the plot's striated2 score.
require(ggplot2)
require(dplyr)
ggplot(features, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~feature, ncol = 5, scales = "free")
features %>%
  group_by(feature) %>%
  summarise(striated = sc_striated2(x, y))
sc_striated2(datasaurus_dozen_wide$away_x, datasaurus_dozen_wide$away_y)
Compute stringy scagnostic measure using MST
sc_stringy(x, y)

## Default S3 method:
sc_stringy(x, y)

## S3 method for class 'scree'
sc_stringy(x, y = NULL)

## S3 method for class 'igraph'
sc_stringy(x, y = NULL)
x: numeric vector of x values
y: numeric vector of y values
A "numeric" object that gives the plot's stringy score.
require(ggplot2)
require(tidyr)
require(dplyr)
data(anscombe_tidy)
ggplot(anscombe_tidy, aes(x=x, y=y)) +
  geom_point() +
  facet_wrap(~set, ncol=2, scales = "free")
sc_stringy(anscombe$x1, anscombe$y1)
sc_stringy(anscombe$x2, anscombe$y2)
sc_stringy(anscombe$x3, anscombe$y3)
sc_stringy(anscombe$x4, anscombe$y4)
This metric computes 1 minus the ratio of the number of unique values to the total number of data values, over a number of rotations of the data, and returns the smallest value. If this value is large, there are only a few unique data values, and hence the distribution is discrete (striped); see the illustrative sketch after the examples below.
sc_striped(x, y)
x: numeric vector
y: numeric vector
A "numeric" (double) object that gives the plot's striped score.
data("datasaurus_dozen_wide") sc_striped(datasaurus_dozen_wide$v_lines_x, datasaurus_dozen_wide$v_lines_y) sc_striped(datasaurus_dozen_wide$dino_x, datasaurus_dozen_wide$dino_y)
data("datasaurus_dozen_wide") sc_striped(datasaurus_dozen_wide$v_lines_x, datasaurus_dozen_wide$v_lines_y) sc_striped(datasaurus_dozen_wide$dino_x, datasaurus_dozen_wide$dino_y)
Pre-processing to generate scagnostic measures
scree(x, y, binner = NULL, ...)
x, y: numeric vectors
binner: an optional function that bins the x and y vectors prior to triangulation
...: other args
An object of class "scree" that consists of three elements:
del: the Delaunay-Voronoi tessellation from alphahull::delvor()
weights: the lengths of each edge in the Delaunay triangulation
alpha: the radius (alpha value) that will be used to generate the alpha hull
x <- runif(100)
y <- runif(100)
scree(x, y)
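Because several of the sc_* measures above provide S3 methods for class 'scree' (see their usage sections), a scree object can in principle be reused across measures so the triangulation is computed only once. A hedged sketch, assuming the scree object is passed as the first argument as those method signatures suggest:

# Reuse one pre-processed scree object for several measures
x <- runif(100)
y <- runif(100)
s <- scree(x, y)
sc_stringy(s)
sc_convex(s)
sc_skinny(s)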
Calculate the top scagnostic for each pair of variables
top_pairs(scags_data)
scags_data: a dataset of scagnostic values that was returned by calc_scags or calc_scags_wide
A data frame where each row gives a scatter plot, its highest-valued scagnostic, and the corresponding value.
calc_scags calc_scags_wide top_scags
# an example using calc_scags
require(dplyr)
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(calc_scags(x, y, scags = c("monotonic", "outlying", "convex"))) %>%
  top_pairs()

# an example using calc_scags_wide
data(pk)
scags_data <- calc_scags_wide(pk[, 2:5], scags = c("outlying", "clumpy", "monotonic"))
top_pairs(scags_data)
Calculate the top pair of variables or group for each scagnostic
top_scags(scags_data)
scags_data: a dataset of scagnostic values that was returned by calc_scags or calc_scags_wide
A data frame where each row gives a scagnostic, the pair of variables (or group) with the highest value on it, and the associated value.
calc_scags calc_scags_wide top_pairs
# an example using calc_scags
require(dplyr)
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(calc_scags(x, y, scags = c("monotonic", "outlying", "convex"))) %>%
  top_scags()

# an example using calc_scags_wide
data(pk)
scags_data <- calc_scags_wide(pk[, 2:5], scags = c("outlying", "clumpy", "monotonic"))
top_scags(scags_data)