| Title: | 'ggplot2' Based Plots with Statistical Details |
|---|---|
| Description: | Extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021) <doi:10.21105/joss.03236>. |
| Authors: | Indrajeet Patil [cre, aut, cph] (ORCID: <https://orcid.org/0000-0003-1995-6531>) |
| Maintainer: | Indrajeet Patil <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-02 06:32:56 UTC |
| Source: | https://github.com/indrajeetpatil/ggstatsplot |
Tidy version of the "Bugs" dataset.
bugs_long
A data frame with 372 rows and 6 variables
subject. Dummy identity number for each participant.
gender. Participant's gender (Female, Male).
region. Region of the world the participant was from.
education. Level of education.
condition. Condition of the experiment the participant gave rating for (LDLF: low disgustingness and low frighteningness; LDHF: low disgustingness and high frighteningness; HDLF: high disgustingness and low frighteningness; HDHF: high disgustingness and high frighteningness).
desire. The desire to kill an arthropod was indicated on a scale from 0 to 10.
This data set, "Bugs", provides the extent to which men and women want to kill arthropods that vary in frighteningness (low, high) and disgustingness (low, high). Each participant rates their attitudes towards all arthropods. Subset of the data reported by Ryan et al. (2013).
Ryan, R. S., Wilde, M., & Crist, S. (2013). Compared to a small, supervised lab experiment, a large, unsupervised web-based experiment on a previously unknown effect has benefits that outweigh its potential costs. Computers in Human Behavior, 29(4), 1295-1301.
dim(bugs_long)
head(bugs_long)
dplyr::glimpse(bugs_long)
Wrapper around patchwork::wrap_plots() that returns a combined grid
of plots with annotations. If you want to create a grid of plots, it is
highly recommended that you use the {patchwork} package directly rather than
this wrapper, which is mostly useful with {ggstatsplot} plots. It
is exported only for backward compatibility.
combine_plots(
  plotlist,
  plotgrid.args = list(),
  annotation.args = list(),
  guides = "collect",
  ...
)
plotlist |
A list containing |
plotgrid.args |
A |
annotation.args |
A |
guides |
A string specifying how guides should be treated in the layout.
|
... |
Currently ignored. |
A combined plot with annotation labels.
library(ggplot2)

# first plot
p1 <- ggplot(
  data = subset(iris, iris$Species == "setosa"),
  aes(x = Sepal.Length, y = Sepal.Width)
) +
  geom_point() +
  labs(title = "setosa")

# second plot
p2 <- ggplot(
  data = subset(iris, iris$Species == "versicolor"),
  aes(x = Sepal.Length, y = Sepal.Width)
) +
  geom_point() +
  labs(title = "versicolor")

# combining the plot with a title and a caption
combine_plots(
  plotlist = list(p1, p2),
  plotgrid.args = list(nrow = 1),
  annotation.args = list(
    tag_levels = "a",
    title = "Dataset: Iris Flower dataset",
    subtitle = "Edgar Anderson collected this data",
    caption = "Note: Only two species of flower are displayed",
    theme = theme(
      plot.subtitle = element_text(size = 20),
      plot.title = element_text(size = 30)
    )
  )
)
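Because the description of combine_plots() recommends using {patchwork} directly, here is a minimal sketch of what a comparable direct call might look like. It assumes {ggplot2} and {patchwork} are installed. The mapping suggested in the comments (plotgrid.args to wrap_plots(), annotation.args to plot_annotation()) is an assumption based on the argument names, not a statement about the wrapper's internals.

```r
library(ggplot2)
library(patchwork)

# two simple plots, one per species
p1 <- ggplot(subset(iris, Species == "setosa"), aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  labs(title = "setosa")

p2 <- ggplot(subset(iris, Species == "versicolor"), aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  labs(title = "versicolor")

# layout options go to wrap_plots() (assumed counterpart of `plotgrid.args`);
# titles, captions, and tags go to plot_annotation() (assumed counterpart of
# `annotation.args`)
combined <- wrap_plots(list(p1, p2), nrow = 1, guides = "collect") +
  plot_annotation(
    tag_levels = "a",
    title = "Dataset: Iris Flower dataset",
    caption = "Note: Only two species of flower are displayed"
  )

combined
```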
Extracting data frames or expressions from {ggstatsplot} plots
extract_stats(p)
extract_subtitle(p)
extract_caption(p)
p |
A plot from |
These are convenience functions to extract data frames or expressions with
statistical details that are used to create expressions displayed in
{ggstatsplot} plots as subtitle, caption, etc. Note that all of this
analysis is carried out by the {statsExpressions} package. So, if you
are using these functions only to extract data frames, you are better off
using that package directly.
The only exception is the ggcorrmat() function. But, if a data frame is
what you want, you shouldn't be using ggcorrmat() anyway. You can use
the correlation::correlation() function, which provides tidy data frames by
default.
A list of tibbles containing summaries of various statistical analyses. The exact details included will depend on the function.
set.seed(123)

# non-grouped plot
p1 <- ggbetweenstats(mtcars, cyl, mpg)

# grouped plot
p2 <- grouped_ggbarstats(Titanic_full, Survived, Sex, grouping.var = Age)

# extracting expressions -----------------------------

extract_subtitle(p1)
extract_caption(p1)

extract_subtitle(p2)
extract_caption(p2)

# extracting data frames -----------------------------

extract_stats(p1)

extract_stats(p2)
Bar charts for categorical data with statistical details included in the plot as a subtitle.
ggbarstats(
  data,
  x,
  y = NULL,
  counts = NULL,
  type = "parametric",
  paired = FALSE,
  results.subtitle = TRUE,
  label = "percentage",
  label.args = list(alpha = 1, fill = "white"),
  sample.size.label.args = list(size = 4),
  digits = 2L,
  proportion.test = results.subtitle,
  digits.perc = 0L,
  bf.message = TRUE,
  ratio = NULL,
  alternative = "two.sided",
  conf.level = 0.95,
  p.adjust.method = "holm",
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  legend.title = NULL,
  xlab = NULL,
  ylab = NULL,
  ggtheme = ggstatsplot::theme_ggstatsplot(),
  palette = "ggthemes::gdoc",
  ggplot.component = NULL,
  ...
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The variable to use as the rows in the contingency table. Please note that if there are empty factor levels in your variable, they will be dropped. |
y |
The variable to use as the columns in the contingency table.
Please note that if there are empty factor levels in your variable, they
will be dropped. Default is |
counts |
The variable in data containing counts, or |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
paired |
Logical indicating whether data came from a within-subjects or
repeated measures design study (Default: |
results.subtitle |
Decides whether the results of statistical tests are
to be displayed as a subtitle (Default: |
label |
Character decides what information needs to be displayed
on the label in each pie slice. Possible options are |
label.args |
Additional aesthetic arguments that will be passed to
|
sample.size.label.args |
Additional aesthetic arguments that will be
passed to |
digits |
Number of digits for rounding or significant figures. May also
be |
proportion.test |
Decides whether proportion test for |
digits.perc |
Numeric that decides number of decimal places for
percentage labels (Default: |
bf.message |
Logical that decides whether to display Bayes Factor in
favor of the null hypothesis. This argument is relevant only for
parametric test (Default: |
ratio |
A vector of proportions: the expected proportions for the
proportion test (should sum to |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
conf.level |
Scalar between |
p.adjust.method |
Adjustment method for p-values for multiple
comparisons. Possible methods are: |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
caption |
The text for the plot caption. This argument is relevant only
if |
legend.title |
Title text for the legend. |
xlab |
Label for |
ylab |
Labels for |
ggtheme |
A |
palette |
Name of the palette in |
ggplot.component |
A |
... |
Currently ignored. |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggpiestats.html
| graphical element | geom used | argument for further modification |
| bars | ggplot2::geom_bar() | NA |
| descriptive labels | ggplot2::geom_label() | label.args |
| sample size labels | ggplot2::geom_text() | sample.size.label.args |
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | Design | Test | Function used |
| Parametric/Non-parametric | Unpaired | Pearson's chi-squared test | stats::chisq.test() |
| Bayesian | Unpaired | Bayesian Pearson's chi-squared test | BayesFactor::contingencyTableBF() |
| Parametric/Non-parametric | Paired | McNemar's chi-squared test | stats::mcnemar.test() |
| Bayesian | Paired | No | No |
Effect size estimation
| Type | Design | Effect size | CI available? | Function used |
| Parametric/Non-parametric | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
| Bayesian | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
| Parametric/Non-parametric | Paired | Cohen's g | Yes | effectsize::cohens_g() |
| Bayesian | Paired | No | No | No |
Hypothesis testing
| Type | Test | Function used |
| Parametric/Non-parametric | Goodness of fit chi-squared test | stats::chisq.test() |
| Bayesian | Bayesian Goodness of fit chi-squared test | (custom) |
Effect size estimation
| Type | Effect size | CI available? | Function used |
| Parametric/Non-parametric | Pearson's C | Yes | effectsize::pearsons_c() |
| Bayesian | No | No | No |
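To make the frequentist rows of these tables concrete, the sketch below runs the underlying base R tests on mtcars. The effect sizes are computed by hand from their textbook formulas instead of calling {effectsize}, so they are uncorrected values and may differ from what effectsize::cramers_v() or effectsize::pearsons_c() report (for instance, if a bias adjustment is applied there).

```r
# two-way table: association between `vs` and `cyl`
tab <- table(vs = mtcars$vs, cyl = mtcars$cyl)

# Pearson's chi-squared test; the warning about small expected counts is
# suppressed because this is only an illustration
assoc <- suppressWarnings(stats::chisq.test(tab))

# uncorrected Cramer's V: sqrt(chi^2 / (n * (min(rows, cols) - 1)))
n <- sum(tab)
cramers_v <- sqrt(unname(assoc$statistic) / (n * (min(dim(tab)) - 1)))

# one-way table: goodness-of-fit test against equal proportions
counts <- table(mtcars$vs)
gof <- stats::chisq.test(counts)

# Pearson's C: sqrt(chi^2 / (chi^2 + n))
pearsons_c <- sqrt(unname(gof$statistic) / (unname(gof$statistic) + sum(counts)))

c(cramers_v = cramers_v, pearsons_c = pearsons_c)
```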
When there is a two-way table and x has more than two levels, pairwise
contingency table analyses (Fisher's exact tests) are computed using
statsExpressions::pairwise_contingency_table(). These pairwise results are not
displayed in the plot because bar and pie charts lack a natural visual
representation for pairwise significance annotations (unlike box/violin
plots, which use bracket annotations). Additionally, there is no
established convention for overlaying pairwise comparisons on pie charts,
and both ggpiestats() and ggbarstats() are designed to remain visually
congruent. The pairwise results are available as a data frame via
extract_stats(plot)$pairwise_comparisons_data.
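The idea behind the pairwise analysis can be sketched in base R: run Fisher's exact test on every pair of x levels and adjust the resulting p-values. This is an illustration of the technique only; it is not the implementation of statsExpressions::pairwise_contingency_table(), and the column names in the result are hypothetical.

```r
# x = cyl (three levels), y = am
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# all pairs of x levels
level_pairs <- utils::combn(rownames(tab), 2, simplify = FALSE)

# Fisher's exact test on each 2 x 2 sub-table
p_raw <- vapply(
  level_pairs,
  function(lv) stats::fisher.test(tab[lv, ])$p.value,
  numeric(1)
)

# collect results and apply the Holm adjustment (the documented default
# for `p.adjust.method`)
pairwise <- data.frame(
  group1 = vapply(level_pairs, function(lv) lv[1], character(1)),
  group2 = vapply(level_pairs, function(lv) lv[2], character(1)),
  p.value = p_raw,
  p.adjusted = stats::p.adjust(p_raw, method = "holm")
)

pairwise
```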
See also: grouped_ggbarstats, ggpiestats, grouped_ggpiestats
# for reproducibility
set.seed(123)

# one sample goodness of fit proportion test
p <- ggbarstats(mtcars, vs)

# looking at the plot
p

# extracting details from statistical tests
extract_stats(p)

# association test (or contingency table analysis)
ggbarstats(mtcars, vs, cyl)

# with 3+ x levels, pairwise comparisons are available
ggbarstats(mtcars, cyl, am)

# Bayesian test
ggbarstats(mtcars, vs, cyl, type = "bayes")

# using pre-aggregated data with counts
ggbarstats(as.data.frame(Titanic), x = Survived, y = Sex, counts = Freq)
A combination of box and violin plots along with jittered data points for between-subjects designs with statistical details included in the plot as a subtitle.
ggbetweenstats(
  data,
  x,
  y,
  type = "parametric",
  pairwise.display = "significant",
  pairwise.alpha = 0.05,
  p.adjust.method = "holm",
  bf.prior = 0.707,
  bf.message = TRUE,
  results.subtitle = TRUE,
  xlab = NULL,
  ylab = NULL,
  caption = NULL,
  title = NULL,
  subtitle = NULL,
  digits = 2L,
  conf.level = 0.95,
  tr = 0.2,
  alternative = "two.sided",
  centrality.plotting = TRUE,
  centrality.type = type,
  centrality.point.args = list(size = 5, color = "darkred"),
  centrality.label.args = list(size = 3, nudge_x = 0.4, segment.linetype = 4, min.segment.length = 0),
  point.args = list(position = ggplot2::position_jitterdodge(dodge.width = 0.6), alpha = 0.4, size = 3, stroke = 0, na.rm = TRUE),
  boxplot.args = list(width = 0.3, alpha = 0.2, na.rm = TRUE),
  violin.args = list(width = 0.5, alpha = 0.2, na.rm = TRUE),
  ggsignif.args = list(textsize = 3, tip_length = 0.01, na.rm = TRUE),
  ggtheme = ggstatsplot::theme_ggstatsplot(),
  palette = "ggthemes::gdoc",
  ggplot.component = NULL,
  ...
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The grouping (or independent) variable from |
y |
The response (or outcome or dependent) variable from |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
pairwise.display |
Decides which pairwise comparisons to display. Available options are:
You can use this argument to make sure that your plot is not uber-cluttered
when you have multiple groups being compared and scores of pairwise
comparisons being displayed. If set to |
pairwise.alpha |
Numeric alpha threshold used to decide which pairwise
comparisons are displayed when |
p.adjust.method |
Adjustment method for p-values for multiple
comparisons. Possible methods are: |
bf.prior |
A number between |
bf.message |
Logical that decides whether to display Bayes Factor in
favor of the null hypothesis. This argument is relevant only for
parametric test (Default: |
results.subtitle |
Decides whether the results of statistical tests are
to be displayed as a subtitle (Default: |
xlab |
Label for |
ylab |
Labels for |
caption |
The text for the plot caption. This argument is relevant only
if |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
tr |
Trim level for the mean when carrying out |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
centrality.plotting |
Logical that decides whether a central tendency
measure is to be displayed as a point with a label (Default:
If you want default centrality parameter, you can specify this using
|
centrality.type |
Decides which centrality parameter is to be displayed.
The default is to choose the same as
Just as |
centrality.point.args, centrality.label.args
|
A list of additional aesthetic
arguments to be passed to |
point.args |
A list of additional aesthetic arguments to be passed to
the |
boxplot.args |
A list of additional aesthetic arguments passed on to
|
violin.args |
A list of additional aesthetic arguments to be passed to
the |
ggsignif.args |
A list of additional aesthetic
arguments to be passed to |
ggtheme |
A |
palette |
Name of the palette in |
ggplot.component |
A |
... |
Currently ignored. |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggbetweenstats.html
| graphical element | geom used | argument for further modification |
| raw data | ggplot2::geom_point() | point.args |
| box plot | ggplot2::geom_boxplot() | boxplot.args |
| density plot | ggplot2::geom_violin() | violin.args |
| centrality measure point | ggplot2::geom_point() | centrality.point.args |
| centrality measure label | ggrepel::geom_label_repel() | centrality.label.args |
| pairwise comparisons | ggsignif::geom_signif() | ggsignif.args |
This function uses statistically justified defaults that are not user-configurable:
Effect sizes are always unbiased (Hedges' g instead of Cohen's d, omega-squared instead of eta-squared). Unbiased estimators correct for the positive bias present in their biased counterparts, especially in small samples, and are recommended for meta-analytic work.
Welch's t-test and one-way test are used instead of Student's versions (i.e., equal variances are not assumed). Welch's test performs as well as Student's when variances are equal and is substantially more accurate when they are not, making it the unconditionally better default.
Users who need non-default values for these settings can call
{statsExpressions} directly.
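The two defaults described above can be illustrated with base R. The sketch compares Welch's and Student's t-tests on mtcars and derives Hedges' g from Cohen's d by hand. The correction factor used here, 1 - 3 / (4 * df - 1), is a common approximation and is an assumption of this example; the value reported by effectsize::hedges_g() may differ slightly.

```r
# two groups: automatic (am = 0) vs. manual (am = 1) transmission
x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]

# Welch's t-test does not assume equal variances
# (`var.equal = FALSE` is the default in stats::t.test())
welch <- stats::t.test(x, y, var.equal = FALSE)

# Student's t-test, shown only for comparison
student <- stats::t.test(x, y, var.equal = TRUE)

# Cohen's d using the pooled standard deviation
n1 <- length(x)
n2 <- length(y)
sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
cohens_d <- (mean(x) - mean(y)) / sd_pooled

# Hedges' g: shrink d with an approximate small-sample correction
correction <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
hedges_g <- cohens_d * correction

c(welch_df = unname(welch$parameter), cohens_d = cohens_d, hedges_g = hedges_g)
```

Because the correction factor is below 1, Hedges' g is always slightly smaller in magnitude than Cohen's d, and the gap narrows as the sample grows.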
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
| Type | Measure | Function used |
| Parametric | mean | datawizard::describe_distribution() |
| Non-parametric | median | datawizard::describe_distribution() |
| Robust | trimmed mean | datawizard::describe_distribution() |
| Bayesian | MAP | datawizard::describe_distribution() |
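For orientation, the first three centrality measures in the table can be approximated per group with base R alone. This is a sketch and not a call to datawizard::describe_distribution(); it assumes that a 20% trimmed mean corresponds to the documented default tr = 0.2. The Bayesian MAP estimate is left out because it has no base R one-liner.

```r
# per-group centrality measures for mpg, grouped by number of cylinders
centrality <- data.frame(
  cyl = sort(unique(mtcars$cyl)),
  mean = as.numeric(tapply(mtcars$mpg, mtcars$cyl, mean)),
  median = as.numeric(tapply(mtcars$mpg, mtcars$cyl, median)),
  # trimmed mean: drops 20% of observations from each tail before averaging
  trimmed_mean = as.numeric(tapply(mtcars$mpg, mtcars$cyl, mean, trim = 0.2))
)

centrality
```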
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | No. of groups | Test | Function used |
| Parametric | 2 | Student's or Welch's t-test | stats::t.test() |
| Non-parametric | 2 | Mann-Whitney U test | stats::wilcox.test() |
| Robust | 2 | Yuen's test for trimmed means | WRS2::yuen() |
| Bayesian | 2 | Student's t-test | BayesFactor::ttestBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
| Parametric | 2 | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
| Non-parametric | 2 | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
| Robust | 2 | Algina-Keselman-Penfield robust standardized difference | Yes | WRS2::akp.effect() |
| Bayesian | 2 | difference | Yes | bayestestR::describe_posterior() |
Data requirement: Paired tests assume exactly one observation per subject per condition. If your data has multiple trials per cell, aggregate first (e.g., take the mean).
Hypothesis testing
| Type | No. of groups | Test | Function used |
| Parametric | 2 | Student's t-test | stats::t.test() |
| Non-parametric | 2 | Wilcoxon signed-rank test | stats::wilcox.test() |
| Robust | 2 | Yuen's test on trimmed means for dependent samples | WRS2::yuend() |
| Bayesian | 2 | Student's t-test | BayesFactor::ttestBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
| Parametric | 2 | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
| Non-parametric | 2 | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
| Robust | 2 | Algina-Keselman-Penfield robust standardized difference | Yes | WRS2::wmcpAKP() |
| Bayesian | 2 | difference | Yes | bayestestR::describe_posterior() |
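The parametric and non-parametric rows for the paired two-group design correspond to base R calls like the ones below. This sketch uses the built-in sleep data, which has exactly one observation per subject per condition, as the data requirement demands. The `exact = FALSE` setting is a choice made for this example to avoid warnings about ties.

```r
# extra sleep for the same 10 subjects under two drugs
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

# paired Student's t-test
paired_t <- stats::t.test(x, y, paired = TRUE)

# Wilcoxon signed-rank test (normal approximation because of ties)
signed_rank <- stats::wilcox.test(x, y, paired = TRUE, exact = FALSE)

c(t_p = paired_t$p.value, wilcoxon_p = signed_rank$p.value)
```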
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | No. of groups | Test | Function used |
| Parametric | > 2 | Fisher's or Welch's one-way ANOVA | stats::oneway.test() |
| Non-parametric | > 2 | Kruskal-Wallis one-way ANOVA | stats::kruskal.test() |
| Robust | > 2 | Heteroscedastic one-way ANOVA for trimmed means | WRS2::t1way() |
| Bayesian | > 2 | Fisher's ANOVA | BayesFactor::anovaBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
| Parametric | > 2 | partial eta-squared, partial omega-squared | Yes | effectsize::eta_squared(), effectsize::omega_squared() |
| Non-parametric | > 2 | rank epsilon squared | Yes | effectsize::rank_epsilon_squared() |
| Robust | > 2 | Explanatory measure of effect size | Yes | WRS2::t1way() |
| Bayesian | > 2 | Bayesian R-squared | Yes | performance::r2_bayes() |
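As a rough illustration of the parametric and non-parametric rows for more than two independent groups, the base R calls below compare mpg across the three cylinder groups in mtcars. Converting cyl to a factor beforehand is a choice made for this example.

```r
# treat the number of cylinders as a grouping factor
dat <- transform(mtcars, cyl = factor(cyl))

# Welch's one-way ANOVA (equal variances not assumed)
welch_anova <- stats::oneway.test(mpg ~ cyl, data = dat, var.equal = FALSE)

# Fisher's one-way ANOVA, shown for comparison
fisher_anova <- stats::oneway.test(mpg ~ cyl, data = dat, var.equal = TRUE)

# Kruskal-Wallis rank sum test
kruskal <- stats::kruskal.test(mpg ~ cyl, data = dat)

c(
  welch_p = welch_anova$p.value,
  fisher_p = fisher_anova$p.value,
  kruskal_p = kruskal$p.value
)
```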
Data requirement: Repeated measures tests assume a complete design with
exactly one observation per subject per condition. If your data has multiple
trials per cell, aggregate first (e.g., take the mean). Verify with
table(data$subject, data$condition) — every cell should equal 1.
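The aggregation and verification steps described above can be sketched as follows. The data frame and its column names (subject, condition, trial, score) are hypothetical and simulated only for this example.

```r
set.seed(123)

# hypothetical raw data: 5 subjects x 3 conditions x 3 trials per cell
raw <- expand.grid(subject = 1:5, condition = c("a", "b", "c"), trial = 1:3)
raw$score <- rnorm(nrow(raw))

# before aggregation, each cell holds several observations
table(raw$subject, raw$condition)

# aggregate to one value per subject per condition by taking the mean
agg <- stats::aggregate(score ~ subject + condition, data = raw, FUN = mean)

# verify the design: every cell should now equal 1
cells <- table(agg$subject, agg$condition)
all(cells == 1)
```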
Hypothesis testing
| Type | No. of groups | Test | Function used |
| Parametric | > 2 | One-way repeated measures ANOVA | afex::aov_ez() |
| Non-parametric | > 2 | Friedman rank sum test | stats::friedman.test() |
| Robust | > 2 | Heteroscedastic one-way repeated measures ANOVA for trimmed means | WRS2::rmanova() |
| Bayesian | > 2 | One-way repeated measures ANOVA | BayesFactor::anovaBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
| Parametric | > 2 | partial eta-squared, partial omega-squared | Yes | effectsize::eta_squared(), effectsize::omega_squared() |
| Non-parametric | > 2 | Kendall's coefficient of concordance | Yes | effectsize::kendalls_w() |
| Robust | > 2 | Algina-Keselman-Penfield robust standardized difference average | Yes | WRS2::wmcpAKP() |
| Bayesian | > 2 | Bayesian R-squared | Yes | performance::r2_bayes() |
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | Equal variance? | Test | p-value adjustment? | Function used |
| Parametric | No | Games-Howell test | Yes | PMCMRplus::gamesHowellTest() |
| Parametric | Yes | Student's t-test | Yes | stats::pairwise.t.test() |
| Non-parametric | No | Dunn test | Yes | PMCMRplus::kwAllPairsDunnTest() |
| Robust | No | Yuen's trimmed means test | Yes | WRS2::lincon() |
| Bayesian | NA | Student's t-test | NA | BayesFactor::ttestBF() |
Effect size estimation
Not supported.
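To show what the p-value adjustment does in the parametric, equal-variance row, the sketch below runs stats::pairwise.t.test() on mtcars with and without the Holm correction. Note that stats::pairwise.t.test() pools the standard deviation across groups by default; whether the package calls it with the same settings is not stated here.

```r
groups <- factor(mtcars$cyl)

# pairwise Student's t-tests with Holm-adjusted p-values
pw_holm <- stats::pairwise.t.test(mtcars$mpg, groups, p.adjust.method = "holm")

# the same comparisons without any adjustment
pw_none <- stats::pairwise.t.test(mtcars$mpg, groups, p.adjust.method = "none")

# lower-triangular matrices of p-values, one cell per pair of groups
pw_holm$p.value
pw_none$p.value
```

Adjusted p-values are never smaller than the unadjusted ones, which is why fewer comparisons may be displayed as significant after adjustment.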
Data requirement: Paired pairwise tests assume exactly one observation per subject per condition. If your data has multiple trials per cell, aggregate first (e.g., take the mean).
Hypothesis testing
| Type | Test | p-value adjustment? | Function used |
| Parametric | Student's t-test | Yes | stats::pairwise.t.test() |
| Non-parametric | Durbin-Conover test | Yes | PMCMRplus::durbinAllPairsTest() |
| Robust | Yuen's trimmed means test | Yes | WRS2::rmmcp() |
| Bayesian | Student's t-test | NA | BayesFactor::ttestBF() |
Effect size estimation
Not supported.
See also: grouped_ggbetweenstats, ggwithinstats, grouped_ggwithinstats
# for reproducibility
set.seed(123)

p <- ggbetweenstats(mtcars, am, mpg)
p

# extracting details from statistical tests
extract_stats(p)

# show non-significant pairwise comparisons (needs 3+ groups for ggsignif)
ggbetweenstats(mtcars, cyl, mpg, pairwise.display = "non-significant")

# show all pairwise comparisons
ggbetweenstats(mtcars, cyl, mpg, pairwise.display = "all")

# use a stricter alpha threshold for significant pairwise comparisons
ggbetweenstats(mtcars, cyl, mpg, pairwise.alpha = 0.001)

# modifying defaults
ggbetweenstats(
  morley,
  x = Expt,
  y = Speed,
  type = "robust",
  xlab = "The experiment number",
  ylab = "Speed-of-light measurement"
)

# you can remove a specific geom to reduce complexity of the plot
ggbetweenstats(
  mtcars,
  am,
  wt,
  # to remove violin plot
  violin.args = list(width = 0, linewidth = 0, colour = NA),
  # to remove boxplot
  boxplot.args = list(width = 0),
  # to remove points
  point.args = list(alpha = 0)
)
Plot with the regression coefficients' point estimates as dots with confidence interval whiskers and other statistical details included as labels.
Although the statistical models displayed in the plot may differ based on the class of models being investigated, a few aspects of the plot are invariant across models:
The dot-whisker plot contains a dot for each estimate and whiskers for its
confidence interval (95% is the default). The estimates can be either
effect sizes (for tests that depend on the F-statistic) or regression
coefficients (for tests with t-, chi^2-, and z-statistic), etc. The
function will, by default, display an x-axis label that
clarifies which estimates are being displayed. The confidence intervals can
sometimes be asymmetric if bootstrapping was used.
The label attached to each dot provides more details from the statistical test carried out and typically contains the estimate, statistic, and p-value.
The caption will contain diagnostic information, if available, about models that can be useful for model selection: The smaller the Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) values, the "better" the model is.
The output of this function will be a {ggplot2} object and, thus,
it can be further modified (e.g. change themes) with {ggplot2}.
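The model-selection remark about AIC and BIC can be illustrated with base R. The two models below are hypothetical candidates chosen for this sketch; how the plot caption itself obtains these values is not shown here.

```r
# two candidate models for the same outcome
mod_small <- lm(mpg ~ cyl, data = mtcars)
mod_large <- lm(mpg ~ cyl * am, data = mtcars)

# smaller AIC/BIC values indicate the "better" model in this comparison
fit <- data.frame(
  model = c("mpg ~ cyl", "mpg ~ cyl * am"),
  AIC = c(stats::AIC(mod_small), stats::AIC(mod_large)),
  BIC = c(stats::BIC(mod_small), stats::BIC(mod_large))
)

fit
```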
ggcoefstats(
  x,
  statistic = NULL,
  conf.int = TRUE,
  conf.level = 0.95,
  digits = 2L,
  exclude.intercept = FALSE,
  effectsize.type = "omega",
  meta.analytic.effect = FALSE,
  meta.type = "parametric",
  bf.message = TRUE,
  sort = "none",
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  only.significant = FALSE,
  point.args = list(size = 3, color = "blue", na.rm = TRUE),
  errorbar.args = list(width = 0, na.rm = TRUE),
  vline = TRUE,
  vline.args = list(linewidth = 1, linetype = "dashed"),
  stats.labels = TRUE,
  stats.label.color = NULL,
  stats.label.args = list(size = 3, direction = "y", min.segment.length = 0, na.rm = TRUE),
  palette = "ggthemes::gdoc",
  ggtheme = ggstatsplot::theme_ggstatsplot(),
  ...
)
x |
A model object to be tidied, or a tidy data frame from a regression
model. Function internally uses |
statistic |
Relevant statistic for the model ( |
conf.int |
Logical. Decides whether to display confidence intervals as
error bars (Default: |
conf.level |
Numeric deciding level of confidence or credible intervals
(Default: |
digits |
Number of digits for rounding or significant figures. May also
be |
exclude.intercept |
Logical that decides whether the intercept should be
excluded from the plot (Default: |
effectsize.type |
This is the same as |
meta.analytic.effect |
Logical that decides whether subtitle for
meta-analysis via linear (mixed-effects) models (default: |
meta.type |
Type of statistics used to carry out random-effects
meta-analysis. If |
bf.message |
Logical that decides whether results from running a
Bayesian meta-analysis assuming that the effect size d varies across
studies with standard deviation t (i.e., a random-effects analysis)
should be displayed in caption. Defaults to |
sort |
If |
xlab |
Label for |
ylab |
Labels for |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. The input to this argument
will be ignored if |
caption |
The text for the plot caption. This argument is relevant only
if |
only.significant |
If |
point.args |
A list of additional aesthetic arguments to be passed to
the |
errorbar.args |
Additional arguments that will be passed to
|
vline |
Decides whether to display a vertical line (Default: |
vline.args |
Additional arguments that will be passed to
|
stats.labels |
Logical. Decides whether the statistic and p-values for
each coefficient are to be attached to each dot as a text label using
|
stats.label.color |
Color for the labels. If set to |
stats.label.args |
Additional arguments that will be passed to
|
palette |
Name of the palette in |
ggtheme |
A |
... |
Additional arguments to tidying method. For more, see
|
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggcoefstats.html
| graphical element | geom used | argument for further modification |
| regression estimate | ggplot2::geom_point() | point.args |
| error bars | ggplot2::geom_errorbarh() | errorbar.args |
| vertical line | ggplot2::geom_vline() | vline.args |
| label with statistical details | ggrepel::geom_label_repel() | stats.label.args |
The table below provides summary about:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing and Effect size estimation
| Type | Test | CI available? | Function used |
|---|---|---|---|
| Parametric | Pearson's correlation coefficient | Yes | correlation::correlation() |
| Non-parametric | Spearman's rank correlation coefficient | Yes | correlation::correlation() |
| Robust | Winsorized Pearson's correlation coefficient | Yes | correlation::correlation() |
| Bayesian | Bayesian Pearson's correlation coefficient | Yes | correlation::correlation() |
In case you want to carry out meta-analysis, you will be asked to install
the needed packages ({metafor}, {metaplus}, or {metaBMA}) if they are
unavailable.
All rows of regression estimates where either of the following
quantities is NA will be removed if labels are requested:
estimate, statistic, p.value.
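The row-dropping rule above can be mimicked in base R. The sketch below is an illustration only and not the package's internal code; the data frame tidy_df and all of its values are made up for the example.

```r
# Hypothetical tidy data frame of regression estimates (values are made up)
tidy_df <- data.frame(
  term      = c("(Intercept)", "cyl", "am", "cyl:am"),
  estimate  = c(30.9, -1.98, 10.2, NA),
  statistic = c(9.5, -4.1, NA, -1.2),
  p.value   = c(0.001, 0.001, 0.03, 0.24)
)

# Keep only rows where estimate, statistic, and p.value are all present
needed  <- c("estimate", "statistic", "p.value")
kept_df <- tidy_df[stats::complete.cases(tidy_df[, needed]), ]

# Terms that survive the filter
as.character(kept_df$term)
```

Rows with a missing value in any of the three columns are dropped as a whole, so a term with a valid estimate but no p-value would not be kept either.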
Given the rapid pace at which new methods are added to these packages, it
is recommended that you install development versions of {easystats}
packages using the install_latest() function from {easystats}.
    # for reproducibility
    set.seed(123)

    # model object
    mod <- lm(formula = mpg ~ cyl * am, data = mtcars)

    # creating a plot
    p <- ggcoefstats(mod)

    # looking at the plot
    p

    # extracting details from statistical tests
    extract_stats(p)

    # exclude intercept from the plot
    ggcoefstats(mod, exclude.intercept = TRUE)

    # only show significant labels
    ggcoefstats(mod, only.significant = TRUE)

    # ANOVA model (F-statistic)
    ggcoefstats(aov(mpg ~ cyl * am, data = mtcars))

    # a tidy data frame can also be passed directly (model-free use)
    ggcoefstats(data.frame(term = c("a", "b", "c"), estimate = c(0.5, -0.2, 1.1)))

    # without a `term` column (auto-generated)
    ggcoefstats(data.frame(estimate = c(0.5, -0.2, 1.1)))

    # tidy data frames can also include stats-label inputs directly
    df_tidy <- parameters::model_parameters(stats::lm(wt ~ am * cyl, mtcars), ci = 0.95)
    names(df_tidy) <- c(
      "term", "estimate", "std.error", "conf.level", "conf.low", "conf.high",
      "statistic", "df.error", "p.value"
    )
    df_tidy$p.value[2L] <- 0.42
    ggcoefstats(
      df_tidy,
      statistic = "t",
      only.significant = TRUE,
      stats.label.color = c("firebrick", "grey50", "forestgreen", "navy")
    )

    # further arguments can be passed to `parameters::model_parameters()`
    library(lme4)
    ggcoefstats(lmer(Reaction ~ Days + (Days | Subject), sleepstudy), effects = "fixed")
Correlation matrix containing results from pairwise correlation tests.
If you want a data frame of (grouped) correlation matrix, use
correlation::correlation() instead. It can also do grouped analysis when
used with output from dplyr::group_by().
    ggcorrmat(
      data,
      cor.vars = NULL,
      cor.vars.names = NULL,
      matrix.type = "upper",
      type = "parametric",
      tr = 0.2,
      partial = FALSE,
      digits = 2L,
      sig.level = 0.05,
      conf.level = 0.95,
      bf.prior = 0.707,
      p.adjust.method = "holm",
      colors = c("#EA4335", "white", "#4285F4"),
      pch = "cross",
      ggcorrplot.args = list(method = "square", outline.color = "black", pch.cex = 14),
      ggtheme = ggstatsplot::theme_ggstatsplot(),
      ggplot.component = NULL,
      title = NULL,
      subtitle = NULL,
      caption = NULL,
      ...
    )
data |
A data frame from which variables specified are to be taken. |
cor.vars |
List of variables for which the correlation matrix is to be
computed and visualized. If |
cor.vars.names |
Optional list of names to be used for |
matrix.type |
Character, |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
tr |
Trim level for the mean when carrying out |
partial |
Can be |
digits |
Number of digits for rounding or significant figures. May also
be |
sig.level |
Significance level (Default: 0.05). |
conf.level |
Scalar between |
bf.prior |
A number between |
p.adjust.method |
Adjustment method for p-values for multiple
comparisons. Possible methods are: |
colors |
A character vector of exactly three colors for the gradient:
low (negative correlations), mid (zero), and high (positive correlations).
Must be a diverging palette so that the sign of the correlation is
visually obvious.
Default: c("#EA4335", "white", "#4285F4"). |
pch |
Decides the point shape to be used for insignificant correlation
coefficients (only valid when |
ggcorrplot.args |
A list of additional (mostly aesthetic) arguments that
will be passed to |
ggtheme |
A |
ggplot.component |
A |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
caption |
The text for the plot caption. This argument is relevant only
if |
... |
Currently ignored. |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggcorrmat.html
| graphical element | geom used | argument for further modification |
|---|---|---|
| correlation matrix | ggcorrplot::ggcorrplot() | ggcorrplot.args |
The table below provides summary about:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing and Effect size estimation
| Type | Test | CI available? | Function used |
|---|---|---|---|
| Parametric | Pearson's correlation coefficient | Yes | correlation::correlation() |
| Non-parametric | Spearman's rank correlation coefficient | Yes | correlation::correlation() |
| Robust | Winsorized Pearson's correlation coefficient | Yes | correlation::correlation() |
| Bayesian | Bayesian Pearson's correlation coefficient | Yes | correlation::correlation() |
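For intuition about what the parametric and non-parametric rows compute, here is a minimal base R sketch. It uses stats::cor.test() and stats::p.adjust() instead of correlation::correlation(), so it approximates the workflow and is not the code ggcorrmat() runs; the variable pairs are an arbitrary choice for illustration.

```r
# Pairs of iris columns to correlate (arbitrary choice for illustration)
var_pairs <- list(
  c("Sepal.Length", "Petal.Length"),
  c("Sepal.Length", "Petal.Width"),
  c("Petal.Length", "Petal.Width")
)

# Parametric: Pearson's correlation with a 95% confidence interval
pearson <- lapply(var_pairs, function(v) {
  stats::cor.test(iris[[v[1]]], iris[[v[2]]], method = "pearson", conf.level = 0.95)
})

# Non-parametric: Spearman's rank correlation (exact = FALSE because of ties)
spearman <- lapply(var_pairs, function(v) {
  stats::cor.test(iris[[v[1]]], iris[[v[2]]], method = "spearman", exact = FALSE)
})

# Collect the estimates
r_pearson    <- vapply(pearson, function(res) unname(res$estimate), numeric(1))
rho_spearman <- vapply(spearman, function(res) unname(res$estimate), numeric(1))

# Holm adjustment of the Pearson p-values for multiple comparisons
p_raw  <- vapply(pearson, function(res) res$p.value, numeric(1))
p_holm <- stats::p.adjust(p_raw, method = "holm")
```

The Holm-adjusted p-values are never smaller than the raw ones, which is why a correlation can be flagged as insignificant after adjustment even if its raw p-value is below sig.level.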
grouped_ggcorrmat, ggscatterstats,
grouped_ggscatterstats
    set.seed(123)
    library(ggcorrplot)
    ggcorrmat(iris)

    # with data containing NAs (uses pairwise complete observations)
    ggcorrmat(airquality)

    # selecting specific variables
    ggcorrmat(iris, cor.vars = c(Sepal.Length, Petal.Length, Petal.Width))
A dot chart (as described by William S. Cleveland) with statistical details from one-sample test.
The point estimate (and associated uncertainty) displayed depends on the type of statistics selected:
mean for parametric statistics
median for non-parametric statistics
trimmed mean for robust statistics
MAP estimator for Bayesian statistics
    ggdotplotstats(
      data,
      x,
      y,
      xlab = NULL,
      ylab = NULL,
      title = NULL,
      subtitle = NULL,
      caption = NULL,
      type = "parametric",
      test.value = 0,
      alternative = "two.sided",
      bf.prior = 0.707,
      bf.message = TRUE,
      conf.int = TRUE,
      conf.level = 0.95,
      tr = 0.2,
      digits = 2L,
      results.subtitle = TRUE,
      point.args = list(color = "black", size = 3, shape = 16),
      errorbar.args = list(width = 0, na.rm = TRUE),
      centrality.plotting = TRUE,
      centrality.type = type,
      centrality.line.args = list(color = "blue", linewidth = 1, linetype = "dashed"),
      ggplot.component = NULL,
      ggtheme = ggstatsplot::theme_ggstatsplot(),
      ...
    )
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
A numeric variable from the data frame |
y |
Label or grouping variable. |
xlab |
Label for |
ylab |
Labels for |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
caption |
The text for the plot caption. This argument is relevant only
if |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
test.value |
A number indicating the true value of the mean (Default: 0). |
alternative |
A character string specifying the alternative
hypothesis, must be one of |
bf.prior |
A number between |
bf.message |
Logical that decides whether to display Bayes Factor in
favor of the null hypothesis. This argument is relevant only for
parametric test (Default: TRUE). |
conf.int |
Logical. Decides whether to display confidence intervals as
error bars (Default: TRUE). |
conf.level |
Scalar between |
tr |
Trim level for the mean when carrying out |
digits |
Number of digits for rounding or significant figures. May also
be |
results.subtitle |
Decides whether the results of statistical tests are
to be displayed as a subtitle (Default: TRUE). |
point.args |
A list of additional aesthetic arguments to be passed to
the |
errorbar.args |
Additional arguments that will be passed to
|
centrality.plotting |
Logical that decides whether central tendency
measure is to be displayed as a point with a label (Default: TRUE).
If you want default centrality parameter, you can specify this using
|
centrality.type |
Decides which centrality parameter is to be displayed.
The default is to choose the same as
Just as |
centrality.line.args |
A list of additional aesthetic arguments to be
passed to the |
ggplot.component |
A |
ggtheme |
A |
... |
Currently ignored. |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggdotplotstats.html
| graphical element | geom used | argument for further modification |
|---|---|---|
| raw data | ggplot2::geom_point() | point.args |
| error bars | ggplot2::geom_errorbarh() | errorbar.args |
| centrality measure line | ggplot2::geom_vline() | centrality.line.args |
The table below provides summary about:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | Test | Function used |
|---|---|---|
| Parametric | One-sample Student's t-test | stats::t.test() |
| Non-parametric | One-sample Wilcoxon test | stats::wilcox.test() |
| Robust | Bootstrap-t method for one-sample test | WRS2::trimcibt() |
| Bayesian | One-sample Student's t-test | BayesFactor::ttestBF() |
Effect size estimation
| Type | Effect size | CI available? | Function used |
|---|---|---|---|
| Parametric | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
| Non-parametric | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
| Robust | trimmed mean | Yes | WRS2::trimcibt() |
| Bayesian | difference | Yes | bayestestR::describe_posterior() |
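The parametric and non-parametric rows of the two tables above can be reproduced approximately in base R. This is a sketch under stated assumptions: the sample (mtcars$wt) and the null value of 3 are arbitrary, the effect sizes are computed by hand rather than with the effectsize functions, and the Hedges' g correction used here is the common approximation, which may differ slightly from what effectsize::hedges_g() applies.

```r
# Numeric sample and a null value (plays the role of the test.value argument)
x <- mtcars$wt
test_value <- 3

# Parametric: one-sample Student's t-test
t_res <- stats::t.test(x, mu = test_value, alternative = "two.sided", conf.level = 0.95)

# Non-parametric: one-sample Wilcoxon signed-rank test
w_res <- stats::wilcox.test(x, mu = test_value, alternative = "two.sided", exact = FALSE)

# Cohen's d for one sample: distance of the mean from the null value in SD units
n <- length(x)
cohens_d <- (mean(x) - test_value) / stats::sd(x)

# Hedges' g: Cohen's d shrunk by an approximate small-sample correction factor
hedges_g <- cohens_d * (1 - 3 / (4 * (n - 1) - 1))

c(t = unname(t_res$statistic), d = cohens_d, g = hedges_g)
```

For a one-sample design the t statistic equals Cohen's d multiplied by the square root of the sample size, which gives a quick consistency check between the test and the effect size.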
grouped_gghistostats, gghistostats,
grouped_ggdotplotstats
    # for reproducibility
    set.seed(123)

    # creating a plot
    p <- ggdotplotstats(
      data = ggplot2::mpg,
      x = cty,
      y = manufacturer,
      title = "Fuel economy data",
      xlab = "city miles per gallon"
    )

    # looking at the plot
    p

    # extracting details from statistical tests
    extract_stats(p)
Histogram with statistical details from one-sample test included in the plot as a subtitle.
    gghistostats(
      data,
      x,
      binwidth = NULL,
      xlab = NULL,
      title = NULL,
      subtitle = NULL,
      caption = NULL,
      type = "parametric",
      test.value = 0,
      alternative = "two.sided",
      bf.prior = 0.707,
      bf.message = TRUE,
      conf.level = 0.95,
      tr = 0.2,
      digits = 2L,
      ggtheme = ggstatsplot::theme_ggstatsplot(),
      results.subtitle = TRUE,
      bin.args = list(color = "black", fill = "grey50", alpha = 0.7),
      centrality.plotting = TRUE,
      centrality.type = type,
      centrality.line.args = list(color = "blue", linewidth = 1, linetype = "dashed"),
      ggplot.component = NULL,
      ...
    )
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
A numeric variable from the data frame |
binwidth |
The width of the histogram bins. Can be specified as a
numeric value, or a function that calculates width from |
xlab |
Label for |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
caption |
The text for the plot caption. This argument is relevant only
if |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
test.value |
A number indicating the true value of the mean (Default: 0). |
alternative |
A character string specifying the alternative
hypothesis, must be one of |
bf.prior |
A number between |
bf.message |
Logical that decides whether to display Bayes Factor in
favor of the null hypothesis. This argument is relevant only for
parametric test (Default: TRUE). |
conf.level |
Scalar between |
tr |
Trim level for the mean when carrying out |
digits |
Number of digits for rounding or significant figures. May also
be |
ggtheme |
A |
results.subtitle |
Decides whether the results of statistical tests are
to be displayed as a subtitle (Default: TRUE). |
bin.args |
A list of additional aesthetic arguments to be passed to the
|
centrality.plotting |
Logical that decides whether central tendency
measure is to be displayed as a point with a label (Default: TRUE).
If you want default centrality parameter, you can specify this using
|
centrality.type |
Decides which centrality parameter is to be displayed.
The default is to choose the same as
Just as |
centrality.line.args |
A list of additional aesthetic arguments to be
passed to the |
ggplot.component |
A |
... |
Currently ignored. |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/gghistostats.html
| graphical element | geom used | argument for further modification |
|---|---|---|
| histogram bin | ggplot2::stat_bin() | bin.args |
| centrality measure line | ggplot2::geom_vline() | centrality.line.args |
The table below provides summary about:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | Test | Function used |
|---|---|---|
| Parametric | One-sample Student's t-test | stats::t.test() |
| Non-parametric | One-sample Wilcoxon test | stats::wilcox.test() |
| Robust | Bootstrap-t method for one-sample test | WRS2::trimcibt() |
| Bayesian | One-sample Student's t-test | BayesFactor::ttestBF() |
Effect size estimation
| Type | Effect size | CI available? | Function used |
|---|---|---|---|
| Parametric | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
| Non-parametric | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
| Robust | trimmed mean | Yes | WRS2::trimcibt() |
| Bayesian | difference | Yes | bayestestR::describe_posterior() |
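The centrality line on the histogram follows centrality.type, which defaults to type. As a rough guide, and assuming the same mapping that the ggdotplotstats() entry lists (mean, median, trimmed mean), the first three measures can be computed in base R. The sketch also assumes that tr is the proportion trimmed from each tail, as in mean(x, trim = ), and it leaves out the Bayesian MAP estimate.

```r
# Same variable as in the gghistostats() example
x <- ToothGrowth$len

centrality <- c(
  parametric    = mean(x),             # mean
  nonparametric = stats::median(x),    # median
  robust        = mean(x, trim = 0.2)  # 20% trimmed mean, mirroring tr = 0.2
)

round(centrality, 2)
```

Comparing the three values gives a quick impression of skew: if they differ noticeably, the choice of centrality.type changes where the dashed line is drawn.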
grouped_gghistostats, ggdotplotstats,
grouped_ggdotplotstats
    # for reproducibility
    set.seed(123)

    # creating a plot
    p <- gghistostats(
      data = ToothGrowth,
      x = len,
      xlab = "Tooth length",
      centrality.type = "np"
    )

    # looking at the plot
    p

    # extracting details from statistical tests
    extract_stats(p)
Pie charts for categorical data with statistical details included in the plot as a subtitle.
    ggpiestats(
      data,
      x,
      y = NULL,
      counts = NULL,
      type = "parametric",
      paired = FALSE,
      results.subtitle = TRUE,
      label = "percentage",
      label.args = list(direction = "both"),
      label.repel = FALSE,
      digits = 2L,
      proportion.test = results.subtitle,
      digits.perc = 0L,
      bf.message = TRUE,
      ratio = NULL,
      alternative = "two.sided",
      conf.level = 0.95,
      p.adjust.method = "holm",
      title = NULL,
      subtitle = NULL,
      caption = NULL,
      legend.title = NULL,
      ggtheme = ggstatsplot::theme_ggstatsplot(),
      palette = "ggthemes::gdoc",
      ggplot.component = NULL,
      ...
    )
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The variable to use as the rows in the contingency table. Please note that if there are empty factor levels in your variable, they will be dropped. |
y |
The variable to use as the columns in the contingency table.
Please note that if there are empty factor levels in your variable, they
will be dropped. Default is |
counts |
The variable in data containing counts, or |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
paired |
Logical indicating whether data came from a within-subjects or
repeated measures design study (Default: FALSE). |
results.subtitle |
Decides whether the results of statistical tests are
to be displayed as a subtitle (Default: TRUE). |
label |
Character that decides what information needs to be displayed
on the label in each pie slice. Possible options are |
label.args |
Additional aesthetic arguments that will be passed to
|
label.repel |
Whether labels should be repelled using |
digits |
Number of digits for rounding or significant figures. May also
be |
proportion.test |
Decides whether proportion test for |
digits.perc |
Numeric that decides number of decimal places for
percentage labels (Default: 0L). |
bf.message |
Logical that decides whether to display Bayes Factor in
favor of the null hypothesis. This argument is relevant only for
parametric test (Default: TRUE). |
ratio |
A vector of proportions: the expected proportions for the
proportion test (should sum to |
alternative |
A character string specifying the alternative
hypothesis, must be one of |
conf.level |
Scalar between |
p.adjust.method |
Adjustment method for p-values for multiple
comparisons. Possible methods are: |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
caption |
The text for the plot caption. This argument is relevant only
if |
legend.title |
Title text for the legend. |
ggtheme |
A |
palette |
Name of the palette in |
ggplot.component |
A |
... |
Currently ignored. |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggpiestats.html
| graphical element | geom used | argument for further modification |
|---|---|---|
| pie slices | ggplot2::geom_col() | NA |
| labels | ggplot2::geom_label()/ggrepel::geom_label_repel() | label.args |
When there is a two-way table and x has more than two levels, pairwise
contingency table analyses (Fisher's exact tests) are computed using
statsExpressions::pairwise_contingency_table(). These pairwise results are not
displayed in the plot because bar and pie charts lack a natural visual
representation for pairwise significance annotations (unlike box/violin
plots, which use bracket annotations). Additionally, there is no
established convention for overlaying pairwise comparisons on pie charts,
and both ggpiestats() and ggbarstats() are designed to remain visually
congruent. The pairwise results are available as a data frame via
extract_stats(plot)$pairwise_comparisons_data.
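To make the idea of pairwise contingency table analyses concrete, the following base R sketch runs Fisher's exact test on every pair of levels of x and applies a Holm adjustment. It is an illustration of the general technique only; the column names in the result are hypothetical, and the output of statsExpressions::pairwise_contingency_table() may be structured differently.

```r
# Two-way table: x = cyl (three levels), y = am
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# All pairs of levels of x
level_pairs <- utils::combn(rownames(tab), 2, simplify = FALSE)

# Fisher's exact test on the 2 x 2 sub-table for each pair of levels
p_raw <- vapply(level_pairs, function(lv) {
  stats::fisher.test(tab[lv, , drop = FALSE])$p.value
}, numeric(1))

# Collect results; column names here are illustrative, not the package's
pairwise <- data.frame(
  group1  = vapply(level_pairs, `[`, character(1), 1),
  group2  = vapply(level_pairs, `[`, character(1), 2),
  p.value = p_raw,
  p.adj   = stats::p.adjust(p_raw, method = "holm")
)

pairwise
```

With three levels of x there are three pairwise comparisons, and the number grows quickly with more levels, which is one practical reason such results are easier to read as a data frame than as plot annotations.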
The table below provides summary about:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing (two-way table)
| Type | Design | Test | Function used |
|---|---|---|---|
| Parametric/Non-parametric | Unpaired | Pearson's chi-squared test | stats::chisq.test() |
| Bayesian | Unpaired | Bayesian Pearson's chi-squared test | BayesFactor::contingencyTableBF() |
| Parametric/Non-parametric | Paired | McNemar's chi-squared test | stats::mcnemar.test() |
| Bayesian | Paired | No | No |
Effect size estimation (two-way table)
| Type | Design | Effect size | CI available? | Function used |
|---|---|---|---|---|
| Parametric/Non-parametric | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
| Bayesian | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
| Parametric/Non-parametric | Paired | Cohen's g | Yes | effectsize::cohens_g() |
| Bayesian | Paired | No | No | No |
Hypothesis testing (one-way table)
| Type | Test | Function used |
|---|---|---|
| Parametric/Non-parametric | Goodness of fit chi-squared test | stats::chisq.test() |
| Bayesian | Bayesian Goodness of fit chi-squared test | (custom) |
Effect size estimation (one-way table)
| Type | Effect size | CI available? | Function used |
|---|---|---|---|
| Parametric/Non-parametric | Pearson's C | Yes | effectsize::pearsons_c() |
| Bayesian | No | No | No |
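The frequentist rows of these tables can be sketched in base R. The effect sizes below are computed by hand from the chi-squared statistic using the textbook formulas, without any bias correction, so they may not match the values reported by effectsize::cramers_v() or effectsize::pearsons_c() exactly; treat this as an illustration of the quantities involved.

```r
# One-way table: goodness-of-fit test against equal proportions
counts <- table(mtcars$vs)
gof <- stats::chisq.test(counts, p = rep(1 / length(counts), length(counts)))

# Pearson's C (contingency coefficient) from the chi-squared statistic
chi2_gof   <- unname(gof$statistic)
pearsons_c <- sqrt(chi2_gof / (chi2_gof + sum(counts)))

# Two-way table: Pearson's chi-squared test of association
tab <- table(mtcars$vs, mtcars$cyl)
# Small expected counts trigger an approximation warning, silenced here
assoc <- suppressWarnings(stats::chisq.test(tab))

# Cramer's V (uncorrected): chi-squared scaled by n and the smaller table dimension
chi2_assoc <- unname(assoc$statistic)
cramers_v  <- sqrt(chi2_assoc / (sum(tab) * (min(dim(tab)) - 1)))

c(chi2_gof = chi2_gof, pearsons_c = pearsons_c, cramers_v = cramers_v)
```

A non-uniform null for the goodness-of-fit test corresponds to the ratio argument: pass the expected proportions as p instead of equal shares.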
grouped_ggpiestats, ggbarstats,
grouped_ggbarstats
    # for reproducibility
    set.seed(123)

    # one sample goodness of fit proportion test
    p <- ggpiestats(mtcars, vs)

    # looking at the plot
    p

    # extracting details from statistical tests
    extract_stats(p)

    # association test (or contingency table analysis)
    ggpiestats(mtcars, vs, cyl)

    # Bayesian test
    ggpiestats(mtcars, vs, cyl, type = "bayes")

    # with repelled labels to avoid overlapping
    ggpiestats(mtcars, vs, label.repel = TRUE)

    # show counts instead of percentages
    ggpiestats(mtcars, vs, label = "counts")

    # show both counts and percentages
    ggpiestats(mtcars, vs, label = "both")

    # using pre-aggregated data with counts
    ggpiestats(as.data.frame(Titanic), Survived, counts = Freq)
Scatterplots from {ggplot2} combined with marginal distribution plots
and statistical details.
    ggscatterstats(
      data,
      x,
      y,
      type = "parametric",
      conf.level = 0.95,
      bf.prior = 0.707,
      bf.message = TRUE,
      tr = 0.2,
      digits = 2L,
      results.subtitle = TRUE,
      label.var = NULL,
      label.expression = NULL,
      marginal = TRUE,
      point.args = list(size = 3, alpha = 0.4, stroke = 0),
      point.width.jitter = 0,
      point.height.jitter = 0,
      point.label.args = list(size = 3, max.overlaps = 1e+06),
      smooth.line.args = list(linewidth = 1.5, color = "blue", method = "lm", formula = y ~ x),
      xsidehistogram.args = list(fill = "#4285F4", color = "black", na.rm = TRUE),
      ysidehistogram.args = list(fill = "#EA4335", color = "black", na.rm = TRUE),
      xsidehistogram.scale = list(),
      ysidehistogram.scale = list(),
      xlab = NULL,
      ylab = NULL,
      title = NULL,
      subtitle = NULL,
      caption = NULL,
      ggtheme = ggstatsplot::theme_ggstatsplot(),
      ggplot.component = NULL,
      ...
    )
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The column in |
y |
The column in |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
conf.level |
Scalar between |
bf.prior |
A number between |
bf.message |
Logical that decides whether to display Bayes Factor in
favor of the null hypothesis. This argument is relevant only for
parametric test (Default: TRUE). |
tr |
Trim level for the mean when carrying out |
digits |
Number of digits for rounding or significant figures. May also
be |
results.subtitle |
Decides whether the results of statistical tests are
to be displayed as a subtitle (Default: TRUE). |
label.var |
Variable to use for points labels entered as a symbol (e.g.
|
label.expression |
An expression evaluating to a logical vector that
determines the subset of data points to label (e.g. |
marginal |
Decides whether marginal distributions will be plotted on
axes using |
point.args |
A list of additional aesthetic arguments to be passed to
the |
point.width.jitter, point.height.jitter
|
Degree of jitter in |
point.label.args |
A list of additional aesthetic arguments to be passed
to |
smooth.line.args |
A list of additional aesthetic arguments to be passed
to |
xsidehistogram.args, ysidehistogram.args
|
A list of arguments passed to
respective |
xsidehistogram.scale, ysidehistogram.scale
|
A list of arguments passed
to |
xlab |
Label for |
ylab |
Labels for |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
caption |
The text for the plot caption. This argument is relevant only
if |
ggtheme |
A |
ggplot.component |
A |
... |
Currently ignored. |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggscatterstats.html
| graphical element | geom used | argument for further modification |
|---|---|---|
| raw data | ggplot2::geom_point() | point.args |
| labels for raw data | ggrepel::geom_label_repel() | point.label.args |
| smooth line | ggplot2::geom_smooth() | smooth.line.args |
| marginal histograms | ggside::geom_xsidehistogram(), ggside::geom_ysidehistogram() | xsidehistogram.args, ysidehistogram.args |
The table below provides summary about:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing and Effect size estimation
| Type | Test | CI available? | Function used |
|---|---|---|---|
| Parametric | Pearson's correlation coefficient | Yes | correlation::correlation() |
| Non-parametric | Spearman's rank correlation coefficient | Yes | correlation::correlation() |
| Robust | Winsorized Pearson's correlation coefficient | Yes | correlation::correlation() |
| Bayesian | Bayesian Pearson's correlation coefficient | Yes | correlation::correlation() |
The plot uses ggrepel::geom_label_repel() to keep labels from
overlapping as far as possible. As a consequence, plotting will slow
down considerably (and the plot file will grow in size) if you have a
lot of labels that overlap.
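Because the cost grows with the number of labels, it can help to check how many points a label.expression would select before plotting. A minimal base R sketch, using the iris data and a condition chosen only for illustration:

```r
# Condition that would be supplied as label.expression
to_label <- iris$Sepal.Length > 7.6

# Number of labels the plot would need to place, out of all points
n_labels <- sum(to_label)
n_points <- nrow(iris)

# Rows that would receive a label
iris[to_label, c("Sepal.Length", "Species")]
```

If n_labels is a large share of n_points, a stricter condition is usually the simplest way to keep the plot readable and fast to render.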
grouped_ggscatterstats, ggcorrmat,
grouped_ggcorrmat
    set.seed(123)

    # creating a plot
    p <- ggscatterstats(
      iris,
      x = Sepal.Width,
      y = Petal.Length,
      label.var = Species,
      label.expression = Sepal.Length > 7.6
    ) +
      ggplot2::geom_rug(sides = "b")

    # looking at the plot
    p

    # extracting details from statistical tests
    extract_stats(p)

    # customize marginal histogram bins and scales
    ggscatterstats(
      mtcars,
      x = wt,
      y = mpg,
      results.subtitle = FALSE,
      xsidehistogram.args = list(fill = "#4285F4", color = "black", na.rm = TRUE, binwidth = 0.5),
      ysidehistogram.args = list(fill = "#EA4335", color = "black", na.rm = TRUE, bins = 15),
      xsidehistogram.scale = list(breaks = seq(0, 15, 5)),
      ysidehistogram.scale = list(breaks = seq(0, 15, 5))
    )
A combination of box and violin plots along with raw (unjittered) data points for within-subjects designs with statistical details included in the plot as a subtitle.
    ggwithinstats(
      data,
      x,
      y,
      type = "parametric",
      subject.id = NULL,
      pairwise.display = "significant",
      pairwise.alpha = 0.05,
      p.adjust.method = "holm",
      bf.prior = 0.707,
      bf.message = TRUE,
      results.subtitle = TRUE,
      xlab = NULL,
      ylab = NULL,
      caption = NULL,
      title = NULL,
      subtitle = NULL,
      digits = 2L,
      conf.level = 0.95,
      tr = 0.2,
      alternative = "two.sided",
      centrality.plotting = TRUE,
      centrality.type = type,
      centrality.point.args = list(size = 5, color = "darkred"),
      centrality.label.args = list(size = 3, nudge_x = 0.4, segment.linetype = 4),
      centrality.path = TRUE,
      centrality.path.args = list(linewidth = 1, color = "red", alpha = 0.5),
      point.args = list(size = 3, alpha = 0.5, na.rm = TRUE),
      point.path = TRUE,
      point.path.args = list(alpha = 0.5, linetype = "dashed"),
      boxplot.args = list(width = 0.2, alpha = 0.5, na.rm = TRUE),
      violin.args = list(width = 0.5, alpha = 0.2, na.rm = TRUE),
      ggsignif.args = list(textsize = 3, tip_length = 0.01, na.rm = TRUE),
      ggtheme = ggstatsplot::theme_ggstatsplot(),
      palette = "ggthemes::gdoc",
      ggplot.component = NULL,
      ...
    )
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The grouping (or independent) variable from |
y |
The response (or outcome or dependent) variable from |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
subject.id |
Across repeated measures conditions, each row in the
dataset must correspond to a unique unit (e.g., subject or participant).
If your data frame is already in such a format, you can ignore the
|
pairwise.display |
Decides which pairwise comparisons to display. Available options are:
You can use this argument to make sure that your plot is not uber-cluttered
when you have multiple groups being compared and scores of pairwise
comparisons being displayed. If set to |
pairwise.alpha |
Numeric alpha threshold used to decide which pairwise
comparisons are displayed when |
p.adjust.method |
Adjustment method for p-values for multiple
comparisons. Possible methods are: |
bf.prior |
A number between |
bf.message |
Logical that decides whether to display the Bayes Factor in
favor of the null hypothesis. This argument is relevant only for
the parametric test (Default: TRUE). |
results.subtitle |
Decides whether the results of statistical tests are
to be displayed as a subtitle (Default: TRUE). |
xlab |
Label for |
ylab |
Labels for |
caption |
The text for the plot caption. This argument is relevant only
if |
title |
The text for the plot title. |
subtitle |
The text for the plot subtitle. Will work only if
|
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
tr |
Trim level for the mean when carrying out |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
centrality.plotting |
Logical that decides whether a central tendency
measure is to be displayed as a point with a label (Default: TRUE).
If you want default centrality parameter, you can specify this using
|
centrality.type |
Decides which centrality parameter is to be displayed.
The default is to choose the same as
Just as |
centrality.point.args, centrality.label.args
|
A list of additional aesthetic
arguments to be passed to |
centrality.path.args, point.path.args
|
A list of additional aesthetic
arguments passed on to |
point.args |
A list of additional aesthetic arguments to be passed to
the |
point.path, centrality.path
|
Logical that decides whether individual
data points and means, respectively, should be connected using
|
boxplot.args |
A list of additional aesthetic arguments passed on to
|
violin.args |
A list of additional aesthetic arguments to be passed to
the |
ggsignif.args |
A list of additional aesthetic
arguments to be passed to |
ggtheme |
A |
palette |
Name of the palette in |
ggplot.component |
A |
... |
Currently ignored. |
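To make the roles of `p.adjust.method` and `pairwise.alpha` more concrete, here is a minimal sketch using base R's `stats::p.adjust()`. The three p-values are made up for illustration and do not come from any dataset in this package; the sketch only mimics the idea of adjusting and then thresholding, not the package's internal code.

```r
# hypothetical unadjusted p-values from three pairwise comparisons
p_raw <- c(0.01, 0.02, 0.03)

# Holm's step-down adjustment (the default method in the usage above)
p_holm <- stats::p.adjust(p_raw, method = "holm")

# comparisons that would pass a threshold of 0.05 after adjustment
significant <- p_holm < 0.05

p_holm
significant
```

Other methods accepted by `stats::p.adjust()` are listed in `stats::p.adjust.methods`.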
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggwithinstats.html
| graphical element | geom used | argument for further modification |
|---|---|---|
| raw data | ggplot2::geom_point() | point.args |
| point path | ggplot2::geom_path() | point.path.args |
| box plot | ggplot2::geom_boxplot() | boxplot.args |
| density plot | ggplot2::geom_violin() | violin.args |
| centrality measure point | ggplot2::geom_point() | centrality.point.args |
| centrality measure point path | ggplot2::geom_path() | centrality.path.args |
| centrality measure label | ggrepel::geom_label_repel() | centrality.label.args |
| pairwise comparisons | ggsignif::geom_signif() | ggsignif.args |
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
| Type | Measure | Function used |
|---|---|---|
| Parametric | mean | datawizard::describe_distribution() |
| Non-parametric | median | datawizard::describe_distribution() |
| Robust | trimmed mean | datawizard::describe_distribution() |
| Bayesian | MAP | datawizard::describe_distribution() |
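As a rough illustration of how the first three centrality measures differ, the sketch below computes them with base R on a made-up vector containing one outlier. It uses `mean()` and `stats::median()` as stand-ins rather than `datawizard::describe_distribution()`, and it omits the MAP estimate; `trim = 0.2` mirrors the default `tr = 0.2`.

```r
# hypothetical scores with one extreme value
x <- c(1, 2, 3, 4, 100)

centrality <- c(
  mean = mean(x),                     # parametric
  median = stats::median(x),          # non-parametric
  trimmed_mean = mean(x, trim = 0.2)  # robust: drops 20% from each tail
)

centrality
```

The mean is pulled towards the outlier, while the median and the trimmed mean are not.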
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | No. of groups | Test | Function used |
|---|---|---|---|
| Parametric | 2 | Student's or Welch's t-test | stats::t.test() |
| Non-parametric | 2 | Mann-Whitney U test | stats::wilcox.test() |
| Robust | 2 | Yuen's test for trimmed means | WRS2::yuen() |
| Bayesian | 2 | Student's t-test | BayesFactor::ttestBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
|---|---|---|---|---|
| Parametric | 2 | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
| Non-parametric | 2 | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
| Robust | 2 | Algina-Keselman-Penfield robust standardized difference | Yes | WRS2::akp.effect() |
| Bayesian | 2 | difference | Yes | bayestestR::describe_posterior() |
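The parametric and non-parametric rows above refer to base R functions, so they can be tried directly. The following sketch runs them on `mtcars` (fuel consumption by transmission type); the choice of dataset and variables is only for illustration, and the calls are not taken from the package's internals.

```r
# two independent groups: automatic (am = 0) vs. manual (am = 1) cars
welch <- stats::t.test(mpg ~ am, data = mtcars)                      # Welch's t-test (default)
student <- stats::t.test(mpg ~ am, data = mtcars, var.equal = TRUE)  # Student's t-test

# Mann-Whitney U test; exact = FALSE avoids a warning caused by tied values
mann_whitney <- stats::wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

welch$method
student$parameter
mann_whitney$p.value
```

Whether Student's or Welch's variant is used comes down to the `var.equal` argument of `stats::t.test()`.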
Data requirement: Paired tests assume exactly one observation per subject per condition. If your data has multiple trials per cell, aggregate first (e.g., take the mean).
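One way to do this aggregation, sketched with base R's `stats::aggregate()`. The data frame and its column names (`subject`, `condition`, `score`) are hypothetical.

```r
# hypothetical long-format data: 2 subjects x 2 conditions x 2 trials
trials <- data.frame(
  subject = rep(c("s1", "s2"), each = 4),
  condition = rep(rep(c("A", "B"), each = 2), times = 2),
  score = c(1, 3, 5, 7, 2, 4, 6, 8)
)

# collapse the trials into one mean per subject-condition cell
cell_means <- stats::aggregate(score ~ subject + condition, data = trials, FUN = mean)

cell_means
```

The result has exactly one row per subject and condition and can then be passed on as `data`.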
Hypothesis testing
| Type | No. of groups | Test | Function used |
|---|---|---|---|
| Parametric | 2 | Student's t-test | stats::t.test() |
| Non-parametric | 2 | Wilcoxon signed-rank test | stats::wilcox.test() |
| Robust | 2 | Yuen's test on trimmed means for dependent samples | WRS2::yuend() |
| Bayesian | 2 | Student's t-test | BayesFactor::ttestBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
|---|---|---|---|---|
| Parametric | 2 | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
| Non-parametric | 2 | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
| Robust | 2 | Algina-Keselman-Penfield robust standardized difference | Yes | WRS2::wmcpAKP() |
| Bayesian | 2 | difference | Yes | bayestestR::describe_posterior() |
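For paired data, the same base R functions are called with `paired = TRUE`. The sketch below uses the built-in `sleep` dataset, in which each of 10 patients was measured under two drugs; the dataset choice is an illustration only.

```r
# one row per patient, one column per condition
wide <- with(sleep, data.frame(
  drug1 = extra[group == 1],
  drug2 = extra[group == 2]
))

# paired Student's t-test
paired_t <- stats::t.test(wide$drug1, wide$drug2, paired = TRUE)

# Wilcoxon signed-rank test; exact = FALSE avoids warnings about ties and zeros
signed_rank <- stats::wilcox.test(wide$drug1, wide$drug2, paired = TRUE, exact = FALSE)

paired_t$method
paired_t$parameter
signed_rank$p.value
```

Both tests rely on the rows of `wide` lining up by patient, which is why the one-observation-per-subject-per-condition requirement matters.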
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | No. of groups | Test | Function used |
|---|---|---|---|
| Parametric | > 2 | Fisher's or Welch's one-way ANOVA | stats::oneway.test() |
| Non-parametric | > 2 | Kruskal-Wallis one-way ANOVA | stats::kruskal.test() |
| Robust | > 2 | Heteroscedastic one-way ANOVA for trimmed means | WRS2::t1way() |
| Bayesian | > 2 | Fisher's ANOVA | BayesFactor::anovaBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
|---|---|---|---|---|
| Parametric | > 2 | partial eta-squared, partial omega-squared | Yes | effectsize::omega_squared(), effectsize::eta_squared() |
| Non-parametric | > 2 | rank epsilon squared | Yes | effectsize::rank_epsilon_squared() |
| Robust | > 2 | Explanatory measure of effect size | Yes | WRS2::t1way() |
| Bayesian | > 2 | Bayesian R-squared | Yes | performance::r2_bayes() |
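The parametric and non-parametric tests for more than two independent groups are also base R functions. A minimal sketch on the built-in `iris` data (three species), chosen here purely as an example:

```r
# Welch's one-way ANOVA (default: variances not assumed equal)
welch_anova <- stats::oneway.test(Sepal.Length ~ Species, data = iris)

# Fisher's one-way ANOVA (equal variances assumed)
fisher_anova <- stats::oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)

# Kruskal-Wallis rank sum test
kruskal <- stats::kruskal.test(Sepal.Length ~ Species, data = iris)

fisher_anova$parameter
kruskal$parameter
```

As with the t-test, the Fisher and Welch variants differ only in the `var.equal` argument.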
Data requirement: Repeated measures tests assume a complete design with
exactly one observation per subject per condition. If your data has multiple
trials per cell, aggregate first (e.g., take the mean). Verify with
table(data$subject, data$condition) — every cell should equal 1.
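The check described above can be wrapped in a small helper. The function name `has_one_obs_per_cell()` and the example data frames are hypothetical and are not part of the package.

```r
# TRUE only if every subject x condition cell contains exactly one row
has_one_obs_per_cell <- function(data, subject, condition) {
  cell_counts <- table(data[[subject]], data[[condition]])
  all(cell_counts == 1)
}

# hypothetical complete design: 3 subjects, each seen in conditions A and B
complete <- data.frame(
  subject = rep(1:3, times = 2),
  condition = rep(c("A", "B"), each = 3)
)

# dropping one row leaves subject 1 without an observation in condition A
incomplete <- complete[-1, ]

has_one_obs_per_cell(complete, "subject", "condition")   # TRUE
has_one_obs_per_cell(incomplete, "subject", "condition") # FALSE
```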
Hypothesis testing
| Type | No. of groups | Test | Function used |
|---|---|---|---|
| Parametric | > 2 | One-way repeated measures ANOVA | afex::aov_ez() |
| Non-parametric | > 2 | Friedman rank sum test | stats::friedman.test() |
| Robust | > 2 | Heteroscedastic one-way repeated measures ANOVA for trimmed means | WRS2::rmanova() |
| Bayesian | > 2 | One-way repeated measures ANOVA | BayesFactor::anovaBF() |
Effect size estimation
| Type | No. of groups | Effect size | CI available? | Function used |
|---|---|---|---|---|
| Parametric | > 2 | partial eta-squared, partial omega-squared | Yes | effectsize::omega_squared(), effectsize::eta_squared() |
| Non-parametric | > 2 | Kendall's coefficient of concordance | Yes | effectsize::kendalls_w() |
| Robust | > 2 | Algina-Keselman-Penfield robust standardized difference average | Yes | WRS2::wmcpAKP() |
| Bayesian | > 2 | Bayesian R-squared | Yes | performance::r2_bayes() |
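Of the repeated measures tests above, the Friedman test is available in base R. The sketch below runs it on a made-up ratings matrix with one row per subject and one column per condition, so the complete-design requirement is met by construction.

```r
# hypothetical ratings: 4 subjects (rows) x 3 conditions (columns)
ratings <- matrix(
  c(5, 4, 2,
    6, 5, 3,
    7, 5, 4,
    6, 4, 1),
  nrow = 4,
  byrow = TRUE,
  dimnames = list(subject = paste0("s", 1:4), condition = c("A", "B", "C"))
)

# Friedman rank sum test: ranks are computed within each subject
friedman <- stats::friedman.test(ratings)

friedman$statistic
friedman$parameter
```

Every subject here ranks the conditions in the same order (A > B > C), which is the strongest possible agreement for this layout.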
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
| Type | Equal variance? | Test | p-value adjustment? | Function used |
|---|---|---|---|---|
| Parametric | No | Games-Howell test | Yes | PMCMRplus::gamesHowellTest() |
| Parametric | Yes | Student's t-test | Yes | stats::pairwise.t.test() |
| Non-parametric | No | Dunn test | Yes | PMCMRplus::kwAllPairsDunnTest() |
| Robust | No | Yuen's trimmed means test | Yes | WRS2::lincon() |
| Bayesian | NA | Student's t-test | NA | BayesFactor::ttestBF() |
Effect size estimation
Not supported.
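As an illustration of the equal-variance parametric row, the following sketch calls `stats::pairwise.t.test()` on the built-in `iris` data with Holm-adjusted p-values. It shows the base R function only, not how the package wraps it.

```r
# all pairwise Student's t-tests between species, with Holm adjustment
pairwise <- stats::pairwise.t.test(
  x = iris$Sepal.Length,
  g = iris$Species,
  p.adjust.method = "holm"
)

# lower-triangular matrix of adjusted p-values (3 groups -> 3 comparisons)
pairwise$p.value
```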
Data requirement: Paired pairwise tests assume exactly one observation per subject per condition. If your data has multiple trials per cell, aggregate first (e.g., take the mean).
Hypothesis testing
| Type | Test | p-value adjustment? | Function used |
|---|---|---|---|
| Parametric | Student's t-test | Yes | stats::pairwise.t.test() |
| Non-parametric | Durbin-Conover test | Yes | PMCMRplus::durbinAllPairsTest() |
| Robust | Yuen's trimmed means test | Yes | WRS2::rmmcp() |
| Bayesian | Student's t-test | NA | BayesFactor::ttestBF() |
Effect size estimation
Not supported.
grouped_ggbetweenstats, ggbetweenstats,
grouped_ggwithinstats
# for reproducibility
set.seed(123)
library(dplyr, warn.conflicts = FALSE)

# create a plot
p <- ggwithinstats(
  data = filter(bugs_long, condition %in% c("HDHF", "HDLF")),
  x = condition,
  y = desire,
  type = "np",
  subject.id = subject
)

# looking at the plot
p

# if the data are already arranged in repeated-measures order, `subject.id`
# can be omitted
ggwithinstats(
  data = filter(bugs_long, condition %in% c("HDHF", "HDLF")),
  x = condition,
  y = desire,
  pairwise.display = "none",
  results.subtitle = FALSE
)

# extracting details from statistical tests
extract_stats(p)

# use a stricter alpha threshold for significant pairwise comparisons
ggwithinstats(
  data = bugs_long,
  x = condition,
  y = desire,
  subject.id = subject,
  pairwise.alpha = 0.001
)

# modifying defaults
ggwithinstats(
  data = bugs_long,
  x = condition,
  y = desire,
  type = "robust",
  subject.id = subject
)

# you can remove a specific geom to reduce complexity of the plot
ggwithinstats(
  data = bugs_long,
  x = condition,
  y = desire,
  subject.id = subject,
  # to remove violin plot
  violin.args = list(width = 0, linewidth = 0, colour = NA),
  # to remove boxplot
  boxplot.args = list(width = 0),
  # to remove points
  point.args = list(alpha = 0)
)
Helper function for ggstatsplot::ggbarstats() to apply this function across
multiple levels of a given factor and combining the resulting plots using
ggstatsplot::combine_plots().
grouped_ggbarstats(
  data,
  ...,
  grouping.var,
  plotgrid.args = list(),
  annotation.args = list()
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
... |
Arguments passed on to
|
grouping.var |
A single grouping variable. |
plotgrid.args |
A |
annotation.args |
A |
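Conceptually, the `grouped_*` helpers follow a split-apply-combine pattern: split the data by the levels of `grouping.var`, run the same function on each piece, and collect the results. The sketch below illustrates only that pattern with base R; `apply_by_group()` is a hypothetical helper, and it returns a list of frequency tables rather than plots combined via `ggstatsplot::combine_plots()`.

```r
# hypothetical helper: run `fn` once per level of a grouping variable
apply_by_group <- function(data, grouping.var, fn) {
  pieces <- split(data, data[[grouping.var]])
  lapply(pieces, fn)
}

# cylinder counts computed separately for each transmission type
counts_by_am <- apply_by_group(mtcars, "am", function(d) table(d$cyl))

names(counts_by_am)
counts_by_am[["1"]]
```

Each list element corresponds to one level of the grouping variable, just as each panel of a grouped plot does.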
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggpiestats.html
ggbarstats, ggpiestats,
grouped_ggpiestats
set.seed(123)

# grouped one-sample proportion test
grouped_ggbarstats(
  data = mtcars,
  x = cyl,
  grouping.var = am,
  annotation.args = list(title = "Cylinder distribution by transmission type")
)
Helper function for ggstatsplot::ggbetweenstats to apply this function
across multiple levels of a given factor and combining the resulting plots
using ggstatsplot::combine_plots.
grouped_ggbetweenstats(
  data,
  ...,
  grouping.var,
  plotgrid.args = list(),
  annotation.args = list()
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
... |
Arguments passed on to
|
grouping.var |
A single grouping variable. |
plotgrid.args |
A |
annotation.args |
A |
ggbetweenstats, ggwithinstats,
grouped_ggwithinstats
# for reproducibility
set.seed(123)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

grouped_ggbetweenstats(
  data = filter(ggplot2::mpg, drv != "4"),
  x = year,
  y = hwy,
  grouping.var = drv
)

# modifying individual plots using `ggplot.component` argument
grouped_ggbetweenstats(
  data = filter(
    movies_long,
    genre %in% c("Action", "Comedy"),
    mpaa %in% c("R", "PG")
  ),
  x = genre,
  y = rating,
  grouping.var = mpaa,
  ggplot.component = scale_y_continuous(
    breaks = seq(1, 9, 1),
    limits = c(1, 9)
  ),
  annotation.args = list(title = "Ratings by genre for different MPAA ratings")
)
Helper function for ggstatsplot::ggcorrmat() to apply this function across
multiple levels of a given factor and combining the resulting plots using
ggstatsplot::combine_plots().
grouped_ggcorrmat(
  data,
  ...,
  grouping.var,
  plotgrid.args = list(),
  annotation.args = list()
)
data |
A data frame from which variables specified are to be taken. |
... |
Arguments passed on to
|
grouping.var |
A single grouping variable. |
plotgrid.args |
A |
annotation.args |
A |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggcorrmat.html
ggcorrmat, ggscatterstats,
grouped_ggscatterstats
set.seed(123)

grouped_ggcorrmat(
  data = iris,
  grouping.var = Species,
  type = "robust",
  colors = c("#0072B2", "white", "#D55E00"),
  p.adjust.method = "holm",
  plotgrid.args = list(ncol = 1L),
  annotation.args = list(tag_levels = "i")
)
Helper function for ggstatsplot::ggdotplotstats() to apply this function
across multiple levels of a given factor and combining the resulting plots
using ggstatsplot::combine_plots().
grouped_ggdotplotstats(
  data,
  ...,
  grouping.var,
  plotgrid.args = list(),
  annotation.args = list()
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
... |
Arguments passed on to
|
grouping.var |
A single grouping variable. |
plotgrid.args |
A |
annotation.args |
A |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggdotplotstats.html
grouped_gghistostats, ggdotplotstats,
gghistostats
# for reproducibility
set.seed(123)
library(dplyr, warn.conflicts = FALSE)

# removing factor levels with very few observations
df <- filter(ggplot2::mpg, cyl %in% c("4", "6", "8"))

# plot
grouped_ggdotplotstats(
  data = df,
  x = cty,
  y = manufacturer,
  grouping.var = cyl,
  test.value = 15.5,
  annotation.args = list(title = "City mileage by manufacturer for different cylinders")
)
Helper function for ggstatsplot::gghistostats to apply this function
across multiple levels of a given factor and combining the resulting plots
using ggstatsplot::combine_plots.
grouped_gghistostats(
  data,
  x,
  grouping.var,
  binwidth = NULL,
  plotgrid.args = list(),
  annotation.args = list(),
  ...
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
A numeric variable from the data frame |
grouping.var |
A single grouping variable. |
binwidth |
The width of the histogram bins. Can be specified as a
numeric value, or a function that calculates width from |
plotgrid.args |
A |
annotation.args |
A |
... |
Arguments passed on to
|
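Since `binwidth` may be given as a function, one possible choice is the Freedman-Diaconis rule. The sketch below defines it with base R, assuming the function is called with the numeric vector being binned, as `ggplot2::geom_histogram()` does for its own `binwidth` argument. The name `fd_binwidth()` is hypothetical.

```r
# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)
fd_binwidth <- function(x) {
  2 * stats::IQR(x) / length(x)^(1 / 3)
}

# width that the rule would suggest for the Sepal.Length values in iris
width <- fd_binwidth(iris$Sepal.Length)
width
```

Such a function could then be supplied as `binwidth = fd_binwidth` instead of a fixed number.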
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/gghistostats.html
gghistostats, ggdotplotstats,
grouped_ggdotplotstats
# for reproducibility
set.seed(123)

# plot
grouped_gghistostats(
  data = iris,
  x = Sepal.Length,
  test.value = 5,
  grouping.var = Species,
  plotgrid.args = list(nrow = 1),
  annotation.args = list(tag_levels = "i")
)
Helper function for ggstatsplot::ggpiestats to apply this
function across multiple levels of a given factor and combining the
resulting plots using ggstatsplot::combine_plots.
grouped_ggpiestats(
  data,
  ...,
  grouping.var,
  plotgrid.args = list(),
  annotation.args = list()
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
... |
Arguments passed on to
|
grouping.var |
A single grouping variable. |
plotgrid.args |
A |
annotation.args |
A |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggpiestats.html
ggbarstats, ggpiestats,
grouped_ggbarstats
set.seed(123)

# grouped one-sample proportion test
grouped_ggpiestats(
  data = mtcars,
  x = cyl,
  grouping.var = am,
  annotation.args = list(title = "Cylinder distribution by transmission type")
)
Grouped scatterplots from {ggplot2} combined with marginal distribution
plots with statistical details added as a subtitle.
grouped_ggscatterstats(
  data,
  ...,
  grouping.var,
  plotgrid.args = list(),
  annotation.args = list()
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
... |
Arguments passed on to
|
grouping.var |
A single grouping variable. |
plotgrid.args |
A |
annotation.args |
A |
For details, see: https://www.indrapatil.com/ggstatsplot/articles/web_only/ggscatterstats.html
ggscatterstats, ggcorrmat,
grouped_ggcorrmat
# to ensure reproducibility
set.seed(123)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

grouped_ggscatterstats(
  data = filter(movies_long, genre == "Comedy" | genre == "Drama"),
  x = length,
  y = rating,
  type = "robust",
  grouping.var = genre,
  ggplot.component = list(geom_rug(sides = "b"))
)

# using labeling
# (also show how to modify basic plot from within function call)
grouped_ggscatterstats(
  data = filter(ggplot2::mpg, cyl != 5),
  x = displ,
  y = hwy,
  grouping.var = cyl,
  type = "robust",
  label.var = manufacturer,
  label.expression = hwy > 25 & displ > 2.5,
  ggplot.component = scale_y_continuous(sec.axis = dup_axis())
)

# labeling without expression
grouped_ggscatterstats(
  data = filter(movies_long, rating == 7, genre %in% c("Drama", "Comedy")),
  x = budget,
  y = length,
  grouping.var = genre,
  bf.message = FALSE,
  label.var = "title",
  annotation.args = list(tag_levels = "a")
)

# customize marginal histogram bins and scales
grouped_ggscatterstats(
  data = filter(movies_long, genre %in% c("Drama", "Comedy")),
  x = rating,
  y = length,
  grouping.var = genre,
  results.subtitle = FALSE,
  xsidehistogram.args = list(fill = "#4285F4", color = "black", na.rm = TRUE, bins = 20),
  ysidehistogram.args = list(fill = "#EA4335", color = "black", na.rm = TRUE, binwidth = 10),
  xsidehistogram.scale = list(breaks = seq(0, 200, 50)),
  ysidehistogram.scale = list(breaks = seq(0, 200, 50))
)
A combined plot made up of within-subjects comparison plots, one for each level of a grouping variable.
grouped_ggwithinstats(
  data,
  ...,
  grouping.var,
  plotgrid.args = list(),
  annotation.args = list()
)
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
... |
Arguments passed on to
|
grouping.var |
A single grouping variable. |
plotgrid.args |
A |
annotation.args |
A |
ggwithinstats, ggbetweenstats,
grouped_ggbetweenstats
# for reproducibility
set.seed(123)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# the most basic function call
grouped_ggwithinstats(
  data = filter(bugs_long, condition %in% c("HDHF", "HDLF")),
  x = condition,
  y = desire,
  subject.id = subject,
  grouping.var = gender,
  type = "np",
  # additional modifications for **each** plot using `{ggplot2}` functions
  ggplot.component = scale_y_continuous(breaks = seq(0, 10, 1), limits = c(0, 10)),
  annotation.args = list(title = "Desire ratings by condition for each gender")
)
Edgar Anderson's Iris Data in long format.
iris_long
A data frame with 600 rows and 6 variables
id. Dummy identity number for each flower (150 flowers in total).
Species. The species are Iris setosa, versicolor, and virginica.
condition. Factor giving a detailed description of the attribute
(Four levels: "Petal.Length", "Petal.Width", "Sepal.Length",
"Sepal.Width").
attribute. What attribute is being measured ("Sepal" or "Petal").
measure. What aspect of the attribute is being measured ("Length" or "Width").
value. Value of the measurement.
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
This is a modified dataset from {datasets} package.
dim(iris_long)
head(iris_long)
dplyr::glimpse(iris_long)
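For readers curious how a long-format table like this relates to the original `iris` data, here is a rough reconstruction in base R. It is a sketch of the reshaping idea only and is not the code used to build `iris_long`; the object names `measurements`, `parts`, and `long` are made up.

```r
# stack the four measurement columns into (values, ind) pairs
measurements <- utils::stack(iris[, 1:4])

# split names such as "Sepal.Length" into attribute and measure
parts <- do.call(rbind, strsplit(as.character(measurements$ind), ".", fixed = TRUE))

long <- data.frame(
  id = rep(seq_len(nrow(iris)), times = 4),  # one id per flower
  Species = rep(iris$Species, times = 4),
  condition = measurements$ind,
  attribute = parts[, 1],
  measure = parts[, 2],
  value = measurements$values
)

dim(long)
head(long)
```

Each of the 150 flowers contributes four rows, one per measured attribute.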
Movie information and user ratings from IMDB.com (long format).
movies_long
A data frame with 1,579 rows and 8 variables
title. Title of the movie.
year. Year of release.
budget. Total budget (if known) in US dollars.
length. Length in minutes.
rating. Average IMDB user rating.
votes. Number of IMDB users who rated this movie.
mpaa. MPAA rating.
genre. Different genres of movies (action, animation, comedy, drama, documentary, romance, short).
Modified dataset from {ggplot2movies} package.
The Internet Movie Database (IMDB) is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon.
https://CRAN.R-project.org/package=ggplot2movies
dim(movies_long)
head(movies_long)
dplyr::glimpse(movies_long)
{ggstatsplot}
Common theme used across all plots generated in {ggstatsplot} and assumed
by the author to be aesthetically pleasing to the user. The theme is a
wrapper around ggplot2::theme_bw().
All {ggstatsplot} functions have a ggtheme parameter that lets you choose
a different theme.
theme_ggstatsplot()
A ggplot2 theme object.
library(ggplot2)

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_ggstatsplot()
Titanic dataset.
Titanic_full
A data frame with 2201 rows and 5 variables
id. Dummy identity number for each person.
Class. 1st, 2nd, 3rd, Crew.
Sex. Male, Female.
Age. Child, Adult.
Survived. No, Yes.
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner 'Titanic', summarized according to economic status (class), sex, age and survival.
This is a modified dataset from {datasets} package.
dim(Titanic_full)
head(Titanic_full)
dplyr::glimpse(Titanic_full)
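The relationship between this one-row-per-person layout and the summarized `datasets::Titanic` table can be sketched in base R as follows. This is an illustrative reconstruction, not the code used to create `Titanic_full`; the object names `counts` and `people` are made up.

```r
# the summarized table as a data frame with a Freq column
counts <- as.data.frame(datasets::Titanic)

# repeat each combination of Class, Sex, Age, Survived `Freq` times
rows <- rep(seq_len(nrow(counts)), times = counts$Freq)
people <- counts[rows, c("Class", "Sex", "Age", "Survived")]

# add a dummy identity number and tidy up the row names
people <- cbind(id = seq_len(nrow(people)), people)
rownames(people) <- NULL

dim(people) # 2201 5
```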