Computes bivariate statistics for a set of variables according to the subgroups of observations defined by a categorical variable.

cattab(x, y, weights = NULL, percent = "column",
       robust = TRUE, show.n = TRUE, show.asso = TRUE,
       digits = c(1,1), na.rm = TRUE, na.value = "NAs")

Arguments

x

data frame. The variables described in rows. They can be numeric or factors.

y

factor. The categorical variable which defines subgroups of observations described in columns.

weights

numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.

percent

character. Whether to compute row percentages ("row") or column percentages ("column", default).

robust

logical. Whether to use medians instead of means. Default is TRUE.

show.n

logical. Whether to display frequencies (in parentheses) in addition to the percentages. Default is TRUE.

show.asso

logical. Whether to add a column with measures of global association (Cramer's V and eta-squared). Default is TRUE.

digits

vector of 2 integers. The first value sets the number of digits for percentages, the second one sets the number of digits for medians and means. Default is c(1,1). If NULL, the results are not rounded.

na.rm

logical, indicating whether NA values should be silently removed before the computation proceeds. Default is TRUE. If FALSE, an additional level is added to the variables (see na.value argument).

na.value

character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
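As a rough illustration of what the weights, percent and na.value arguments mean, the following base R sketch builds a weighted cross-tabulation by hand. It does not call cattab and is not the package's implementation (which relies on gtsummary and survey); the toy vectors genre, country and w are hypothetical.

```r
# Toy data (hypothetical); not taken from the package.
genre   <- factor(c("Drama", "Comedy", NA, "Drama", "Comedy", "Drama"))
country <- factor(c("France", "France", "USA", "USA", "USA", "France"))
w       <- c(1, 2, 1, 1, 2, 1)   # hypothetical observation weights

# Analogue of na.rm = FALSE: turn NA into an explicit level named "NAs".
genre_na <- addNA(genre)
levels(genre_na)[is.na(levels(genre_na))] <- "NAs"

# Weighted cross-tabulation: each cell is the sum of the weights.
tab <- xtabs(w ~ genre_na + country)

# Analogue of percent = "column": each column sums to 100.
col_pct <- 100 * prop.table(tab, margin = 2)
# Analogue of percent = "row": each row sums to 100.
row_pct <- 100 * prop.table(tab, margin = 1)

round(col_pct, 1)
```

With uniform weights (all equal to 1), tab reduces to the ordinary table of counts.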

Details

The function uses the gtsummary package to build the table of statistics, and then the gt package to finalize the layout. Weights are handled silently with the survey package.

The function is also compatible with the variable labels assigned with the labelled package: these labels are displayed automatically.
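For readers unfamiliar with variable labels, here is a minimal base R sketch of how such a label is stored. It assumes the label lives in the "label" attribute of the column, which is where labelled::var_label() puts it; the data frame df and the label text are hypothetical.

```r
# Toy data frame (hypothetical).
df <- data.frame(genre = factor(c("Drama", "Comedy", "Drama")))

# labelled::var_label(df$genre) <- "Film genre" stores the text in the
# "label" attribute of the column; the base R equivalent is:
attr(df$genre, "label") <- "Film genre"

# The label travels with the column and can be read back the same way.
attr(df$genre, "label")
```

A data frame prepared this way could then be passed as x, in which case the label, rather than the column name, would be expected in the row header.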

Note

This function is quite similar to profiles, but displays the results as a formatted gt table.

Value

An object of class gt_tbl.

Author

Nicolas Robette

Examples

# \dontrun{
data(Movies)
cattab(x = Movies[, c("Genre", "ArtHouse", "Critics", "BoxOffice")],
       y = Movies$Country)
                 Movies_Country
                 Europe (n=72)           France (n=605)          Other (n=26)            USA (n=297)             Total (n=1000)          Association [1]
Genre                                                                                                                                    0.275
    Action       19.4% (14)              9.8% (59)               23.1% (6)               29.0% (86)              16.5% (165)
    Animation    2.8% (2)                3.0% (18)               0.0% (0)                8.8% (26)               4.6% (46)
    Other        5.6% (4)                2.1% (13)               0.0% (0)                3.0% (9)                2.6% (26)
    ComDram      13.9% (10)              18.2% (110)             11.5% (3)               8.8% (26)               14.9% (149)
    Comedy       22.2% (16)              23.8% (144)             15.4% (4)               19.5% (58)              22.2% (222)
    Documentary  1.4% (1)                12.1% (73)              3.8% (1)                0.7% (2)                7.7% (77)
    Drama        25.0% (18)              29.6% (179)             34.6% (9)               11.8% (35)              24.1% (241)
    Horror       4.2% (3)                0.2% (1)                3.8% (1)                6.7% (20)               2.5% (25)
    SciFi        5.6% (4)                1.3% (8)                7.7% (2)                11.8% (35)              4.9% (49)
ArtHouse         45.8% (33)              65.0% (393)             76.9% (20)              13.5% (40)              48.6% (486)             0.469
Critics                                                                                                                                  0.017
    median       3.0                     3.0                     3.2                     2.7                     3.0
    (Q1 - Q3)    (2.3 - 3.4)             (2.3 - 3.5)             (2.4 - 3.7)             (2.0 - 3.2)             (2.3 - 3.5)
BoxOffice                                                                                                                                0.048
    median       106,747.0               57,140.0                49,098.0                328,559.0               106,747.0
    (Q1 - Q3)    (10,291.0 - 493,595.0)  (9,988.0 - 310,777.0)   (21,341.0 - 164,270.0)  (94,925.0 - 926,106.0)  (20,252.0 - 468,809.0)
[1] Cramer’s V (categorical var.) or eta-squared (continuous var.)
# }
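The association column in the example output reports Cramer's V for categorical variables and eta-squared for continuous ones. The sketch below computes both with base R from their standard textbook definitions. It is unweighted and is not the package's own code; the helper names cramers_v and eta_squared are hypothetical.

```r
# Cramer's V for two categorical vectors (unweighted, standard definition).
cramers_v <- function(a, b) {
  tab  <- table(a, b)
  # Chi-squared statistic without continuity correction.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)
  k    <- min(nrow(tab), ncol(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

# Eta-squared: share of the variance of a numeric x explained by groups g.
eta_squared <- function(x, g) {
  grand_mean <- mean(x)
  group_mean <- ave(x, g)   # mean of its group, repeated for each observation
  sum((group_mean - grand_mean)^2) / sum((x - grand_mean)^2)
}

g <- factor(c("a", "a", "b", "b"))
cramers_v(g, g)                 # a variable crossed with itself
eta_squared(c(1, 1, 3, 3), g)   # values constant within each group
```

Both measures range from 0 (no association) to 1 (perfect association), which is why they can share a single column in the table.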