R dplyr summarize percent

12/3/2023

Let's start by creating our own data, consisting of 2 categorical variables, gender and smoking:

set.seed(10)
# create 2 categorical variables with 80 observations each
gender = sample(c('Female', 'Male'), 80, replace = TRUE)
smoking = sample(c('Past smoker', 'Current smoker', 'Non-smoker'), 80, replace = TRUE)
dat = data.frame(gender = as.factor(gender),
                 smoking = as.factor(smoking))
str(dat)
# $ smoking: Factor w/ 3 levels "Current smoker",..: 2 1 3 2 1 1 1 2 2 2 ...

Next, we will create a frequency table and a bar plot to summarize these data one variable at a time, then we will create a contingency table and a stacked bar plot to describe the relationship between the 2 variables.

Summarizing gender and smoking, one variable at a time

A frequency table shows the number of occurrences of each category of a variable:

table(dat$gender)
table(dat$smoking)

Interpretation: Our sample consists of 40 females and 40 males. 26 participants are current smokers, 24 are past smokers, and 30 are non-smokers.

We can use bar plots to visualize these 2 frequency tables:

par(mfrow=c(1,2)) # show the following plots side by side
barplot(table(dat$gender), ylab = 'Number of participants')
barplot(table(dat$smoking), ylab = 'Number of participants')

Returning to tables, instead of showing the number of occurrences of each category, we can show the proportion of each category:

prop.table(table(dat$gender))
prop.table(table(dat$smoking))

Interpretation: 50% of the participants are females and 50% are males. 32.5% of the participants are current smokers, 30% are past smokers, and 37.5% are non-smokers.

Again, we can create bar plots to visualize these tables:

par(mfrow=c(1,2)) # show the following plots side by side
barplot(prop.table(table(dat$gender)), ylab = 'Proportion of participants')
barplot(prop.table(table(dat$smoking)), ylab = 'Proportion of participants')

Describing the relationship between gender and smoking

A contingency table (a.k.a. a 2-way frequency table or a frequency table with 2 variables) describes the relationship between 2 categorical variables:

table(dat$gender, dat$smoking)

Each cell in this table corresponds to the number of occurrences of a particular combination of values of the 2 variables.

Interpretation: For instance, in our sample, 12 participants are females and current smokers.

We can also show the proportion of individuals in each cell. Here, each cell represents the count of individuals in this category divided by the total sample size:

table1 = prop.table(table(dat$gender, dat$smoking))
# we can add margins (row and column sums) to the table to make it clearer
addmargins(table1)

(Notice that all the cells in the table sum up to 1.)

Interpretation: The first number, 0.15, is the proportion of individuals in our sample who are both females and current smokers. More generally, male current smokers (17.5% of the sample) outnumber female current smokers (15%), and females who are past smokers form the smallest category in our data (only 13.75%).

Here, each cell represents the count of individuals in this category divided by the row total:

table2 = prop.table(table(dat$gender, dat$smoking), 1)

Interpretation: For instance, 0.3 is the proportion of females who currently smoke (⚠ this is different from the proportion of current smokers who are females). More generally, in our sample the largest group among females is non-smokers (42.5%), while the largest group among males is current smokers (35%).

Here, each cell represents the count of individuals in this category divided by the column total:

table3 = prop.table(table(dat$gender, dat$smoking), 2)

Interpretation: For instance, 0.4615… is the proportion of current smokers who are females (⚠ this is different from the proportion of females who are current smokers). More generally, in our sample, most current and past smokers are males (53.85% and 54.17%, respectively) and most non-smokers are females (56.67%).

We can use a stacked bar plot to visualize a contingency table:

library(scales) # to show percentages on the y-axis
yticks = seq(0, 1, by = 0.1) # tick positions (assumed; not defined in the original code)
x = barplot(prop.table(table(dat$gender, dat$smoking)),
            yaxt = 'n') # assumed argument: hide the default axis so the percent axis can replace it
axis(2, at = yticks, labels = percent(yticks))
y = prop.table(table(dat$gender, dat$smoking))
# row 1 of y holds the female proportions (bottom segment), row 2 the male proportions (top segment)
text(x, y[1, ]/2, labels = paste0(as.character(y[1, ]*100), '%')) # placing female values
text(x, y[1, ] + y[2, ]/2, labels = paste0(as.character(y[2, ]*100), '%')) # placing male values

Another plot that is related to the stacked bar plot is the mosaic plot:

mosaicplot(prop.table(table(dat$smoking, dat$gender)))
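
The percentages quoted in the interpretations can be checked without rerunning the simulation. The sketch below rebuilds the 2 x 3 table of counts from the figures stated in the text (for example 12 female current smokers, and 17.5% of 80 participants = 14 male current smokers) and applies prop.table() with each margin setting. The object names (counts, tab, overall, by_row, by_col) are made up for this illustration, and the counts are derived from the reported percentages rather than copied from actual output.

```r
# Counts derived from the figures quoted in the text (an assumption,
# not the output of the sample() calls above)
counts <- matrix(
  c(12, 17, 11,    # Female: current, non-smoker, past
    14, 13, 13),   # Male:   current, non-smoker, past
  nrow = 2, byrow = TRUE,
  dimnames = list(gender  = c("Female", "Male"),
                  smoking = c("Current smoker", "Non-smoker", "Past smoker"))
)
tab <- as.table(counts)

overall <- prop.table(tab)     # each cell divided by the sample size (80)
by_row  <- prop.table(tab, 1)  # each cell divided by its row total (gender)
by_col  <- prop.table(tab, 2)  # each cell divided by its column total (smoking)

overall["Female", "Current smoker"]           # 0.15
by_row["Female", "Current smoker"]            # 0.3
round(by_col["Female", "Current smoker"], 4)  # 0.4615

# The same three tables expressed as percentages
round(100 * overall, 2)
round(100 * by_row, 2)
round(100 * by_col, 2)
```

The same cell (female current smokers) yields three different proportions depending on the denominator, which is why each interpretation has to state which total it is relative to.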
