Maraca Plots - Validation

This vignette is largely based on the PharmaSUG 2023 conference paper.

When using a maraca plot in a regulatory setting, for example to visualize clinical study results for regulatory submissions, there will be strict validation demands. For example, the results might need to be double programmed for validation purposes.

In order to facilitate the validation of the graphic output, the maraca package includes the function validate_maraca_plot() that allows the user to extract important metrics from the plot itself. This allows to programmatically compare the results of a plot produced using the maraca package with other programmatic approaches. The paper linked above walks through a validation example where the double programming was done in SAS.

To use the validation functionality, we need to first create a maraca object.

library(maraca)

data(hce_scenario_a)

maraca_dat <- maraca(
  data = hce_scenario_a,
  step_outcomes = c("Outcome I", "Outcome II", "Outcome III", "Outcome IV"),
  last_outcome = "Continuous outcome",
  fixed_followup_days = 3 * 365,
  column_names = c(outcome = "GROUP", arm = "TRTP", value = "AVAL0"),
  arm_levels = c(active = "Active", control = "Control"),
  compute_win_odds = TRUE
)

We then create a maraca plot and save the actual plot as an object.

# Save plot as its own object
maraca_plot <- plot(maraca_dat)
# The plot has its own class called "maracaPlot"
class(maraca_plot)
## [1] "maracaPlot" "gg"         "ggplot"
# Display plot
maraca_plot

Now we can validate the plot using the validate_maraca_plot() function.

validation_list <- validate_maraca_plot(maraca_plot) 
# Display which metrics are included
str(validation_list)
## List of 9
##  $ plot_type       : chr "GeomViolin+GeomBoxplot"
##  $ proportions     : Named num [1:5] 13.8 18.1 12.7 9.8 45.6
##   ..- attr(*, "names")= chr [1:5] "Outcome I" "Outcome II" "Outcome III" "Outcome IV" ...
##  $ tte_data        :'data.frame':    544 obs. of  3 variables:
##   ..$ x    : num [1:544] 0.125 0.159 0.463 0.591 0.621 ...
##   ..$ y    : num [1:544] 0.2 0.4 0.6 0.2 0.4 0.6 0.8 0.8 1 1.2 ...
##   ..$ group: Factor w/ 2 levels "Active","Control": 2 2 2 1 1 1 2 1 2 2 ...
##  $ binary_step_data: NULL
##  $ binary_last_data: NULL
##  $ scatter_data    : NULL
##  $ boxstat_data    :'data.frame':    2 obs. of  9 variables:
##   ..$ group        : Factor w/ 2 levels "Active","Control": 1 2
##   ..$ x_lowest     : num [1:2] 60.8 54.4
##   ..$ whisker_lower: num [1:2] 63.8 57.8
##   ..$ hinge_lower  : num [1:2] 74.9 71
##   ..$ median       : num [1:2] 79.7 75.9
##   ..$ hinge_upper  : num [1:2] 83.9 80.7
##   ..$ whisker_upper: num [1:2] 95.5 94.5
##   ..$ x_highest    : num [1:2] 95.5 100
##   ..$ outliers     :List of 2
##   .. ..$ : num 60.8
##   .. ..$ : num [1:5] 54.4 96.4 97.2 99.7 100
##  $ violin_data     :'data.frame':    1024 obs. of  5 variables:
##   ..$ group  : Factor w/ 2 levels "Active","Control": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ x      : num [1:1024] 60.8 60.8 60.9 61 61 ...
##   ..$ y      : num [1:1024] 44 44 44 44 44 44 44 44 44 44 ...
##   ..$ density: num [1:1024] 0.00117 0.00121 0.00124 0.00127 0.0013 ...
##   ..$ width  : num [1:1024] 18.7 18.7 18.7 18.7 18.7 ...
##  $ wo_stats        : Named num [1:4] 1.64 1.42 1.91 9.35e-12
##   ..- attr(*, "names")= chr [1:4] "winodds" "lowerCI" "upperCI" "p_value"

Running the validate_maraca_plot() function on a maraca plot object returns a list with the following items:

  1. plot_type: depending on which density_plot_type was selected for the plot either GeomPoint, GeomViolin and/or GeomBoxplot
  2. proportions: the proportions of the HCE components
  3. tte_data: time-to-event data if part of the step outcomes has type tte, otherwise NULL
  4. binary_step_data: binary data if part of the step outcomes has type binary, otherwise NULL
  5. binary_step_data: if last endpoint was binary then contains the data for the minimum, maximum and middle point x values displayed in the ellipsis, otherwise NULL
  6. scatter_data: if last endpoint was continuous and plot was created with density_plot_type = "scatter" then contains dataset that was plotted in scatter plot, otherwise NULL
  7. boxstat_data: if last endpoint was continuous and if plot was created with density_plot_type = "box" or density_plot_type = "default" then contains the boxplot statistics, otherwise NULL
  8. violin_data: if last endpoint was continuous and if plot was created with density_plot_type = "violin" or density_plot_type = "default" then contains the violin distribution data, otherwise NULL
  9. wo_stats: if maraca object was created with compute_win_odds = TRUE then contains the win odds statistics, otherwise NULL

These can then be converted to a convenient format for validation, such as as individual data.frames.

library(dplyr)
library(tidyr)

validation_list$proportions %>% 
  as.data.frame() %>%
  rename("proportion" = ".")
##                    proportion
## Outcome I                13.8
## Outcome II               18.1
## Outcome III              12.7
## Outcome IV                9.8
## Continuous outcome       45.6

head(validation_list$tte_data)
##           x   y   group
## 3 0.1251495 0.2 Control
## 4 0.1593700 0.4 Control
## 5 0.4625571 0.6 Control
## 6 0.5905571 0.2  Active
## 7 0.6209704 0.4  Active
## 8 0.6478802 0.6  Active

validation_list$boxstat_data %>%
 unnest_wider(outliers, names_sep = "") %>%
 pivot_longer(., cols = -group, names_to = "stat_name", values_to = "values") %>%
 filter(!is.na(values)) %>% 
  as.data.frame()
##      group     stat_name    values
## 1   Active      x_lowest  60.77490
## 2   Active whisker_lower  63.82323
## 3   Active   hinge_lower  74.87904
## 4   Active        median  79.68861
## 5   Active   hinge_upper  83.87907
## 6   Active whisker_upper  95.53322
## 7   Active     x_highest  95.53322
## 8   Active     outliers1  60.77490
## 9  Control      x_lowest  54.40000
## 10 Control whisker_lower  57.76833
## 11 Control   hinge_lower  70.99354
## 12 Control        median  75.90131
## 13 Control   hinge_upper  80.73315
## 14 Control whisker_upper  94.45711
## 15 Control     x_highest 100.00000
## 16 Control     outliers1  54.40000
## 17 Control     outliers2  96.39603
## 18 Control     outliers3  97.24586
## 19 Control     outliers4  99.68534
## 20 Control     outliers5 100.00000

head(validation_list$violin_data)
##    group        x  y     density width
## 1 Active 60.77490 44 0.001174508 18.72
## 2 Active 60.84292 44 0.001205014 18.72
## 3 Active 60.91094 44 0.001236189 18.72
## 4 Active 60.97896 44 0.001268532 18.72
## 5 Active 61.04698 44 0.001301859 18.72
## 6 Active 61.11500 44 0.001336067 18.72

validation_list$wo_stats
##      winodds      lowerCI      upperCI      p_value 
## 1.643265e+00 1.416117e+00 1.906848e+00 9.354073e-12