The Double Blind Pig: A Placebo Controlled Speakeasy

Introduction

The Double Blind Pig: A Placebo Controlled Speakeasy is an event hosted by Campus Callosum at Burning Man. It’s now in its third year having been established in 2022.

The Pig is not just a bar, it’s also an experiment. When you order a drink, you flip a coin and get assigned to either the red or the blue condition. Your bartender serves you a drink of the corresponding color, and your job is to figure out whether or not it contains alcohol.

The experiment is double blind in that neither you nor your bartender knows if the drink contains alcohol. The drinks are mixed behind the scenes by a barback and placed in red and blue containers, so even the bartender doesn’t know what they’re serving!

We collect data throughout the event and this document summarizes our results from 2024!

The drinks

Our booze supply this year

This year we had 4 types of drinks on offer (2 beers and 2 cocktails), each with an alcoholic and non-alcoholic version:

IPA: Lagunitas Daytime IPA and Athletic NA IPA
Guinness: Classic Guinness and the NA Guinness
Moscow Mule: With and without vodka
Bloody Mary: With and without vodka

Hypotheses

Our hypothesis is basically that people on the playa are pretty bad at telling what they’re consuming. The placebo effect can be strong and so we want to see if people can really tell when they’re consuming alcohol. The idea is to get the taste of the drinks as close as possible so that people have to go on how they feel but it’s hard to do!

# Constants

color.day = "#16324f"
color.grey = "#444444"
color.darkgrey = "#222222"
pal.drinks = c("#B22222", "#020202", "#FFC42E", "#32CD32")
pal.drinks.accent = c("#710000", "#e7d19f", "#7d5a00", "#116511")
pal.accuracy = c("#228B22", "#FF4500")
pal.colors = c("#2A628F", "#B22222")
pal.condition = c("#5B17AF", "#999999")

Subjects and drinks

Justin, Nani and Sally, welcoming you to the Double Blind Pig!

Drinks by day

We served a total of 201 drinks across 4 events. The most popular day was Wednesday with 60 drinks served!

df %>%
  ggplot(aes(x = day_long, label = day_long)) + 
  geom_bar(fill = color.day) + 
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none"
  ) + 
  labs(
    y = "Drinks served",
    x = "Day"
  )

Drinks by drink

The most popular drink was the Moscow Mule, of which we served 70 across Thursday and Saturday. Only this can explain why so few people (37) wanted Guinness (which we served on the same days).

df %>%
  ggplot(aes(x = drink, label = drink, fill = drink)) + 
  geom_bar() + 
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
  ) + 
  scale_fill_manual(
    values = pal.drinks
  ) +
  labs(
    y = "Drinks served",
    x = "Drink"
  )

Randomization

Our randomization worked pretty much as intended! The coin apparently had a slight preference for red, landing that way 54% of the time.

df %>%
  ggplot(aes(x = color, label = color, fill = color)) + 
  geom_bar() + 
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
  ) + 
  scale_fill_manual(
    values = pal.colors
  ) +
  labs(
    y = "Drinks served",
    x = "Treatment color"
  )

More importantly, almost exactly half (48%) of drinks were alcoholic, ensuring a nice fair sample across the experiment.

N.B: These numbers differ from the red/blue split because the randomization scheme changed each day.

df %>%
  ggplot(aes(x = condition, label = condition, fill = condition)) + 
  geom_bar() + 
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
  ) + 
  scale_fill_manual(
    values = pal.condition
  ) +
  labs(
    y = "Drinks served",
    x = "Treatment"
  )

Overall responses

Subjects didn’t think much of our generosity, however, with a miserly 33% believing that they had been given an alcoholic drink.

df %>%
  mutate(
    response = ifelse(response == "A", "Alcoholic", "Non-alcoholic")
  ) %>%
  ggplot(aes(x = response, fill = response)) + 
  geom_bar() + 
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
  ) + 
  scale_fill_manual(
    values = pal.condition
  ) +
  labs(
    y = "No. Responses",
    x = "Response"
  )

Accuracy

Scientists Dan and Helene thinking harder than most of our subjects

Overall Accuracy

Of 201 judgements, 137 were accurate, meaning that 68% of judgements were accurate overall!

df %>%
  mutate(
    accuracy = ifelse(accuracy == 1, "Accurate", "Inaccurate")
  ) %>%
  ggplot(aes(x = accuracy, fill = accuracy)) + 
  geom_bar() + 
  geom_text(
    stat = "count", 
    aes(label = accuracy), 
    vjust = -0.8, 
    color = pal.accuracy,  # Coloring the labels
    size = 7
  ) + 
  scale_fill_manual(values=pal.accuracy) + 
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),  # Remove x-axis labels
  ) + 
  coord_cartesian(ylim = c(0, 150)) +
  labs(
    y = "No. Judgements"
  )

Accuracy by treatment

Accuracy was much higher for the non-alcoholic treatment than the alcoholic one, with only 51% of subjects who had alcohol getting it right, while fully 84% who had the control figured it out.

df %>%
  ggplot(aes(x = condition, fill = condition, y = accuracy, label = condition, color = condition)) + 
  stat_summary(fun="mean", geom="bar") +
  stat_summary(geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1, color="#666666") +
  scale_fill_manual(values=pal.condition) + 
  scale_color_manual(values=pal.condition) + 
  stat_summary(fun = "mean", geom = "text", vjust = -3, size=7) +
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
    # axis.title.x = element_blank(),
    axis.text.x = element_blank(),
  ) + 
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  coord_cartesian(ylim = c(0, 1)) +
  labs(
    x = "Treatment",
    y = "Accuracy"
  )

Accuracy by day

Unsurprisingly, there wasn’t much variation in accuracy by day.

df %>%
  ggplot(aes(x = day_long, y = accuracy, label = day_long, )) + 
  stat_summary(fun="mean", geom="bar", fill = color.day) +
  stat_summary(geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1, color="#666666") +
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
  ) + 
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  coord_cartesian(ylim = c(0, 1)) +
  labs(
    x = "Day",
    y = "Accuracy"
  )

Accuracy by drink

The IPA was the easiest to detect, with 75% of subjects getting it right. Bloody Mary was the hardest, with only 63% detecting correctly.

df %>%
  ggplot(aes(x = drink, y = accuracy, label = drink,
             fill = drink)) + 
  stat_summary(fun="mean", geom="bar", ) +
  stat_summary(aes(color=drink), geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1) +
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
  ) + 
  scale_fill_manual(values = pal.drinks) +
  scale_color_manual(values = pal.drinks.accent) +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  coord_cartesian(ylim = c(0, 1)) +
  labs(
    x = "Day",
    y = "Accuracy"
  )

Accuracy by drink and treatment

The drinks also patterned differently in terms of how much harder the control versus alcoholic treatments were to get right. There was a big difference for the Moscow Mule, with 39% accuracy for alcohol vs 86% for control. In contrast, there was barely any difference for Guinness (67% for alcohol vs 74% for the NA Guinness).

df %>%
  mutate(
    condition = ifelse(condition == "Alcoholic", "A", "NA")
  ) %>%
  ggplot(aes(x = condition, y = accuracy, label = drink,
             fill = drink)) + 
  stat_summary(fun="mean", geom="bar", position = position_dodge(width=0.9)) +
  stat_summary(aes(color=drink), geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1, position = position_dodge(width=0.9)) +
  theme_minimal(base_size=18) + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
  ) + 
  scale_fill_manual(values = pal.drinks) +
  scale_color_manual(values = pal.drinks.accent) +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  coord_cartesian(ylim = c(0, 1)) +
  facet_grid(. ~ drink) +
  labs(
    x = "Day",
    y = "Accuracy"
  )

Error types

Another way of thinking about this is in terms of the types of errors that participants made. A false positive occurs when a subject has an NA drink and thinks it contains alcohol, while a false negative is the opposite (drinking alcohol and thinking it’s NA).

The majority of errors for most drink types were false negatives, with only a small number of false positives. The exception was Guinness, where 45% of errors were false positives. A good review for the NA Guinness!

pal.drinks = c("#B22222", "#020202", "#FFC42E", "#32CD32")
pal.drinks.accent = c("#710000", "#e7d19f", "#7d5a00", "#116511")

df %>%
  filter(accuracy == 0) %>%
  mutate(
    error_type = case_when(
      active == "A" & accuracy == 0 ~ "False Negative",
      active == "C" & accuracy == 0 ~ "False Positive",
      TRUE ~ "Correct"
    )
  ) %>%
  ggplot(aes(x = drink, fill = error_type)) + 
  geom_bar(position = "fill") +  # Scales bars to 100%
  theme_minimal(base_size = 18) + 
  theme(
    axis.title.x = element_blank(),
    legend.title = element_blank(),
    legend.position = "bottom"
  ) + 
  scale_fill_manual(values = rev(pal.condition)) +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +  # Format y-axis as percentages
  labs(
    y = "Proportion of Responses",
    x = "Drink"
  )

Confidence

Our lovely bartenders Joerg and Aurelia: confidence embodied

Distribution

Our subjects were pretty confident overall, with the majority giving scores above 50%, and 19% saying they were between 90 and 100% confident.

df %>%
  ggplot(aes(x = confidence,)) + 
  geom_histogram(bins=10, fill = color.day) +
  theme_minimal(base_size = 18) + 
  theme(
    legend.position = "bottom"
  ) + 
  scale_x_continuous(labels = scales::percent_format(scale = 1)) +
  labs(
    y = "Responses",
    x = "Confidence"
  )

Calibration

Confidence was poorly calibrated with judgement accuracy. As confidence scores increased, actual accuracy decreased.

df %>%
  ggplot(aes(x = confidence, y = accuracy)) + 
  stat_summary_bin(fun = "mean", geom = "point", bins = 10, color=color.day) +
  geom_smooth(method="lm", formula="y~x", color=color.day) +
  theme_minimal(base_size = 18) + 
  theme(
    legend.position = "bottom"
  ) + 
  scale_x_continuous(labels = scales::percent_format(scale = 1)) +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  labs(
    y = "Accuracy",
    x = "Confidence"
  )

Here we can see the confidence for each judgement, transformed so that confidence in NA responses are inverted (negative). The larger points and errorbars represent the average of the confidence-weighted judgements. Of the alcoholic drinks, people were most confident that Guinness was alcoholic, and least confident about the Moscow Mule. Of the non-alcoholic drinks the highest alcohol confidence was again for Guinness, but the lowest was for the IPA.

df %>%
  mutate(
    alc_conf = ifelse(response == "A", confidence, -1 * confidence)
  ) %>%
  ggplot(aes(x = alc_conf, y = drink, color=condition)) + 
  theme_minimal(base_size=18) + 
  theme(legend.position = "bottom") +
  geom_vline(xintercept = 0, linetype="dashed", linewidth=1, color="#888888") +
  geom_jitter(alpha=0.4, size=3, stroke=0, height=0.2, width=0) + 
  stat_summary(fun.data = mean_cl_boot, geom="errorbarh", linewidth=1, height=0.2, alpha=0.8, position = position_dodge(width=0.9)) +
  stat_summary(fun = mean, geom="point", size=4, alpha=0.8, position = position_dodge(width=0.9)) +  
  scale_y_discrete(limits=rev) + 
  scale_color_brewer(palette="Set1") + 
  labs(
    x = "Confidence in Alcohol judgement",
    color = "Ground Truth",
    y = "Drink"
  )

Statistical Analysis

We can use regression models to test whether people were statistically more likely to judge a drink to be alcoholic if it truly was, while accounting for variation by drink.

We see a significant effect of treatment overall t(199) = -5.58, p = 7.79e-08. This suggests that participants were significantly more likely to judge drinks to be alcoholic when they really were.

m <- lmer(response ~ active + (1 | drink), data=df %>% mutate(response = ifelse(response == "A", 1, 0)))
summary(m)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: response ~ active + (1 | drink)
##    Data: df %>% mutate(response = ifelse(response == "A", 1, 0))
## 
## REML criterion at convergence: 245.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2592 -0.4686 -0.3206  1.1060  1.9700 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  drink    (Intercept) 0.002169 0.04658 
##  Residual             0.190591 0.43657 
## Number of obs: 201, groups:  drink, 4
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   0.51309    0.05044   7.28085   10.17 1.46e-05 ***
## activeC      -0.34512    0.06185 198.71200   -5.58 7.79e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr)
## activeC -0.636

Comments

We (foolishly) asked participants for comments on the experiment and recieved a wide range of thoughtful responses.

Some were insightful tasting notes:

tasty

has clean, crisp finish

muy rico

possible dark malty quality in masking

ginger, lemon, lime, lsd

ginger, mandarin, … mushrooms

tomato juice is confusing

Others were confessions of having consumed other substances that might interfere with the integrity of our results:

was already drunk

had a shot 5 min before

alcoholic & drink a lot

absinthe before

Ate a little bit of an edible

Others seemed to be feeling the effects:

a bit more talkative

feel the buzz

But quite a few seemed to be unhappy with our setup:

no buzz

couldn’t finish drink

burn stomach

now feeling dizzy

We hope they’re doing okay!

Answers

In case you came along to one of the events and want to know what the answers were, here they are:

answers %>%
  mutate(
    color = ifelse(color == "B", "Blue", "Red"),
    treatment = ifelse(active == "A", "Alcoholic", "Non-Alcoholic")
  ) %>%
  select(
    day, drink, color, treatment
  )

day	drink	color	treatment
Mon	IPA	Blue	Non-Alcoholic
Mon	IPA	Red	Alcoholic
Mon	Bloody Mary	Blue	Non-Alcoholic
Mon	Bloody Mary	Red	Alcoholic
Wed	IPA	Blue	Non-Alcoholic
Wed	IPA	Red	Alcoholic
Wed	Bloody Mary	Blue	Alcoholic
Wed	Bloody Mary	Red	Non-Alcoholic
Thu	Guinness	Blue	Alcoholic
Thu	Guinness	Red	Non-Alcoholic
Thu	Moscow Mule	Blue	Alcoholic
Thu	Moscow Mule	Red	Non-Alcoholic
Sat	Guinness	Blue	Alcoholic
Sat	Guinness	Red	Non-Alcoholic
Sat	Moscow Mule	Blue	Non-Alcoholic
Sat	Moscow Mule	Red	Alcoholic

Thanks

Thanks to everyone who came along to the Double Blind Pig this year! Especially the people who were lured in off the busy playa by the idea of participating in an experiment <3. Thanks to everyone in Campus Callosum for volunteering their time to help run the bar! And to the camp organizers for all of the work they put in to make it happen each year. And to YOU for reading this! Why did you read this? Don’t you have better things to do?!

Double Blind Pig 2024 Analysis

Campus Callosum

2024-09-28