The Double Blind Pig: A Placebo Controlled Speakeasy is an event hosted by Campus Callosum at Burning Man. It’s now in its third year having been established in 2022.
The Pig is not just a bar, it’s also an experiment. When you order a drink, you flip a coin and get assigned to either the red or the blue condition. Your bartender serves you a drink of the corresponding color, and your job is to figure out whether or not it contains alcohol.
The experiment is double blind in that neither you nor your bartender knows if the drink contains alcohol. The drinks are mixed behind the scenes by a barback and placed in red and blue containers, so even the bartender doesn’t know what they’re serving!
We collect data throughout the event and this document summarizes our results from 2024!
This year we had 4 types of drinks on offer (2 beers and 2 cocktails), each with an alcoholic and non-alcoholic version:
Our hypothesis is basically that people on the playa are pretty bad at telling what they’re consuming. The placebo effect can be strong and so we want to see if people can really tell when they’re consuming alcohol. The idea is to get the taste of the drinks as close as possible so that people have to go on how they feel but it’s hard to do!
# Constants
color.day = "#16324f"
color.grey = "#444444"
color.darkgrey = "#222222"
pal.drinks = c("#B22222", "#020202", "#FFC42E", "#32CD32")
pal.drinks.accent = c("#710000", "#e7d19f", "#7d5a00", "#116511")
pal.accuracy = c("#228B22", "#FF4500")
pal.colors = c("#2A628F", "#B22222")
pal.condition = c("#5B17AF", "#999999")
We served a total of 201 drinks across 4 events. The most popular day was Wednesday with 60 drinks served!
The most popular drink was the Moscow Mule, of which we served 70 across Thursday and Saturday. Only this can explain why so few people (37) wanted Guinness (which we served on the same days).
Our randomization worked pretty much as intended! The coin apparently had a slight preference for red, landing that way 54% of the time.
df %>%
ggplot(aes(x = color, label = color, fill = color)) +
geom_bar() +
theme_minimal(base_size=18) +
theme(
legend.position = "none",
) +
scale_fill_manual(
values = pal.colors
) +
labs(
y = "Drinks served",
x = "Treatment color"
)
More importantly, almost exactly half (48%) of drinks were alcoholic, ensuring a nice fair sample across the experiment.
N.B: These numbers differ from the red/blue split because the randomization scheme changed each day.
Subjects didn’t think much of our generosity, however, with a miserly 33% believing that they had been given an alcoholic drink.
df %>%
mutate(
response = ifelse(response == "A", "Alcoholic", "Non-alcoholic")
) %>%
ggplot(aes(x = response, fill = response)) +
geom_bar() +
theme_minimal(base_size=18) +
theme(
legend.position = "none",
) +
scale_fill_manual(
values = pal.condition
) +
labs(
y = "No. Responses",
x = "Response"
)
Of 201 judgements, 137 were accurate, meaning that 68% of judgements were accurate overall!
df %>%
mutate(
accuracy = ifelse(accuracy == 1, "Accurate", "Inaccurate")
) %>%
ggplot(aes(x = accuracy, fill = accuracy)) +
geom_bar() +
geom_text(
stat = "count",
aes(label = accuracy),
vjust = -0.8,
color = pal.accuracy, # Coloring the labels
size = 7
) +
scale_fill_manual(values=pal.accuracy) +
theme_minimal(base_size=18) +
theme(
legend.position = "none",
axis.title.x = element_blank(),
axis.text.x = element_blank(), # Remove x-axis labels
) +
coord_cartesian(ylim = c(0, 150)) +
labs(
y = "No. Judgements"
)
Accuracy was much higher for the non-alcoholic treatment than the alcoholic one, with only 51% of subjects who had alcohol getting it right, while fully 84% who had the control figured it out.
df %>%
ggplot(aes(x = condition, fill = condition, y = accuracy, label = condition, color = condition)) +
stat_summary(fun="mean", geom="bar") +
stat_summary(geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1, color="#666666") +
scale_fill_manual(values=pal.condition) +
scale_color_manual(values=pal.condition) +
stat_summary(fun = "mean", geom = "text", vjust = -3, size=7) +
theme_minimal(base_size=18) +
theme(
legend.position = "none",
# axis.title.x = element_blank(),
axis.text.x = element_blank(),
) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
coord_cartesian(ylim = c(0, 1)) +
labs(
x = "Treatment",
y = "Accuracy"
)
Unsurprisingly, there wasn’t much variation in accuracy by day.
df %>%
ggplot(aes(x = day_long, y = accuracy, label = day_long, )) +
stat_summary(fun="mean", geom="bar", fill = color.day) +
stat_summary(geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1, color="#666666") +
theme_minimal(base_size=18) +
theme(
legend.position = "none",
axis.title.x = element_blank(),
) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
coord_cartesian(ylim = c(0, 1)) +
labs(
x = "Day",
y = "Accuracy"
)
The IPA was the easiest to detect, with 75% of subjects getting it right. Bloody Mary was the hardest, with only 63% detecting correctly.
df %>%
ggplot(aes(x = drink, y = accuracy, label = drink,
fill = drink)) +
stat_summary(fun="mean", geom="bar", ) +
stat_summary(aes(color=drink), geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1) +
theme_minimal(base_size=18) +
theme(
legend.position = "none",
axis.title.x = element_blank(),
) +
scale_fill_manual(values = pal.drinks) +
scale_color_manual(values = pal.drinks.accent) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
coord_cartesian(ylim = c(0, 1)) +
labs(
x = "Day",
y = "Accuracy"
)
The drinks also patterned differently in terms of how much harder the control versus alcoholic treatments were to get right. There was a big difference for the Moscow Mule, with 39% accuracy for alcohol vs 86% for control. In contrast, there was barely any difference for Guinness (67% for alcohol vs 74% for the NA Guinness).
df %>%
mutate(
condition = ifelse(condition == "Alcoholic", "A", "NA")
) %>%
ggplot(aes(x = condition, y = accuracy, label = drink,
fill = drink)) +
stat_summary(fun="mean", geom="bar", position = position_dodge(width=0.9)) +
stat_summary(aes(color=drink), geom="errorbar", fun.data = mean_cl_boot, width=0.1, linewidth=1, position = position_dodge(width=0.9)) +
theme_minimal(base_size=18) +
theme(
legend.position = "none",
axis.title.x = element_blank(),
) +
scale_fill_manual(values = pal.drinks) +
scale_color_manual(values = pal.drinks.accent) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
coord_cartesian(ylim = c(0, 1)) +
facet_grid(. ~ drink) +
labs(
x = "Day",
y = "Accuracy"
)
Another way of thinking about this is in terms of the types of errors that participants made. A false positive occurs when a subject has an NA drink and thinks it contains alcohol, while a false negative is the opposite (drinking alcohol and thinking it’s NA).
The majority of errors for most drink types were false negatives, with only a small number of false positives. The exception was Guinness, where 45% of errors were false positives. A good review for the NA Guinness!
pal.drinks = c("#B22222", "#020202", "#FFC42E", "#32CD32")
pal.drinks.accent = c("#710000", "#e7d19f", "#7d5a00", "#116511")
df %>%
filter(accuracy == 0) %>%
mutate(
error_type = case_when(
active == "A" & accuracy == 0 ~ "False Negative",
active == "C" & accuracy == 0 ~ "False Positive",
TRUE ~ "Correct"
)
) %>%
ggplot(aes(x = drink, fill = error_type)) +
geom_bar(position = "fill") + # Scales bars to 100%
theme_minimal(base_size = 18) +
theme(
axis.title.x = element_blank(),
legend.title = element_blank(),
legend.position = "bottom"
) +
scale_fill_manual(values = rev(pal.condition)) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) + # Format y-axis as percentages
labs(
y = "Proportion of Responses",
x = "Drink"
)
Our subjects were pretty confident overall, with the majority giving scores above 50%, and 19% saying they were between 90 and 100% confident.
Confidence was poorly calibrated with judgement accuracy. As confidence scores increased, actual accuracy decreased.
df %>%
ggplot(aes(x = confidence, y = accuracy)) +
stat_summary_bin(fun = "mean", geom = "point", bins = 10, color=color.day) +
geom_smooth(method="lm", formula="y~x", color=color.day) +
theme_minimal(base_size = 18) +
theme(
legend.position = "bottom"
) +
scale_x_continuous(labels = scales::percent_format(scale = 1)) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
labs(
y = "Accuracy",
x = "Confidence"
)
Here we can see the confidence for each judgement, transformed so that confidence in NA responses are inverted (negative). The larger points and errorbars represent the average of the confidence-weighted judgements. Of the alcoholic drinks, people were most confident that Guinness was alcoholic, and least confident about the Moscow Mule. Of the non-alcoholic drinks the highest alcohol confidence was again for Guinness, but the lowest was for the IPA.
df %>%
mutate(
alc_conf = ifelse(response == "A", confidence, -1 * confidence)
) %>%
ggplot(aes(x = alc_conf, y = drink, color=condition)) +
theme_minimal(base_size=18) +
theme(legend.position = "bottom") +
geom_vline(xintercept = 0, linetype="dashed", linewidth=1, color="#888888") +
geom_jitter(alpha=0.4, size=3, stroke=0, height=0.2, width=0) +
stat_summary(fun.data = mean_cl_boot, geom="errorbarh", linewidth=1, height=0.2, alpha=0.8, position = position_dodge(width=0.9)) +
stat_summary(fun = mean, geom="point", size=4, alpha=0.8, position = position_dodge(width=0.9)) +
scale_y_discrete(limits=rev) +
scale_color_brewer(palette="Set1") +
labs(
x = "Confidence in Alcohol judgement",
color = "Ground Truth",
y = "Drink"
)
We can use regression models to test whether people were statistically more likely to judge a drink to be alcoholic if it truly was, while accounting for variation by drink.
We see a significant effect of treatment overall
t(199) = -5.58, p = 7.79e-08
. This suggests that
participants were significantly more likely to judge drinks to be
alcoholic when they really were.
m <- lmer(response ~ active + (1 | drink), data=df %>% mutate(response = ifelse(response == "A", 1, 0)))
summary(m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: response ~ active + (1 | drink)
## Data: df %>% mutate(response = ifelse(response == "A", 1, 0))
##
## REML criterion at convergence: 245.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2592 -0.4686 -0.3206 1.1060 1.9700
##
## Random effects:
## Groups Name Variance Std.Dev.
## drink (Intercept) 0.002169 0.04658
## Residual 0.190591 0.43657
## Number of obs: 201, groups: drink, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.51309 0.05044 7.28085 10.17 1.46e-05 ***
## activeC -0.34512 0.06185 198.71200 -5.58 7.79e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## activeC -0.636
In case you came along to one of the events and want to know what the answers were, here they are:
answers %>%
mutate(
color = ifelse(color == "B", "Blue", "Red"),
treatment = ifelse(active == "A", "Alcoholic", "Non-Alcoholic")
) %>%
select(
day, drink, color, treatment
)
day | drink | color | treatment |
---|---|---|---|
Mon | IPA | Blue | Non-Alcoholic |
Mon | IPA | Red | Alcoholic |
Mon | Bloody Mary | Blue | Non-Alcoholic |
Mon | Bloody Mary | Red | Alcoholic |
Wed | IPA | Blue | Non-Alcoholic |
Wed | IPA | Red | Alcoholic |
Wed | Bloody Mary | Blue | Alcoholic |
Wed | Bloody Mary | Red | Non-Alcoholic |
Thu | Guinness | Blue | Alcoholic |
Thu | Guinness | Red | Non-Alcoholic |
Thu | Moscow Mule | Blue | Alcoholic |
Thu | Moscow Mule | Red | Non-Alcoholic |
Sat | Guinness | Blue | Alcoholic |
Sat | Guinness | Red | Non-Alcoholic |
Sat | Moscow Mule | Blue | Non-Alcoholic |
Sat | Moscow Mule | Red | Alcoholic |
Thanks to everyone who came along to the Double Blind Pig this year! Especially the people who were lured in off the busy playa by the idea of participating in an experiment <3. Thanks to everyone in Campus Callosum for volunteering their time to help run the bar! And to the camp organizers for all of the work they put in to make it happen each year. And to YOU for reading this! Why did you read this? Don’t you have better things to do?!
Comments
We (foolishly) asked participants for comments on the experiment and recieved a wide range of thoughtful responses.
Some were insightful tasting notes:
Others were confessions of having consumed other substances that might interfere with the integrity of our results:
Others seemed to be feeling the effects:
But quite a few seemed to be unhappy with our setup:
We hope they’re doing okay!