library(tidyverse)
library(rairtable)
library(keras)
library(tfhub)
The notebook below demonstrates how NLP and deep learning can be used to predict the probability that a proposal is declined. The algorithm uses only the information submitted through the website, processing these texts to produce a score trained on reviewer declination labels from previous rounds of proposals. It achieves a ROC AUC of 0.84, suggesting solid predictive performance. In the Analysis section below we demonstrate empirical results for two use cases: (1) ordering proposals when reviewer resources are limited, and (2) highlighting declined proposals that had low predicted declination scores, to ensure we did not mistakenly decline a worthy proposal.
Funders seek to fund promising ideas, and doing this well is their core competency. Selecting which proposals get funded is challenging, because indicators of a successful program more often rely on retrospective storytelling than on measured, objective impact.
When proposals are submitted to a funder, they go through a very human process of review and selection. In our case, a number of reviewers go through a series of review rounds with increasing scrutiny and attention, and a decreasing number of proposals make it to each successive round.
The first round has the greatest number of proposals, and with an Open Call the proposals can vary quite a lot. Based on what is written in the proposal, some of these decisions can be made fairly quickly without much scrutiny. It is also hard to predict how many proposals will be submitted, and there are times when the number of proposals can overwhelm the resources available to review them.
The notebook below will attempt to use a history of labeled proposals and their textual content to produce a predictive algorithm that rates proposal quality. This quality score can be leveraged to prioritize proposals when reviewer resources are constrained, or used as a secondary robustness check to ensure high-scoring proposals are not missed due to reviewer fatigue.
We’re using data from an Airtable base containing all the proposals submitted through the website.
Use the Airtable API to read from our submissions table:
rairtable::set_airtable_api_key(read_lines("at_key.txt"))
✔ AIRTABLE_API_KEY set for current session. To install your API key for use in future sessions, run this function with `install = TRUE`.
submissions_at <- airtable("tblheGxoCuoMOFC6K", "appSz3aSZX0uVXKbV")
submissions <- submissions_at %>% read_airtable()
dim(submissions)
[1] 933 83
A total of 933 submissions and 83 columns.
tibble(
name = submissions %>% names(),
type = sapply(submissions, class)) %>%
DT::datatable()
Only some of these columns are derived from the original proposal submission; others are added or labeled by the reviewers. It’s important that we only consider the original content of the proposal along with the prediction target, which is the key reviewer outcome we wish to estimate. The reviewer-generated fields are likely to be correlated with that outcome and can make the algorithm appear more predictive than it is.
The key outcomes we’re focused on predicting are related to decisions made about the proposal, which are ultimately captured in the Round 1 Outcome and Round 2 Outcome fields.
There are a greater number of proposals in Round 1, and likely less nuance goes into decisions in this round compared to later rounds. There’s a trade-off here between having more balanced labels (Round 1) and having a clearer and ultimately more important signal of success (Round 2). The nuance in Round 2 decision making may also be harder to capture with our features, since a good amount of research outside the proposal content may go into it.
Let’s first have a look at the values in these two columns:
submissions %>%
unnest(Call) %>%
select(Call, ends_with("Outcome")) %>%
pivot_longer(-Call, names_to = "round", values_to = "status") %>%
group_by(Call, round, status) %>%
count(name = "count") %>%
ungroup() %>%
# filter(round == "Round 1 Outcome") %>%
# select(-round) %>%
arrange(Call, desc(count)) %>%
group_by(Call) %>%
mutate(pct = count / sum(count),
total = sum(count),
Call = str_remove(Call, " and Mentoring"),
Call = str_glue("{Call} ({total})")) %>%
ggplot(aes(Call, pct, fill = status, label = count)) +
geom_col() +
coord_flip() +
facet_wrap(~ round, ncol = 1)
NA values are always tricky to deal with. What do they mean in each round?
submissions %>%
group_by(across(ends_with("Outcome"))) %>%
count() %>% ungroup() %>%
filter(is.na(`Round 1 Outcome`) | is.na(`Round 2 Outcome`))
Having an NA value in Round 1 directly translates to an NA value in Round 2, but an NA value in Round 2 can imply a variety of things about Round 1. We’re unsure whether the NA values in Round 2 map to a successful proposal or mean the proposal did not make it past Round 1 or Round 2.
submissions_outcomes <- submissions %>%
mutate(
r1_declined = `Round 1 Outcome` == "Decline",
r2_declined = `Round 2 Outcome` == "Declined")
library(ggalluvial)
submissions_outcomes %>%
unnest(Call) %>%
mutate(Call = Call,
across(ends_with(" Outcome"), replace_na, "NA"), .keep = "used") %>%
filter(!`Round 1 Outcome` %in% c("Decline", "NA")) %>%
group_by(across(everything())) %>%
count() %>% ungroup() %>%
mutate(Accepted = `Round 2 Outcome` != "Declined") %>%
ggplot(aes(y = n, axis1 = `Round 1 Outcome`, axis2 = `Round 2 Outcome`)) +
geom_alluvium(aes(fill = Accepted), width = 1/12) +
geom_stratum(width = 1/12, fill = "black", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum)), size = 2) +
scale_x_discrete(limits = c("Round 1 Outcome", "Round 2 Outcome"), expand = c(.05, .05)) +
scale_fill_brewer(type = "qual", palette = "Set1") +
facet_wrap(~ Call, scales = "free")
We’re treating this as a text classification problem. Most columns are text-based, with the exception of a few columns which we will try to convert to text.
sub_text_all <- submissions_outcomes %>%
# turn lists with texts into a text column
# for anything else we just get the count
mutate(across(where(is.list), function(any_list) {
if (is.character(any_list[[1]])) {
map_chr(any_list, ~ str_c(.x, collapse = " "))
} else {
map_chr(any_list, ~ str_c(cur_column(), length(.x)))
}
})) %>%
select(airtable_record_id,
`Your Name`,
`Online profile`,
Attachments,
`Email of primary contact`,
`Applicant organization`,
Call,
Progress,
Activities,
`Outputs & Outcomes`,
Superpowers,
`IP Waiver`,
`Idea & TOC`,
Evidence,
`Primary Residence Country`,
`Funding requested`,
`Implementation Country`,
`Incorporation Country`,
`Executive Summary`,
ends_with("_declined")) %>%
select(airtable_record_id, ends_with("_declined"), where(is.character)) %>%
unite(text, -c(airtable_record_id, ends_with("_declined")), sep = " ")
A count of declined and not-declined submissions at each of the two rounds:
sub_text_all %>%
group_by(across(ends_with("_declined"))) %>%
count()
What’s with the proposal that was declined in Round 1 but not declined in Round 2?
sub_text_all %>%
filter(r1_declined & !r2_declined) %>%
select(-text) %>%
inner_join(submissions) %>%
select(ends_with("Outcome"))
Joining, by = "airtable_record_id"
sub_text_r1 <- sub_text_all %>%
filter(!is.na(r1_declined))
sub_text_r2 <- sub_text_all %>%
filter(!is.na(r2_declined))
Can a proposal be declined in Round 1 and then referred to fellowship in Round 2?
Break up our proposals into train and test sets:
# use set.seed() beforehand for a reproducible split
ind_train <- sample.int(nrow(sub_text_r1), 0.8 * nrow(sub_text_r1))
train <- sub_text_r1[ind_train, ]
test <- sub_text_r1[-ind_train, ]
In a previous version we used a simple text_vectorization layer supplied by Keras to transform our text into a tensor. This time we try something more advanced by leveraging a token-based text embedding trained on Google News articles. This is transfer learning, popularized by TensorFlow Hub, where models trained on another corpus are reused as components of new models.
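For reference, the earlier text_vectorization approach looked roughly like the sketch below (the vocabulary size and sequence length here are illustrative assumptions, not the original settings):

```r
library(keras)

# Sketch: learn a vocabulary from the training text and map each
# proposal to a fixed-length sequence of token ids.
vectorize <- layer_text_vectorization(
  max_tokens = 10000,           # assumed vocabulary size
  output_sequence_length = 500  # assumed sequence length
)
vectorize %>% adapt(train$text)
```

The vectorized output would then feed an embedding layer learned from scratch, rather than the pre-trained embedding used below.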
This becomes the first layer of our network, followed by 2048/512/128-unit ReLU layers before a final sigmoid unit for the r1_declined classification:
# https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1
embeddings <- layer_hub(
handle = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
trainable = FALSE
)
input <- layer_input(shape = shape(), dtype = "string")
Loaded Tensorflow version 2.6.2
output <- input %>%
embeddings() %>%
layer_dense(units = 2048, activation = "relu") %>%
layer_dense(units = 512, activation = "relu") %>%
layer_dense(units = 128, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
model <- keras_model(input, output)
We track both accuracy and AUC, since the class labels are not balanced and accuracy alone can be misleading.
model %>%
compile(
loss = "binary_crossentropy",
optimizer = "adam",
metrics = c("accuracy", "AUC"))
history <- model %>%
fit(x = train$text,
y = as.matrix(train[, "r1_declined"]),
epochs = 10,
validation_split = 0.2)
Epoch 1/10
18/18 [==============================] - 1s 31ms/step - loss: 0.6508 - accuracy: 0.7908 - auc: 0.5171 - val_loss: 0.4947 - val_accuracy: 0.8369 - val_auc: 0.5853
Epoch 2/10
18/18 [==============================] - 0s 17ms/step - loss: 0.3999 - accuracy: 0.8440 - auc: 0.7335 - val_loss: 0.4805 - val_accuracy: 0.8440 - val_auc: 0.6412
Epoch 3/10
18/18 [==============================] - 0s 17ms/step - loss: 0.3802 - accuracy: 0.8280 - auc: 0.7863 - val_loss: 0.4168 - val_accuracy: 0.8085 - val_auc: 0.7569
Epoch 4/10
18/18 [==============================] - 0s 17ms/step - loss: 0.3597 - accuracy: 0.8369 - auc: 0.8219 - val_loss: 0.4609 - val_accuracy: 0.8369 - val_auc: 0.7254
Epoch 5/10
18/18 [==============================] - 0s 17ms/step - loss: 0.3412 - accuracy: 0.8564 - auc: 0.8310 - val_loss: 0.4414 - val_accuracy: 0.8298 - val_auc: 0.8100
Epoch 6/10
18/18 [==============================] - 0s 18ms/step - loss: 0.3315 - accuracy: 0.8635 - auc: 0.8465 - val_loss: 0.4335 - val_accuracy: 0.8227 - val_auc: 0.7728
Epoch 7/10
18/18 [==============================] - 0s 17ms/step - loss: 0.3116 - accuracy: 0.8599 - auc: 0.8660 - val_loss: 0.5518 - val_accuracy: 0.8440 - val_auc: 0.7461
Epoch 8/10
18/18 [==============================] - 0s 18ms/step - loss: 0.3340 - accuracy: 0.8528 - auc: 0.8398 - val_loss: 0.4710 - val_accuracy: 0.8298 - val_auc: 0.7961
Epoch 9/10
18/18 [==============================] - 0s 17ms/step - loss: 0.3083 - accuracy: 0.8599 - auc: 0.8703 - val_loss: 0.4634 - val_accuracy: 0.8440 - val_auc: 0.8266
Epoch 10/10
18/18 [==============================] - 0s 17ms/step - loss: 0.2950 - accuracy: 0.8475 - auc: 0.8824 - val_loss: 0.4490 - val_accuracy: 0.8298 - val_auc: 0.8040
model %>%
evaluate(x = test$text, y = as.matrix(test[,"r1_declined"]))
6/6 [==============================] - 0s 12ms/step - loss: 0.3363 - accuracy: 0.8249 - auc: 0.8411
loss accuracy auc
0.3363103 0.8248588 0.8411111
An AUC above ~0.8 is a somewhat arbitrary industry benchmark for a passable prediction model, and we’re in that ballpark.
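For intuition about what this statistic measures, AUC can be computed directly from scores and labels via the Mann-Whitney rank identity. A minimal base-R sketch on toy data (not our real predictions; the `auc_score` helper is ours, for illustration only):

```r
# AUC = P(score of a random positive > score of a random negative),
# computed from ranks (Mann-Whitney U identity).
auc_score <- function(truth, score) {
  r <- rank(score)
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: positives perfectly outscore negatives -> AUC of 1.
auc_score(c(1, 1, 1, 0, 0, 0), c(0.9, 0.8, 0.7, 0.4, 0.2, 0.1))
```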
plot(history)
We can see that the model stabilized after around 3 epochs, suggesting that more training would not improve performance.
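Given that plateau, early stopping could cap training automatically. A sketch using the standard Keras callback (the patience value and monitored metric name are assumptions, not tuned settings):

```r
library(keras)

# Stop once validation AUC has not improved for 2 epochs and restore
# the best weights seen so far (values here are illustrative).
stop_early <- callback_early_stopping(
  monitor = "val_auc",
  mode = "max",
  patience = 2,
  restore_best_weights = TRUE
)

# Then pass it to fit(), e.g.:
# history <- model %>%
#   fit(x = train$text,
#       y = as.matrix(train[, "r1_declined"]),
#       epochs = 20,
#       validation_split = 0.2,
#       callbacks = list(stop_early))
```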
preds <- test %>%
bind_cols(tibble(pred_r1_declined = predict(model, test$text))) %>%
inner_join(submissions, by = "airtable_record_id")
preds %>%
mutate(r1_declined = as.factor(r1_declined)) %>%
ggplot(aes(pred_r1_declined, color = r1_declined)) +
geom_density()
There’s a large mass in the high range for declined proposals, while those that were not declined are spread more widely through the distribution, with a mode around 0.4.
Overall we’re seeing higher declination scores for proposals labeled as declined, so that’s a good sign.
We came up with two scenarios earlier that motivated our thinking for how we could use these predictions. The first was motivated by a lack of reviewer resources such that we would prioritize proposals with more promise:
preds %>%
select(Created, pred_r1_declined, r1_declined) %>%
mutate(total_accepted = sum(r1_declined == 0)) %>%
# pred order
arrange(pred_r1_declined) %>%
mutate(total_read__pred = row_number(),
total_accepted__pred = sum(r1_declined == 0),
cum_accepted__pred = total_read__pred - cumsum(r1_declined == 1),
missed_accepted__pred = total_accepted - cum_accepted__pred) %>%
# created order
arrange(Created) %>%
mutate(total_read__created = row_number(),
cum_accepted__created = total_read__created - cumsum(r1_declined == 1),
missed_accepted__created = total_accepted - cum_accepted__created) %>%
pivot_longer(starts_with(c("total_read", "missed_accepted")),
names_sep = "__", names_to = c("name", "method")) %>%
pivot_wider() %>%
ggplot(aes(total_read, missed_accepted, color = method)) +
geom_line()
Reading in predicted order, after about 110 proposals all the remaining ones would have been declined, so no accepted proposals would have been missed; reading in order of submission time would have missed 17 accepted proposals at that cutoff.
Let’s write these predictions back to the Airtable for human scrutiny. Note that we are producing predictions for all submissions, even those that did not have a declined label.
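The update frame at_update is not constructed above; a hypothetical construction (our assumption of its shape, which only needs the Airtable record id plus the new grants2vec score column) might look like:

```r
# Hypothetical: score every submission (including unlabeled ones) and
# keep only the record id and the new prediction column.
at_update <- sub_text_all %>%
  mutate(grants2vec = as.numeric(predict(model, text))) %>%
  select(airtable_record_id, grants2vec)
```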
update_records(at_update, submissions_at, columns = grants2vec)
You are about to update 933 records of the following variables:
grants2vec
Do you wish to proceed?
1: ✔ Yes
2: ✖ No
1
Sending PATCH requests [>--------------------------------------------------------------------] 2% eta: 34s
Sending PATCH requests [=====================================================================] 100% eta: 0s
Let’s see how well these predictions performed against various outcome measures in our submissions Airtable. First we re-read our submissions with the newly updated grants2vec declination score:
submissions <- submissions_at %>% read_airtable()
We replicate the plots above demonstrating our two use cases for these scores.
The first strategy recommends ordering the reads by score in the event of limited reviewer resources or declining reviewer attention:
submissions %>%
  unnest(Call) %>%
  select(Created, Call, pred_r1_declined = grants2vec, `Round 1 Outcome`) %>%
  mutate(r1_declined = `Round 1 Outcome` == "Decline") %>%
  filter(!is.na(r1_declined)) %>%
  group_by(Call) %>%
  mutate(total_accepted = sum(r1_declined == 0)) %>%
  # pred order
  arrange(pred_r1_declined) %>%
  mutate(total_read__pred = row_number(),
         total_accepted__pred = sum(r1_declined == 0),
         cum_accepted__pred = total_read__pred - cumsum(r1_declined == 1),
         missed_accepted__pred = total_accepted - cum_accepted__pred) %>%
  # created order
  arrange(Created) %>%
  mutate(total_read__created = row_number(),
         cum_accepted__created = total_read__created - cumsum(r1_declined == 1),
         missed_accepted__created = total_accepted - cum_accepted__created) %>%
  pivot_longer(starts_with(c("total_read", "missed_accepted")),
               names_sep = "__", names_to = c("name", "method")) %>%
  pivot_wider() %>%
  ggplot(aes(total_read, missed_accepted, color = method)) +
  geom_line() + facet_wrap(~ Call, scales = "free") +
  theme(legend.position = "bottom")
The blue curve shows the total number of missed_accepted proposals if the reviewer stopped reading after total_read proposals; it stays consistently below the curve for the strategy of reading proposals in the order they were received. The gains are the distance between the two curves.
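The bookkeeping behind these curves is simple enough to sketch on a toy example (base R, with hypothetical labels): after reading the first k proposals in a given order, `missed_accepted` is the number of eventually-accepted proposals that have not yet been read.

```r
# Toy reading order: 1 = declined, 0 = accepted
declined <- c(0, 1, 0, 1, 1, 0, 1, 1)
total_accepted <- sum(declined == 0)

# Accepted proposals already seen after reading the first k, for k = 1..n
cum_accepted <- cumsum(declined == 0)

# Accepted proposals a reviewer would miss by stopping at k
missed_accepted <- total_accepted - cum_accepted
missed_accepted
# 2 2 1 1 1 0 0 0
```

A better reading order front-loads the accepted proposals, driving this vector to zero sooner; the distance between the two curves in the plot is the gain from the model's ordering.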
The second use case involves highlighting some proposals that had a very low declination score but ended up getting declined. This can be used as a second pass for proposals we may have missed.
(redacted)
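As a sketch of how such a second-pass list can be pulled (base R, with a hypothetical stand-in for the submissions table; the `outcome` and `grants2vec` column names here are illustrative):

```r
# Hypothetical stand-in for the submissions table
subs <- data.frame(
  id         = paste0("rec", 1:6),
  outcome    = c("Decline", "Decline", "Shortlist", "Decline", "Shortlist", "Decline"),
  grants2vec = c(0.15, 0.92, 0.20, 0.33, 0.80, 0.55)
)

# Declined proposals ordered by ascending declination score:
# a low score on a declined proposal flags it for a second look
declined_subs <- subs[subs$outcome == "Decline", ]
second_look <- head(declined_subs[order(declined_subs$grants2vec), ], 10)
second_look$id
# "rec1" "rec4" "rec6" "rec2"
```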
Any other ideas for how you want to see this data analyzed? Some ideas below:
We have a qualitative understanding of the quality of the submissions for each call. Let’s see if this is reflected in the score:
submissions %>%
unnest(Call) %>%
group_by(Call) %>%
summarise(decline_prob = mean(grants2vec, na.rm = TRUE)) %>%
arrange(decline_prob)
It appears Mental Health and Digital Coaching had the best proposals while Livelihood Exploration may have had the worst. The other calls were somewhere in between.
submissions %>%
  select(matches("^R1.*[1-4]$"), grants2vec) %>%
  pivot_longer(-grants2vec) %>%
  filter(str_starts(value, "Tier")) %>%
  mutate(value = str_remove(value, " - .*$"),
         name = str_remove(name, "R1-")) %>%
  group_by(name, value) %>%
  summarise(decline_score = mean(grants2vec)) %>%
  ggplot(aes(name, decline_score, fill = value)) +
  geom_col(position = "dodge")
`summarise()` has grouped output by 'name'. You can override using the `.groups` argument.
Only Reviewer3 had some inconsistent scoring between Tier 2 and Tier 3 with respect to the predicted score.
Great insights from Temina; follow-ups below:
Which labels should we be using? We currently use “Round 1 outcome”, but as you’ve noted these proposals are reviewed quickly, by just 2 people per proposal, and the labels are noisy. Once we’ve had a chance to more thoroughly review the shortlist in Round 2, we have a set of proposals that is probably closer to the level of scrutiny I’d want in a training set. Within “Round 2 outcomes”, I think you could count anything that is not marked “decline” as a reasonably successful proposal. This means you’ll end up with roughly 1/3 to 1/2 fewer labeled proposals, but the labels are more careful. If this loss of labels creates problems for training, perhaps we need to go through a few more rounds of grant-making?
Btw did you already try training on “R2 outcomes” or other outcome variables? Would be interesting to know whatever else you looked at, and how it performed.
ind_train <- sample.int(nrow(sub_text_r2), 0.8 * nrow(sub_text_r2))
# keep only rows with an R2 label, since r2_declined is the training target below
train <- sub_text_r2[ind_train, ] %>%
  filter(!is.na(r2_declined))
test <- sub_text_r2[-ind_train, ] %>%
  filter(!is.na(r2_declined))
There are fewer positive labels here, so the dataset is much more imbalanced.
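One standard mitigation for class imbalance (not applied in the run below) is to upweight the minority class; keras's fit() accepts a class_weight argument for this. A minimal inverse-frequency sketch in base R, with hypothetical labels:

```r
# Hypothetical r2_declined labels: mostly 1 (declined), few 0 (accepted)
y <- c(1, 1, 1, 1, 1, 1, 0, 0)

# Inverse-frequency weights: n / (n_classes * count per class)
counts <- table(y)
weights <- length(y) / (length(counts) * counts)
weights
# "0" -> 2, "1" -> ~0.67
```

These could then be passed to fit() as, e.g., `class_weight = list("0" = weights[["0"]], "1" = weights[["1"]])`.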
history <- model %>%
  fit(x = train$text,
      y = as.matrix(train[, "r2_declined"]),
      epochs = 10,
      validation_split = 0.2)
Epoch 1/10
5/5 [==============================] - 0s 31ms/step - loss: 1.7872 - accuracy: 0.6144 - auc: 0.4016 - val_loss: 0.5214 - val_accuracy: 0.8462 - val_auc: 0.4040
Epoch 2/10
5/5 [==============================] - 0s 25ms/step - loss: 0.6909 - accuracy: 0.8105 - auc: 0.3546 - val_loss: 0.4545 - val_accuracy: 0.8462 - val_auc: 0.4949
Epoch 3/10
5/5 [==============================] - 0s 25ms/step - loss: 0.5200 - accuracy: 0.8301 - auc: 0.3979 - val_loss: 0.4388 - val_accuracy: 0.8462 - val_auc: 0.5505
Epoch 4/10
5/5 [==============================] - 0s 25ms/step - loss: 0.4964 - accuracy: 0.8301 - auc: 0.5518 - val_loss: 0.4494 - val_accuracy: 0.8462 - val_auc: 0.5455
Epoch 5/10
5/5 [==============================] - 0s 23ms/step - loss: 0.4649 - accuracy: 0.8301 - auc: 0.6084 - val_loss: 0.4135 - val_accuracy: 0.8462 - val_auc: 0.6414
Epoch 6/10
5/5 [==============================] - 0s 24ms/step - loss: 0.4328 - accuracy: 0.8301 - auc: 0.6769 - val_loss: 0.3890 - val_accuracy: 0.8462 - val_auc: 0.7727
Epoch 7/10
5/5 [==============================] - 0s 24ms/step - loss: 0.3929 - accuracy: 0.8301 - auc: 0.8122 - val_loss: 0.3635 - val_accuracy: 0.8462 - val_auc: 0.7980
Epoch 8/10
5/5 [==============================] - 0s 23ms/step - loss: 0.3680 - accuracy: 0.8235 - auc: 0.8192 - val_loss: 0.3416 - val_accuracy: 0.8462 - val_auc: 0.8283
Epoch 9/10
5/5 [==============================] - 0s 24ms/step - loss: 0.3529 - accuracy: 0.8105 - auc: 0.8398 - val_loss: 0.3195 - val_accuracy: 0.8462 - val_auc: 0.8409
Epoch 10/10
5/5 [==============================] - 0s 25ms/step - loss: 0.3275 - accuracy: 0.8431 - auc: 0.8605 - val_loss: 0.3271 - val_accuracy: 0.7949 - val_auc: 0.8914
model %>%
evaluate(x = test$text, y = as.matrix(test[,"r2_declined"]))
2/2 [==============================] - 0s 11ms/step - loss: 0.6603 - accuracy: 0.6939 - auc: 0.4833
loss accuracy auc
0.6603321 0.6938776 0.4833333
The test AUC is worse: at 0.48 it is essentially no better than chance (0.5), far below the values we were getting earlier.
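As a reminder of what 0.48 means: ROC AUC is the probability that a randomly chosen positive outscores a randomly chosen negative, so 0.5 is coin-flipping. It can be computed directly from ranks via the Mann-Whitney identity; a small base-R sketch with made-up scores:

```r
# ROC AUC via the rank-sum (Mann-Whitney) identity:
# the probability a random positive outscores a random negative
auc <- function(scores, labels) {
  r <- rank(scores)               # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores where positives mostly, but not always, rank higher
auc(c(0.9, 0.8, 0.4, 0.3, 0.2), c(1, 1, 0, 1, 0))
# 0.833...
```

A score vector that ranks every positive above every negative gives exactly 1; a useless one hovers around 0.5.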
sub_text_all$grants2vec_r2 <- model %>% predict(sub_text_all$text) %>% as.vector()
submissions %>%
  left_join(sub_text_all, by = "airtable_record_id") %>%
  unnest(Call) %>%
  select(Created, Call, pred_r2_declined = grants2vec_r2,
         ends_with("_declined")) %>%
  filter(!is.na(r2_declined)) %>%
  group_by(Call) %>%
  mutate(total_accepted = sum(r2_declined == 1)) %>%
  # pred order
  arrange(pred_r2_declined) %>%
  mutate(total_read__pred = row_number(),
         total_accepted__pred = sum(r2_declined == 1),
         cum_accepted__pred = total_read__pred - cumsum(r2_declined == 0),
         missed_accepted__pred = total_accepted - cum_accepted__pred) %>%
  # created order
  arrange(Created) %>%
  mutate(total_read__created = row_number(),
         cum_accepted__created = total_read__created - cumsum(r2_declined == 0),
         missed_accepted__created = total_accepted - cum_accepted__created) %>%
  pivot_longer(starts_with(c("total_read", "missed_accepted")),
               names_sep = "__", names_to = c("name", "method")) %>%
  pivot_wider() %>%
  ggplot(aes(total_read, missed_accepted, color = method)) +
  geom_line() + facet_wrap(~ Call, scales = "free") +
  theme(legend.position = "bottom")
Perhaps as reflected by the lower predictive scores, these predictions don’t fare as well against reading in order of submission for many of the calls, with the possible exception of the Open Call and Livelihood Exploration.
Below are the top 10 proposals that were declined but had the lowest r2_declined score.
(redacted)
And the top 10 proposals that were declined in R2 with the highest r2_declined score.
(redacted)
A quick gut check that proposals declined in R1 have higher declination scores than those accepted in R1.
submissions %>%
  left_join(sub_text_all, by = "airtable_record_id") %>%
  group_by(r1_declined) %>%
  summarise(across(starts_with("grants2vec"), mean))
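To make this gut check more formal than comparing group means, a rank-based test could be applied to the two score distributions; a sketch using base R's wilcox.test on hypothetical score vectors (the real ones would come from the grouped table above):

```r
# Hypothetical grants2vec scores for declined vs accepted proposals
declined_scores <- c(0.81, 0.74, 0.69, 0.88, 0.77)
accepted_scores <- c(0.35, 0.42, 0.51, 0.30, 0.46)

# One-sided test: do declined proposals score stochastically higher?
wilcox.test(declined_scores, accepted_scores, alternative = "greater")
```

A small p-value here would support the claim that the model separates the two groups, not merely that the means happen to differ.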
new_submissions_at <- airtable("tblfNQfUsRCFAiIHd", "app1Cg2upg4u5RqTX")
new_submissions <- new_submissions_at %>% read_airtable()
new_sub_text_all <- new_submissions %>%
  # turn lists with texts into a text column;
  # for anything else we just record the count
  mutate(across(where(is.list), function(any_list) {
    if (is.character(any_list[[1]])) {
      map_chr(any_list, ~ str_c(.x, collapse = " "))
    } else {
      map_chr(any_list, ~ str_c(cur_column(), length(.x)))
    }
  })) %>%
  select(airtable_record_id, ends_with("_declined"), where(is.character)) %>%
  unite(text, -c(airtable_record_id, ends_with("_declined")), sep = " ")
new_sub_text_all$grants2vec <- model %>% predict(new_sub_text_all$text) %>% as.vector()
at_update <- new_sub_text_all %>%
  select(airtable_record_id, grants2vec) %>%
  column_to_rownames("airtable_record_id")
update_records(at_update, new_submissions_at, columns = grants2vec)
You are about to update 162 records of the following variables:
grants2vec
Do you wish to proceed?
1: ✔ Yes
2: ✖ No
1
Sending PATCH requests [=====================================================================] 100% eta: 0s