class: left, middle, inverse, title-slide # Update on correlates repository ### David Benkeser, PhD MPH
Emory University
Department of Biostatistics and Bioinformatics
--- <style type="text/css"> .remark-slide-content { font-size: 23px } </style> ## Outline - GitHub repo - AWS set up - Report building - Report output - Double coding - Next steps --- ## [CoVPN GitHub repository](https://github.com/CoVPN) <img src="fig/githubrepo.png" width="60%" style="display: block; margin: auto;" /> - `correlates_reporting` = .red[directory for analysis code] - `correlates_mockdata` = __directory for practice data__ - `CovidCorrSAP` = soon to be deprecated. --- ## [correlates_reporting repository](https://github.com/CoVPN/correlates_reporting) <img src="fig/reportingrepo.png" width="80%" style="display: block; margin: auto;" /> --- ## [correlates_reporting repository](https://github.com/CoVPN/correlates_reporting) __"Pre-processing"__ - `data_raw` = location where raw data should be deposited - `base_riskscore` = developing risk score using placebo-recipients __Immunogenicity report__ - `immuno_graphical` = graphical representations of immune assays - `immuno_tabular` = tabular descriptions of immune assays __Correlates of risk report (Tier 1A)__ - `cor_graphical` = graphical descriptions stratified by case status - `cor_coxph` = Cox model-based estimates of risk --- ## AWS build instructions [build_instructions.md](https://github.com/CoVPN/correlates_reporting/blob/master/build_instructions.md) <img src="fig/build_instructions.png" width="75%" style="display: block; margin: auto;" /> --- ## Build instructions The image build takes about 1 hour total. - Install `R` - Set up GitHub credentials and clone repository - Download `R` packages using `renv` After the `R` setup is complete, reports can be generated as follows. ```bash make data_processed # immunogenicity report # takes ~10 minutes make immuno_analysis make immuno_report # cor report # takes ~30 minutes make cor_analysis make cor_report ``` --- ## Compute To build these reports, there are a few computationally intensive steps. - Bootstrap, permutation tests, super learner for risk score The code is parallelized where possible. - Will work on any Linux AWS AMI; can adjust cores as required. - Could be built to work on Windows AMI as well. Somewhere around 50 vCPUs should be sufficient. - As low as 20 is likely do-able. --- ## Report output A `pdf` for each report is saved. Very rough drafts are available for browsing. - [immunogenicity report](https://www.dropbox.com/s/ykv4ij8xmch0zyk/covpn_correlates_immuno.pdf?dl=0) - [correlates of risk report](https://www.dropbox.com/s/50bep9g561oa7n3/covpn_correlates_cor.pdf?dl=0) Major infrastructure in place; time for smaller tweaks! --- ## Code verification All key analyses are undergoing __code verification__. __Each key analysis will include__: - specification document - exact steps needed to reproduce the analysis - independent analyst's code - written based on the specification document only - verification documentation - finding equivalence between results across both sets of code Code verification underway for each of the main analyses. --- ## Next steps - formatting tweaks for reports - nonparametric threshold methods in correlates report - improvements to build command chains