Twitter notes of SACMDA3
2019-04-21
Several weeks ago,
the 3rd Workshop on Statistical and Algorithmic Challenges in Microbiome Data Analysis
was hosted at the
I did not attend,
but with the magic of Twitter and the internet,
I could follow along with the generous efforts of live-tweeting with the
Here are my notes on what happened.
A simple but crucial question in the world of microbiome science.
Incredibly impressed by the morning talks at #SACMDA3. “What’s in my sample?” is still an open and important question — and it’s to hear from both new and senior researchers working on strain id & tax. assignment qtns! Great work, Nidhi, Chris, Rachel & Francesco!!— Amy Willis (@AmyDWillis) April 1, 2019
Great thread, summarizing the first day of the workshop, along with links to papers, GitHub and Bitbucket code repositories, and comments.
Excited to be at #SACMDA3, third annual-ish Workshop on Statistical and Algorithmic Challenges in Microbiome Data Analysis, sponsored by @MITMicrobiome and @SimonsFdn @FlatironCCB https://t.co/Gy4owbvZtK— Claire Duvallet (@cduvallet) April 1, 2019
A reminder of our role as computationalists.
“A big challenge in computational metagenomics: Lacking consensus about benchmarking datasets, evaluation procedures, and metrics complicates prior performance assessments.”
Another great thread by Claire Duvallet for the second day.
Day 2 of #SACMDA3 kicked off by @KnightLabNews emphasizing the need to get message about appropriate statistical techniques for data analysis to the entire #microbiome community. Ends with: anyone here wanna write a review paper with him, @ejalm, and @RichBonneauNYU? 😉— Claire Duvallet (@cduvallet) April 2, 2019
This blog post on Statistical Methods for Microbiome Data was mentioned, especially its section on a novel method for contamination removal.
“Every metagenomic measurement is wrong.”
Challenges when working with Longitudinal Multidomain data
- Data Quality: Heterogeneity, unwanted sources of variation.
- Building models from the data.
- Interpretation of analytic output.
- Multiple (dual) dependencies
- Multidomain, need for registration.
- Uncertainty quantification and inference.
- Reproducibility of results across labs, experimental conditions and users.
Software packages mentioned
- “Pan-genome inference with long error-prone sequencing reads”
- In active development
- PhyloPhlAn (Version 2)
- “Fast expectation maximization source tracking”
- “Neural networks for microbe-metabolite interaction analysis”
- Pasolli et al., “Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle”. Cell (2019)
- Wirbel et al., “Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer”. Nature Medicine (2019)
- Sczyrba et al., “Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software”. Nature Methods (2017)
- McLaren, Willis, Callahan. “Consistent and correctable bias in metagenomic sequencing measurements”. bioRxiv (2019)
- Fernandes et al., “A reproducible effect size is more useful than an irreproducible hypothesis test to analyze high throughput sequencing datasets”. arXiv (2018)
- Washburne et al., “Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets”. PeerJ (2017)
- Washburne et al., “Phylofactorization: a graph-partitioning algorithm to identify phylogenetic scales of ecological data”. bioRxiv
- Rivera-Pinto et al., “Balances: a New Perspective for Microbiome Analysis”
- Silverman et al., “Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes”. arXiv (2019)
There were many concepts, papers, and code repositories I couldn’t cover. Feel free to browse the Twitter feed for more.
Things to share
Sorting lines in Vim
sort lines in Vim
The u flag removes duplicate lines.
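As a quick illustration, here is a minimal sketch of the built-in :sort Ex command and its u flag. The comments describe the default behaviour as I understand it, and the visual-selection range in the last line is only one way to limit the sort. Vim's own help (:help :sort) is the authoritative reference.

```vim
" Sort every line in the current buffer
:sort

" Sort every line and drop duplicate lines
:sort u

" Sort only the lines of the last visual selection, dropping duplicates
:'<,'>sort u
```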
More can be found at
Spelling files created by Vim
For UTF-8 files, Vim creates two files:
I was editing my spelling files and was curious about the difference between them. One discussion about them suggests one is plain text and the other is the corresponding compiled binary.
The en.utf-8.add file is plain text, and I keep it under version control for
consistency across my computers.
.spl file can be recreated using
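For reference, one way to regenerate the compiled file is Vim's built-in :mkspell command. This is a sketch under an assumption: the path ~/.vim/spell/en.utf-8.add is a guess at a common location, so substitute wherever your own word list lives.

```vim
" Rebuild the compiled .spl file from the plain-text word list.
" The path is an assumption; use the location of your own .add file.
" The ! overwrites an existing .spl file.
:mkspell! ~/.vim/spell/en.utf-8.add
```

The output is written next to the input with .spl appended, so only the plain-text file needs to be tracked in version control.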