Twitter notes of SACMDA3
2019-04-21
Several weeks ago, the 3rd Workshop on Statistical and Algorithmic Challenges in Microbiome Data Analysis was hosted at the Simons Foundation.
I did not attend, but with the magic of Twitter and the internet, I could follow along thanks to the generous live-tweeting under the hashtag #SACMDA3.
Here are my notes on what happened.
First day
A simple but crucial question in the world of microbiome science.
Incredibly impressed by the morning talks at #SACMDA3#SACMDA. “What’s in my sample?” is still an open and important question — and it’s to hear from both new and senior researchers working on strain id & tax. assignment qtns! Great work, Nidhi, Chris, Rachel & Francesco!!
— Amy Willis (@AmyDWillis) April 1, 2019
Great thread, summarizing the first day of the workshop, along with links to papers, GitHub and Bitbucket code repositories, and comments.
Excited to be at #SACMDA3, third annual-ish Workshop on Statistical and Algorithmic Challenges in Microbiome Data Analysis, sponsored by @MITMicrobiome and @SimonsFdn @FlatironCCB https://t.co/Gy4owbvZtK
— Claire Duvallet (@cduvallet) April 1, 2019
A reminder of our role as computationalists.
Another lol moment (for me) here at #SACMDA3: "Goal: narrow down the search space for experimentalists." #BenevolentComputationalists
— Claire Duvallet (@cduvallet) April 1, 2019
“A big challenge in computational metagenomics: Lacking consensus about benchmarking datasets, evaluation procedures, and metrics complicates prior performance assessments.”
Echo that! #SACMDA3 pic.twitter.com/7vHXUmeunc
— Erika Ganda (@EKGanda) April 1, 2019
Second day
Another great thread by Claire Duvallet for the second day.
Day 2 of #SACMDA3 kicked off by @KnightLabNews emphasizing the need to get message about appropriate statistical techniques for data analysis to the entire #microbiome community. Ends with: anyone here wanna write a review paper with him, @ejalm, and @RichBonneauNYU? 😉
— Claire Duvallet (@cduvallet) April 2, 2019
This blog post on Statistical Methods for Microbiome Data was mentioned, especially the section on a novel method for contamination removal.
“Every metagenomic measurement is wrong.”
This public service announcement brought to you by @bejcal and @mikemc423 #SACMDA3 pic.twitter.com/Sw1nNyuKIk
— Amy Willis (@AmyDWillis) April 2, 2019
Challenges when working with Longitudinal Multidomain data
- Data Quality: Heterogeneity, unwanted sources of variation.
- Building models from the data.
- Interpretation of analytic output.
- Multiple (dual) dependencies
- Multidomain, need for registration.
- Uncertainty quantification and inference.
- Reproducibility of results across labs, experimental conditions and users.
Susan Holmes talking about the many challenges when working with data. #SACMDA3 pic.twitter.com/KaWcw9gDOV
— Erika Ganda (@EKGanda) April 2, 2019
Software packages mentioned
- CONCOCT
- Pandora
- “Pan-genome inference with long error-prone sequencing reads”
- In active development
- GitHub
- DESMAN
- “DESMAN identifies variants in core genes and uses co-occurrence across samples to link variants into haplotypes and abundance profiles.”
- This tool was not mentioned directly, but appears to be the predecessor to a new tool that was mentioned
- Paper
- PhyloPhlAn (Version 2)
- FEAST
- “Fast expectation maximization source tracking”
- GitHub
- rhapsody
- “Neural networks for microbe-metabolite interaction analysis”
- GitHub
- mixOmics
Papers
- Pasolli et al., “Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle”. Cell (2019)
- Wirbel et al., “Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer”. Nature Medicine (2019)
- Sczyrba et al., “Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software”. Nature Methods (2017)
- McLaren, Willis, Callahan. “Consistent and correctable bias in metagenomic sequencing measurements” bioRxiv (2019)
- Fernandes et al., “A reproducible effect size is more useful than an irreproducible hypothesis test to analyze high throughput sequencing datasets” arXiv (2018)
- Washburne et al., “Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets”. PeerJ (2017)
- Washburne et al., “Phylofactorization: a graph-partitioning algorithm to identify phylogenetic scales of ecological data”. bioRxiv
- Rivera-Pinto et al., “Balances: a New Perspective for Microbiome Analysis”
- Silverman et al., “Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes”. arXiv (2019)
Reflection
There were many concepts, papers, and code repositories I couldn’t cover. Feel free to browse the Twitter feed for more.
Things to share
Sorting lines in Vim
You can sort lines in Vim with :{range}sort u.
The u flag removes duplicate lines.
More can be found at :help sort.
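A few variations on the sort command, as a rough sketch. The line numbers and ranges here are only illustrative; the flags themselves are described under :help sort.

```vim
" Sort the whole buffer and drop duplicate lines
:%sort u

" Sort only lines 10 through 20 (example range)
:10,20sort

" Sort the last visual selection in reverse order
:'<,'>sort!

" Sort by the first decimal number found on each line
:%sort n

" Sort while ignoring case
:%sort i
```

Flags can be combined, so `:%sort iu` should sort case-insensitively and remove duplicates in one pass.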
Spelling files created by Vim
For UTF-8 files, Vim creates two files:
en.utf-8.add
en.utf-8.add.spl
One discussion about them suggests one is plain text and the other is the corresponding compiled binary. I was editing my spelling files and was curious about the difference between them.
The en.utf-8.add file is plain text, and I have version controlled it for consistency across my computers.
The compiled .spl file can be recreated using
:mkspell ~/.vim/spell/en.utf-8.add
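Since only the plain text file is version controlled, the compiled file can go stale after pulling changes on another computer. Below is a minimal vimrc sketch that rebuilds it when needed. It assumes the spelling files live in ~/.vim/spell, which may not match your setup. Note that plain :mkspell refuses to overwrite an existing .spl file, so the sketch uses :mkspell! instead.

```vim
" Assumption: word lists are kept in ~/.vim/spell
set spellfile=~/.vim/spell/en.utf-8.add

" Rebuild each .add.spl file that is missing or older than its .add file
for s:add in glob('~/.vim/spell/*.add', 1, 1)
  let s:spl = s:add . '.spl'
  if filereadable(s:add) && (!filereadable(s:spl) || getftime(s:add) > getftime(s:spl))
    " The ! overwrites an existing compiled file
    silent execute 'mkspell! ' . fnameescape(s:add)
  endif
endfor
```

With something like this in place, words added with zg on one machine should be picked up on the others after the .add file is synced and Vim is restarted.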