Eric Leung Code and Data Learnings

Twitter notes from SACMDA3

Several weeks ago, the 3rd Workshop on Statistical and Algorithmic Challenges in Microbiome Data Analysis was hosted at the Simons Foundation. I did not attend, but thanks to the magic of Twitter and the generous live-tweeting under the hashtag #SACMDA3, I could follow along.

Here are my notes on what happened.

First day

A simple but crucial question in the world of microbiome science.

A great thread summarizing the first day of the workshop, with links to papers, GitHub and Bitbucket code repositories, and commentary.

A reminder of our role as computationalists.

A big challenge in computational metagenomics: “Lacking consensus about benchmarking datasets, evaluation procedures, and metrics complicates prior performance assessments.”

Second day

Another great thread, by Claire Duvallet, for the second day.

A blog post on Statistical Methods for Microbiome Data was mentioned, particularly its section on a novel method for contamination removal.

“Every metagenomic measurement is wrong.”

Challenges when working with longitudinal multidomain data

  • Data quality: heterogeneity, unwanted sources of variation.
  • Building models from the data.
  • Interpretation of analytic output.
  • Multiple (dual) dependencies.
  • Multidomain data and the need for registration.
  • Uncertainty quantification and inference.
  • Reproducibility of results across labs, experimental conditions, and users.

Software packages mentioned

  • CONCOCT
    • “Unsupervised binning of metagenomic contigs by using nucleotide composition, coverage data in multiple samples and linkage data from paired end reads.”
    • Paper
    • GitHub
  • Pandora
    • “Pan-genome inference with long error-prone sequencing reads”
    • In active development
    • GitHub
  • DESMAN
    • “DESMAN identifies variants in core genes and uses co-occurrence across samples to link variants into haplotypes and abundance profiles.”
    • This tool was not mentioned directly, but appears to be the predecessor to a new tool that was mentioned
    • Paper
  • PhyloPhlAn (Version 2)
    • “PhyloPhlAn is a computational pipeline for reconstructing highly accurate and resolved phylogenetic trees based on whole-genome sequence information.”
    • Paper
    • Bitbucket
  • FEAST
    • “Fast expectation maximization source tracking”
    • GitHub
  • rhapsody
    • “Neural networks for microbe-metabolite interaction analysis”
    • GitHub
  • mixOmics
    • “mixOmics: An R package for ‘omics feature selection and multiple data integration”
    • Website
    • Paper



There were many concepts, papers, and code repositories I couldn’t cover. Feel free to browse the #SACMDA3 Twitter feed for more.

Things to share

Sorting lines in Vim

You can sort lines in Vim with :{range}sort u. The u is to remove duplicate lines. More can be found at :help sort.
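A few variations on the command, sketched from the flags described in :help sort. The specific line range (10 to 20) is only a placeholder:

```vim
" Sort every line in the buffer and drop duplicate lines
:%sort u

" Sort only lines 10 through 20 (placeholder range)
:10,20sort u

" Ignore case when comparing; with u, lines that differ only in case count as duplicates
:%sort iu

" Reverse the sort order while still dropping duplicates
:%sort! u

" Sort numerically on the first decimal number in each line
:%sort n
```

Flags can be combined in any order, and % means the whole buffer.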

Spelling files created by Vim

With UTF-8 as Vim’s encoding, your personal word list (the words you add with zg) is kept in two files:

  • en.utf-8.add
  • en.utf-8.add.spl

One discussion about them suggests that one is plain text and the other is the corresponding compiled binary. I was editing my spelling files and was curious about the difference between them.

The en.utf-8.add file is plain text, so I keep it under version control for consistency across my computers.
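If the word list lives in a dotfiles repository, one option is to point Vim at it explicitly from your vimrc. This is a minimal sketch; the path below is an assumption and should match wherever your repository actually places the file:

```vim
" Spell-check in English; turn checking on with :setlocal spell
set spelllang=en
" Use the version-controlled word list; zg and zw write to this file
" (the name must end in .{encoding}.add, here .utf-8.add)
set spellfile=~/.vim/spell/en.utf-8.add
```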

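The compiled .spl file can be recreated with the following command. The ! lets it overwrite an existing .spl file; without it, Vim refuses because the file already exists:

:mkspell! ~/.vim/spell/en.utf-8.add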
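Since only the .add file is version controlled, a fresh machine (or a pull that updates the word list) can leave the .spl file missing or stale. Below is a vimrc sketch that rebuilds it at startup whenever the .add file is newer. The ~/.vim/spell directory is an assumption taken from the path above:

```vim
" Rebuild each compiled .spl file whose plain-text .add source is newer
for s:addfile in glob('~/.vim/spell/*.add', 1, 1)
  let s:splfile = s:addfile . '.spl'
  if filereadable(s:addfile) && (!filereadable(s:splfile) || getftime(s:addfile) > getftime(s:splfile))
    " mkspell! with a single .add argument writes <name>.add.spl, overwriting it
    execute 'silent mkspell! ' . fnameescape(s:addfile)
  endif
endfor
```

Comparing modification times with getftime() keeps startup fast, because mkspell only runs when the word list has actually changed.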