Jordan B. Burton, Ph.D.

Post-doctoral Research Fellow, Buck Institute for Research on Aging

Protstatmd: A NextFlow Containerized Analysis Pipeline for Spectral Count Proteomic Analysis Doubles the Number of Pairwise Comparisons between Beer Samples

Authors

Jordan B. Burton (Presenter)

Nicholas J. Carruthers

Paul M. Stemmer

Introduction

The default proteomicsLFQ Nextflow workflow uses area-under-the-curve abundance and MSstats to make pairwise comparisons, treating unquantified proteins as missing values. In contrast, spectral counting includes unquantified proteins, allowing us to statistically assess proteins missing from one or more samples. Protstatmd was appended to the proteomicsLFQ workflow to facilitate installation of common R packages and to produce interactive HTML documents using RMarkdown. Protstatmd performs statistical analysis of spectral count data, enabling comparisons between beer types. Beer metaproteomic studies are used to evaluate the effects of different yeast or hops strains, grains, and brewing conditions on the beer proteome. We used label-free quantification (LFQ) to compare three beers that are predicted to have widely divergent proteomes.

Methods

Samples were reduced with 5 mM DTT and alkylated with 15 mM IAA prior to tryptic digestion and analysis using nanoflow UHPLC with a 25 cm PepMap RSLC C18 column and a Fusion Orbitrap. Raw files were passed through the protstatmd-appended proteomicsLFQ pipeline on the Wayne State High Performance Computing Grid for conversion, database searching, false discovery rate estimation and correction, and statistical analysis of protein abundance differences between groups. Data were searched against individual or combined yeast, barley, hops, wheat, and rice FASTA databases with either Comet or MS-GF+. Differential abundance analysis was performed within Nextflow using the incorporated MSstats or EdgeR packages. The label-free data and differential abundance results were exported as HTML reports and tables.

Results

EdgeR allowed use of a greater percentage of the data than MSstats to determine the impact of different yeast or hops strains, grains, and brewing conditions on the proteomes of a lager, an imperial farmhouse ale, and a bourbon barrel-aged beer. Using the protstatmd-appended proteomicsLFQ Nextflow workflow, EdgeR doubled the number of pairwise comparisons relative to MSstats for four of the five database searches. Spectral counts were used to compare sample proteomes, with the lager as the control. The MS-GF+ database search identified 6-30% more proteins than Comet for the yeast, barley, hops, and wheat database searches. Searching the data against individual yeast, barley, hops, wheat, and rice databases with MS-GF+ identified 254, 258, 92, 188, and 10 proteins, respectively. Relative to the lager, the bourbon barrel-aged beer and the imperial farmhouse ale had, respectively, 169 and 49 differentially abundant yeast proteins, 61 and 78 barley proteins, 74 and 57 wheat proteins, and 5 and 1 rice proteins (q-value < 0.1 in all cases). No differentially abundant hops proteins were detected. When searching a combined species database, 316 proteins were identified, whereas 802 total proteins were identified by the individual database searches. The combined database search identified 121 and 49 differentially abundant proteins (q-value < 0.1) in the bourbon barrel-aged beer and the imperial farmhouse ale, respectively, relative to the lager.
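The q-value < 0.1 cutoff used above can be sketched as follows. The abstract does not state which multiple-testing correction produced the q-values; this sketch assumes the Benjamini-Hochberg procedure, and the p-values are made-up illustrative numbers rather than results from the study.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical p-values for four proteins from one pairwise comparison.
example_pvalues = [0.001, 0.04, 0.03, 0.5]
example_qvalues = bh_qvalues(example_pvalues)

# Indices of proteins called differentially abundant at q < 0.1.
significant = [i for i, q in enumerate(example_qvalues) if q < 0.1]
print(example_qvalues)
print(significant)
```

With these illustrative inputs, three of the four proteins pass the 0.1 threshold; counts such as the 169 and 49 yeast proteins reported above would be tallied the same way from the real q-values.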

Novel Aspect

The protstatmd container in the proteomicsLFQ workflow allowed EdgeR pairwise comparisons of spectral counts, which increased the statistical search space.

Poster

To view the poster, click here.

To view the presentation, click here.

Statistical Results using MS-GF+ with EdgeR

Barley

Wheat

Yeast

Rice

Hops

Not enough proteins were identified to make statistically relevant comparisons with EdgeR.