NucEnerGen

Nucleosome energetics predictions based on high throughput sequencing

This website distributes software and data related to "High-throughput sequencing reveals a simple model of nucleosome energetics" by Locke et al.

Download Experimental Results

Download mapped reads from the MNase control assay in eland and .wig (wiggle) formats. (Information on WIG files is available here.) Note that reads have been mapped to the SGD2 (Apr 2008 build) S. cerevisiae genome and to the Escherichia coli K12 MG1655 (U00096) genome.

Download Supplementary Data

Below are supplementary data from Locke et al.

SI Table 1 - Large correlation table comparing experimental data sets and models
SI Table 2 - Comparison of predicted energies for DNA words made by N=2 position-independent models fit on different datasets

Download Our Predictions

Below are predictions from our models in .wig (wiggle) format, compatible with Integrated Genome Browser. Predictions are made for the SGD2 (Apr 2008 build) S. cerevisiae genome or the WS170 C. elegans genome, as noted. One model, marked "fit on C. elegans", was fit on in vivo nucleosome data from Valouev et al. Genome Res. 2008. All other models were fit to in vitro nucleosome data from Zhang et al. Nat Struct Mol Biol. 2009. For a full description of our fitting procedures, see our paper.

Model	Binding energy	Binding probability	Occupancy
S. cerevisiae predictions
N=5 position-independent	link	link	link
N=2 position-independent	link	link	link
N=1 position-independent	link	link	link
N=2 spatially resolved	link	link	link
N=2 three-region	link	link	link
N=2 periodic	link	link	link
N=2 position-independent fit on C. elegans	link	link	link
C. elegans predictions
N=2 position-independent	link	link	link
N=1 position-independent	link	link	link
N=2 position-independent fit on C. elegans	link	link	link

Binding energy: The energy to bind a nucleosome to the genome at a given base-pair (bp).
Binding probability: The probability for a nucleosome to bind at a given bp.
Occupancy: The probability that a nucleosome will cover a given bp. (Occupancy at bp x is the sum of the probabilities from x-146 through x.)
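
The occupancy definition above is a windowed sum over binding probabilities, and a minimal Python sketch of that sum may make it concrete. This illustrates only the stated formula and is not DynaPro's implementation; the function name, the 0-based indexing, and the truncation of the window at the start of the sequence are assumptions made for the example.

```python
from itertools import accumulate

# A nucleosome covers 147 bp, so the window runs from x-146 through x.
NUCLEOSOME_BP = 147


def occupancy_from_probability(prob, footprint=NUCLEOSOME_BP):
    """Return occupancy at each position as a windowed sum of binding probabilities.

    prob[i] is taken to be the probability that a nucleosome binds at position i
    (0-based, an assumption). Occupancy at x sums prob[x-footprint+1 .. x],
    with the window cut off at the start of the sequence.
    """
    # prefix[k] holds the sum of prob[0 .. k-1], so any window sum is a difference.
    prefix = [0.0] + list(accumulate(prob))
    occupancy = []
    for x in range(len(prob)):
        start = max(0, x - footprint + 1)
        occupancy.append(prefix[x + 1] - prefix[start])
    return occupancy
```

With the default footprint of 147, position x receives contributions from binding positions x-146 through x, matching the definition in the text.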

Note: Probability and occupancy are calculated from binding energy using DynaPro.

Download Our Software

We provide software necessary to make a prediction of nucleosome occupancy on any DNA sequence. Instructions for installing and using this software follow.

Downloads

Software

Model data

S. cerevisiae genome

C. elegans genome

To install:

Untar the software bundle into a directory, cd into that directory and type 'make'. This will produce the executable 'applyModel'. Download and untar the model data.

The code is ANSI C++, and the makefile calls g++. Compilers other than g++ and operating systems other than Linux have not been tested. The code requires only standard libraries (iostream, STL, etc.), so it should be portable.

To use:

The applyModel executable takes the following arguments:

-f

Fasta file containing the DNA sequence you wish to model.

If the fasta file you provide has more than one sequence in it, only the first sequence will be modelled. The links above provide the C. elegans and S. cerevisiae genomes with one fasta file per chromosome, but of course the model can be applied to whatever sequence you like. Note that sequences must be at least 147 bp long.
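
Because only the first sequence is modelled and sequences shorter than 147 bp are not accepted, it can help to check an input file before running applyModel. The following Python sketch shows one way to do that. It is a hypothetical helper, not part of the distributed software, and the function names are assumptions.

```python
MIN_LENGTH_BP = 147  # minimum sequence length stated in the text


def read_first_fasta_record(lines):
    """Return (header, sequence) for the first record in an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks)


def check_sequence(lines):
    """Return (header, length) of the first record, or raise if it is too short."""
    header, sequence = read_first_fasta_record(lines)
    if len(sequence) < MIN_LENGTH_BP:
        raise ValueError(
            f"sequence '{header}' is {len(sequence)} bp; at least {MIN_LENGTH_BP} bp required"
        )
    return header, len(sequence)
```

To check a file on disk, pass an open file handle, for example `check_sequence(open("chr01.fasta"))`, where the file name is a placeholder.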

-eps

Model parameter file.

This argument selects the model used to make predictions. Specify one of the ".epsilon" files contained in the model data link above. These epsilon files provide the N=2 and N=5 position-independent fits on S. cerevisiae, the N=2 spatially resolved fit on S. cerevisiae, and the N=2 position-independent fit on C. elegans.

-out

Name the energy output file.

-mod (optional)

Choose model

If you choose a position independent model you do not have to use this argument. If you use the spatially resolved model you must use "-mod 2". Note that the code will produce a bogus prediction if you use the spatially resolved epsilon file but fail to provide the "-mod 2" argument.

-strand (optional)

Choose forward strand, reverse strand or both.

Choose whether to model the nucleosomes as bound to the Watson or Crick side of the DNA. Our procedure produces models whose predictions on the two strands correlate at 1.000, so the default is forward only.

So, a typical call will look like "./applyModel -f chr01.saccharomyces_cerevisiae-JAN-19-2007.fasta -eps position-independent_N2_Scerevisiae_in_vitro.epsilon -out posInd.chr01.en".
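
When the same call has to be repeated for many chromosomes, the command can be assembled in a script. The Python sketch below builds the argument list from the flags documented above. The function name and its parameters are assumptions for illustration, and the -strand argument is left out because its accepted values are not listed here.

```python
import shlex


def build_apply_model_command(fasta, epsilon, out,
                              spatially_resolved=False, exe="./applyModel"):
    """Assemble an applyModel argument list from the documented flags.

    spatially_resolved=True appends "-mod 2", which the text says is required
    whenever a spatially resolved epsilon file is used.
    """
    cmd = [exe, "-f", fasta, "-eps", epsilon, "-out", out]
    if spatially_resolved:
        cmd += ["-mod", "2"]
    return cmd


# File names are taken from the example call in the text.
cmd = build_apply_model_command(
    "chr01.saccharomyces_cerevisiae-JAN-19-2007.fasta",
    "position-independent_N2_Scerevisiae_in_vitro.epsilon",
    "posInd.chr01.en",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory containing the executable.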

The output of this program is a text file with one column marking position in the sequence and another column showing the energy to bind a nucleosome at that position. Typical applications, however, call for occupancies rather than energies. We use DynaPro, available here, to convert energy to occupancy. Installation and usage instructions are available on the Nucleosome Explorer website, but we offer one important tip: DynaPro takes a "-bkgr" argument, and you should specify "-bkgr none".
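
For downstream analysis, the two-column energy file can be loaded with a few lines of Python. The sketch below assumes that the columns are separated by whitespace and that positions are integers; neither detail is specified above, so adjust the parsing if your output differs. The function name is hypothetical.

```python
def parse_energy_lines(lines):
    """Parse two-column (position, energy) lines into two parallel lists.

    Lines with fewer than two fields, or with fields that do not parse as
    an integer position and a float energy, are skipped.
    """
    positions, energies = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            position = int(fields[0])
            energy = float(fields[1])
        except ValueError:
            continue  # not a data line (for example, a header or comment)
        positions.append(position)
        energies.append(energy)
    return positions, energies
```

To read the file produced by the example call, pass an open file handle such as `parse_energy_lines(open("posInd.chr01.en"))`.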

Comments and questions regarding the materials distributed on this site may be directed to George Locke.

Last updated July 2010