A program for calculation of p-values for the ASP
(Affected sib pair) test based on
a manuscript by James Malley,
Daniel Naiman and Joan Bailey-Wilson can be freely
downloaded here.
Given a collection of m markers on a single chromosome,
with known inter-marker
physical distance, it is assumed that allele sharing
for each marker is measured for
a collection of p affected sib pairs (0=no sharing,
1=1 allele shared, 2=2 alleles shared) .
The total allele sharing y[i] at each marker
i is standardized to give a z-score z[i]=(y[i]-p)/sqrt(p/2)
at each marker. To test whether some marker
among the m of them is significantly linked to
the disease, we reject the null hypothesis of no linkage
if max {z[i],i=1,...,m} is sufficiently
large.
The program computes the p-value for this test using
naive monte carlo simulation,
and an importance sampling algorithm based on an algorithm
introduced in a paper by
Naiman and
Priebe (2001).The program takes the following as command inputs:
number of times to replicate the
monte carlo simulation (this can be taken to be 1)
monte carlo sample size
maximal z-score from the experiment
the number of markers in a fine
grid that the actual markers are to be embedded in
(best to
use a highly composite integer, e.g. a power of 2 for this)
actual number of markers used
seed for the monte carlo simulation
(a positive integer)
indicator of whether to use unequally
spaced markers given as standard input
or equispaced
markers
(0 means
use equispaced markers, 1 means specify markers in standard input)
If the option of specifying unequally spaced markers
is used (last argument = 1) then
the positions of the markers (in kbp from one end
of the chromosome) are given as standard input
to the program with one position per line. The actual
positions don't matter, what really matters
is the inter-marker distances (in kbp).
The reason for the first argument is that the user
might wish to compare naive simulation
to imporance sampling. The program repeats the monte
carlo experiment a prescribed
number of times and computes the relative efficiency
of importance sampling, which is
defined to be the ratio of computation times that
leads to importance sampling and naive
sampling giving the same variances.
Examples.
./asp 1 10000 2.50 64 6 27482 0
means run the program a single time using a sample
size of 10000. The maximal z-statistic
is taken to be 2.50. There are 6 equispaced markers
in a grid of 64. The seed is 27482.
Here is the output:
number of trials = 1
mc_sample_size = 10000
maximum z-statistic = 2.570000
Bonferroni p-value = 0.03050956
total number of markers in fine grid = 64
total number of markers used in test = 6
summary statistics
naive time = 0.77000 importance time =
0.87000
average naive p-value estimate =
0.0060000000
average importance p-value estimate =
0.0050849261
./asp 10 10000 2.50 64 6 27482 0
means do the above 10 times. Here is the output:
number of trials = 10
mc_sample_size = 10000
maximum z-statistic = 2.570000
Bonferroni p-value = 0.03050956
total number of markers in fine grid = 64
total number of markers used in test = 6
summary statistics
naive time = 7.84000 importance time =
9.22000
average naive p-value estimate =
0.0052100000
std dev naive p-value estimates =
11.5895262752
average importance p-value estimate =
0.0050849261
std dev importance p-value estimates =
11.5896581046
relative efficiency = 0.850
Finally, here is an example where the markers
are specified. A file "marker_list"
contains 6 lines as shown here:
5424
5733
5893
6217
6313
6710
giving 6 marker positions along the chromosome in kbp.
To input these positions to the program and do a single
run, we use
./asp 1 10000 2.50 64 6 27482 0 < marker_list
and the output looks like this:
number of trials = 1
mc_sample_size = 10000
maximum z-statistic = 2.570000
Bonferroni p-value = 0.03050956
total number of markers in fine grid = 64
total number of markers used in test = 6
summary statistics
naive time = 0.87000 importance time =
0.85000
average naive p-value estimate =
0.0093000000
average importance p-value estimate =
0.0080437429
The program runs in a linux x86 environment.
The binary executable can be
downloaded. click here for download
References
Malley, J., Naiman, D.Q., Bailey-Wilson,
J..
``A Comprehensive Method for Genome Scans''
(2002).
Submitted for publication.
Naiman, D.Q. and Priebe, C.E. ``Computing Scan Statistic
p-Values using Importance
Sampling with Applications to Genetics and Medical
Image Analysis'' (2001) Journal of
Computational and Graphical Statistics 10 296-328.