Supplementary Fig. 2 legend

Sequence-based map of the MSY.

This map provides a detailed view of the 24-Mb region shown in Fig. 1b.
Shown via background coloring (see Fig. 1) are the positions of three 
classes of euchromatic MSY sequences: X-transposed (pink), X-degenerate
(yellow), and ampliconic (blue), as well as heterochromatic MSY (red 
stripes) and pseudoautosomal (green) sequences and NORF arrays (grey stripes).
Two gaps in the sequence are indicated at the top edge of the diagram.  

a, Palindromes and near-perfect inverted repeats.  Eight primary 
palindromes (P1 through P8) and two secondary palindromes (P1.1 and P1.2)
are shown.  Diverging black arrows mark the left and right arms of each
palindrome.  Gaps between diverging arrows represent non-palindromic 
"spacers" at the centers of these structures.  Three near-perfect inverted 
repeats (non-palindromic, IR1 through IR3; see Supplementary Table 4) are
also shown.
In each case, the left and right arms exhibit >99.5% nucleotide identity.  

b, Other inverted repeats (non-palindromic, IR4 and IR5; see
Supplementary Table 4). The grey arrows, IR4, denote two regions of >93%
identity, one on Yp and one on Yq. The yellow arrows, IR5, denote four
regions of >92% identity, all on Yq.  

c, Recurrent deletions causing spermatogenic failure.  Deletions of 
any of the four indicated regions - AZFa, P5/proximal P1 (AZFb), AZFc,
or P5/distal P1 - have been observed to cause spermatogenic failure in
human populations (references to appropriate breakpoint papers).  

d, Protein-coding genes.  Previously reported genes and novel,
experimentally verified transcription units for which cDNA sequencing 
suggests protein-coding potential (Table 1).  Plus (+) strand above,
minus (-) strand below.  

e, CpG islands.  CpG islands are defined here as sequences >200 bp in 
length with G+C content >50%, CpG ratio (observed frequency/expected 
frequency) >0.6, and no detectable similarity to known repetitive sequences.  
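
The compositional part of this definition can be illustrated with a minimal Python sketch. It is not the procedure used to generate the figure. The function names are hypothetical, and the expected CpG frequency is assumed to follow the commonly used formula (number of C x number of G) / sequence length, which the legend does not spell out. The repeat-similarity criterion is not checked here.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # observed number of CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula for the expected CpG count: (#C * #G) / length.
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return n, gc_fraction, ratio


def passes_composition_criteria(seq):
    """Apply the three numeric thresholds stated in the legend."""
    n, gc_fraction, ratio = cpg_stats(seq)
    return n > 200 and gc_fraction > 0.50 and ratio > 0.6
```

A candidate that passes these thresholds would still need to be screened against known repetitive sequences before being counted as a CpG island under the definition above.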

f, G+C content (%) calculated in a 100-kb sliding window with 1-kb steps.
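
A sliding-window calculation of this kind might be sketched as follows. This is an illustration only: the function name is hypothetical, the defaults simply mirror the 100-kb window and 1-kb step stated in the legend, and the handling of ambiguous bases (counted in the window length but not as G or C) is an assumption.

```python
from itertools import accumulate


def gc_percent_windows(seq, window=100_000, step=1_000):
    """Return a list of (window start, G+C percent) for each full window."""
    s = seq.upper()
    # prefix[i] holds the number of G or C bases in s[:i].
    prefix = [0] + list(accumulate(1 if base in "GC" else 0 for base in s))
    result = []
    for start in range(0, len(s) - window + 1, step):
        gc = prefix[start + window] - prefix[start]
        result.append((start, 100.0 * gc / window))
    return result
```

The prefix sums let each window be evaluated in constant time, which matters when the step is much smaller than the window. On a toy input, `gc_percent_windows("GGCCAATT", window=4, step=2)` covers the windows starting at positions 0, 2, and 4.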

g, Scale, in Mb.  

h, Non-coding transcription units.  Sequences whose transcription has
been verified (in this or previous studies) but for which there exists
little or no evidence of protein-coding potential.  (Supplementary Table 2
provides information about these transcription units.)  

i, Landmark STSs. (Supplementary Table 11 provides GenBank accession
numbers.)  

j, STSs from Vollrath D, et al. Science 258, 52 (1992).  (Supplementary
Table 11 provides GenBank accession numbers.)  

k, 220 BAC clones that have been completely or partially sequenced. 
Each bar represents the size and position of one BAC clone, identified by the
numeric portion of its GenBank accession number (which in each case begins
with the prefix "AC").  Black bars represent finished sequences deposited
in GenBank, where finished sequences are trimmed to retain only 200 bp of
overlap with adjoining BACs.  Grey bars represent the "trimmings" of those
BACs, not deposited in GenBank.  Striped bars represent BACs whose sequence
has not been finished but has been deposited in GenBank.