I was born in 1950. By every standard (except my own), I'm an
old man. And, invariably, old men have accumulated a number of biases. With
respect to this project, these are mine:
- I have a bias toward the Linux operating system.
- I have a bias toward open source.
- I have a bias toward human-readable, plain text data.
- I have a bias toward staying close to HTML and CSS.
- I have a bias toward K.I.S.S.
I look forward to accumulating additional universally renowned biases.
Genetic genealogists who wish to report on any number of aspects of the
data first need access to the data, not just pictures of the data. Indeed,
the little I'm presenting here so far is openly available. I've merely
reformatted it according to the above principles. More to come.
GENERIC Ybrowse.org DATA
This Y-SNP database was created from several files at isogg.org
and ybrowse.org, including BYSNPindex.xlsx, FTSNPindex.xlsx,
snps_hg38.vcf.gz, and others. I've reformatted them (my only contribution)
into an easy-to-parse, space-delimited flat file with four fields:
pos(ition), anc(estral value), der(ived value), and SNP names. Multiple
names are separated by a comma with no following space. Here is a sample
line:
2789173 G C F4532,SK1916,Y525
Here's a very simple Perl script for parsing that line of data. If you
have Perl installed on your computer, save the following to a file with a
name of your choice and make it executable with chmod 755.
#!/usr/bin/perl
use strict;
use warnings;
# Split one record into its four space-delimited fields,
# then split the last field into its comma-separated SNP names.
my $data = "2789173 G C F4532,SK1916,Y525";
my ($pos, $anc, $der, $list) = split / /, $data;
my @names = split /,/, $list;
print "Position is $pos\n";
print "Ancestral value is $anc\n";
print "Derived value is $der\n";
print "SNP names: ", join(' ', @names), "\n";
Probably only programmers will be interested in this, and coders
working with Y-SNPs likely already have the data in one form or
another. Nevertheless, here it sits. The file unzips with the
extension .db. Don't let that fool you: it's a simple, readable text file.
Change the extension to anything you want (.txt, .csv, etc.). Download ZIP file (note 1).
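To work with the whole file rather than a single line, something like the following could load every record into a lookup keyed by SNP name. It's a minimal sketch, assuming the unzipped file holds one record per line in the four-field format described above; the file name ysnps.db and the subroutine names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line of the flat file into a hash reference holding the four
# fields; returns nothing for a blank or malformed line.
sub parse_record {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # strip the newline (and any trailing \r)
    my ($pos, $anc, $der, $list) = split / /, $line;
    return unless defined $list && $pos =~ /^\d+$/;
    return +{ pos => $pos, anc => $anc, der => $der, names => [ split /,/, $list ] };
}

# Build a lookup table from SNP name to record. All names on one line share
# the same record; if a name occurs on more than one line, the later line wins.
sub load_index {
    my ($fh) = @_;
    my %by_name;
    while (my $line = <$fh>) {
        my $rec = parse_record($line) or next;
        $by_name{$_} = $rec for @{ $rec->{names} };
    }
    return \%by_name;
}

# In-memory stand-in for the unzipped file; for real use, open it instead,
# e.g. open my $fh, '<', 'ysnps.db' or die "ysnps.db: $!";
my $sample = "2789173 G C F4532,SK1916,Y525\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $index = load_index($fh);
close $fh;

my $rec = $index->{SK1916};
printf "SK1916 is at %d (%s to %s)\n", $rec->{pos}, $rec->{anc}, $rec->{der};
```

Because each name points at the same shared record, looking up F4532, SK1916, or Y525 returns the same position and alleles.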
OPEN SOURCE Y-DNA PROJECT (Open Y)
The project's Facebook presence began in the summer of 2024. Work on
the database began in earnest in March 2025, starting with combining
the Y-DNA trees of FTDNA.com and YFull.com. But please note that both
companies retain all rights to their data. I present here only the
resultant database.
The purpose of the project is to provide a more complete Y chromosome
tree. For example, as of this writing, FTDNA has 91,000 branches, or
haplogroups, in its tree; Open Y has 105,000. (See below for current
stats.) Because the work isn't complete, that number is expected to rise
by another 5,000 or more. The SNP count is likewise increasing. And
because merging isn't easy, standards have been adopted to ease the
transition, standards that I hope will propagate throughout the community.
They will be fully detailed in an upcoming article.
Project Statistics, 3 Feb 2026
The Open Y database combines the FTDNA and YFull databases.
Database | Number of Haplogroups | Total SNPs | Recurrent SNPs
Open Y | 117,528 | 704,966 | Under review
FTDNA | 99,777 | 830,046 | 29,416
YFull | 41,484 | 426,651 | 13,606
TheYTree | 48,520 | 486,229 | 18,404
YBrowse | n/a | 2,929,605* | n/a
*Total number of registered SNPs
There are several advantages to the database. Although most lineages
from a haplogroup back to the root of the tree are unchanged by the merge,
a sizeable percentage have gained haplogroups. Most of the added
haplogroups originated from FTDNA, but many came from YFull. And because
of the added SNPs and haplogroups, the timeline (once developed) will be
more accurate.
Merging of this kind always creates contradictions. Most have been
resolved, but some remain; the affected branches will join the Open Y
database as they are fixed over the coming weeks.
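To make the idea of a merge contradiction concrete, here is a toy sketch. It assumes each tree has been reduced to a child => parent map and that both trees already use the same haplogroup names, which real FTDNA and YFull data often don't. This is not the procedure used to build Open Y; the subroutine name and the sample labels are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge two child => parent maps. A haplogroup placed under the same parent
# in both trees, or present in only one, is accepted; one placed under two
# different parents is set aside as a contradiction for manual review.
sub merge_trees {
    my ($tree_a, $tree_b) = @_;
    my (%merged, @conflicts);
    my %all;
    $all{$_} = 1 for keys %$tree_a, keys %$tree_b;
    for my $child (sort keys %all) {
        my $pa = $tree_a->{$child};
        my $pb = $tree_b->{$child};
        if (defined $pa && defined $pb && $pa ne $pb) {
            push @conflicts, [ $child, $pa, $pb ];
        }
        else {
            $merged{$child} = defined $pa ? $pa : $pb;
        }
    }
    return (\%merged, \@conflicts);
}

# Toy input: the labels and links are invented for illustration.
my %tree_a = (A1 => 'A', A1a => 'A1', A2 => 'A');
my %tree_b = (A1 => 'A', A1a => 'A2', A1b => 'A1');
my ($merged, $conflicts) = merge_trees(\%tree_a, \%tree_b);
print "merged: $_ -> $merged->{$_}\n" for sort keys %$merged;
print "conflict: $_->[0] under $_->[1] or $_->[2]\n" for @$conflicts;
```

In practice, deciding which parent is right takes SNP-level evidence; the sketch only shows where such decisions would be needed.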
• Get the Open Y path to the root for a terminal haplogroup, e.g., R-YP4491.
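A path to the root is, in principle, just a walk up the tree from child to parent. The sketch below assumes the tree is stored as a child => parent map; the labels are hypothetical stand-ins, and this is not the code behind the link.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk a child => parent map from a terminal haplogroup up to the root.
# Dies if a haplogroup repeats, which would indicate a broken tree.
sub path_to_root {
    my ($parent_of, $hg) = @_;
    my @path = ($hg);
    my %seen = ($hg => 1);
    while (defined(my $p = $parent_of->{$hg})) {
        die "cycle detected at $p\n" if $seen{$p}++;
        push @path, $p;
        $hg = $p;
    }
    return @path;
}

# Toy tree: the labels are hypothetical, not real Open Y data.
my %parent_of = (A1a => 'A1', A1 => 'A', A => 'ROOT');
print join(' > ', path_to_root(\%parent_of, 'A1a')), "\n";
```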
OTHER TOOLS
These presently work only with the FTDNA database. A future upgrade will
let you choose among the three databases: FTDNA, YFull, and Open Y.
Registration for the upcoming Open Y Terminal Haplogroup Matching System
opened on December 12, 2025! There are presently a mere 9
members!
SNP TREE BUILDER
I've been developing this code over the course of several years. As of
15 March 2025, it's open to project administrators for alpha testing and
use. It presently produces HTML output. Testers are welcome to alter the
output. For now, however, please leave the URL in the upper left so as to
invite other visitors. Access will always be free, and the code will
eventually be available as open source, but more development is needed
first.
Please study the sample input and the notes before proceeding. Problems?
Please contact me at michael.cooley@ysnp.info. To start, click on
SNP Tree Builder.
Feedback is needed, and suggestions are welcome. However, before
suggestions can be implemented, bugs need to be found and fixed, and then
the code needs a rewrite for speed and efficiency.
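The tool's code isn't published yet, so the following is only an illustration of one way to turn a tree into HTML output: nested lists built recursively. The input format (child => parent pairs) and all names are assumptions, not the SNP Tree Builder's actual input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Render the subtree under $node as nested <ul>/<li> elements,
# indenting two spaces per level.
sub render {
    my ($children, $node, $depth) = @_;
    my $pad  = '  ' x $depth;
    my $html = $pad . '<li>' . html_escape($node);
    my @kids = sort @{ $children->{$node} || [] };
    if (@kids) {
        $html .= "\n$pad<ul>\n";
        $html .= render($children, $_, $depth + 1) for @kids;
        $html .= "$pad</ul>\n$pad";
    }
    return $html . "</li>\n";
}

# Hypothetical input: one child => parent pair per haplogroup.
my %parent_of = (A1 => 'A', A2 => 'A', A1a => 'A1');
my %children;
push @{ $children{ $parent_of{$_} } }, $_ for keys %parent_of;
print "<ul>\n", render(\%children, 'A', 0), "</ul>\n";
```

Inverting the map into parent => children lists first means each node's branches can be found directly while recursing.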
BAM REPORTS
Except for the junkiest of reads, all SNP mutations are reported with
their quantity and quality and, if the SNP is named, all of its variant
names. Unnamed mutations are marked as novel variants. This report can be
used to check the validity of a SNP without the bother of clunky chromosome
browsers and the learning curve sometimes required to understand them.
An example can be found at http://ysnp.info/public/1142-SNP-Report.txt.
There's no phylogeny included. The arrangement of these markers into a tree
is a separate study.
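A rough sketch of the naming step: each variant call is matched against known SNPs by position and alleles, weak calls are dropped, and anything unmatched is labeled novel. The thresholds, field names, and sample calls are hypothetical, and matching the reference base against the ancestral value is a simplification, since the reference sequence doesn't always carry the ancestral allele.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cut-offs for discarding the junkiest calls.
use constant { MIN_READS => 2, MIN_QUAL => 20 };

# Label each call with its known SNP names, or as novel, dropping weak calls.
# $known maps "pos:anc:der" to an array ref of names (as in the flat file).
sub label_calls {
    my ($calls, $known) = @_;
    my @report;
    for my $c (@$calls) {
        next if $c->{reads} < MIN_READS || $c->{qual} < MIN_QUAL;
        my $key   = join ':', @{$c}{qw(pos ref alt)};
        my $names = $known->{$key};
        push @report, sprintf "%d %s>%s reads=%d qual=%d %s",
            @{$c}{qw(pos ref alt reads qual)},
            $names ? join(',', @$names) : 'novel';
    }
    return @report;
}

# Toy data: the known SNP is the sample line above; the calls are invented.
my %known = ('2789173:G:C' => [qw(F4532 SK1916 Y525)]);
my @calls = (
    { pos => 2789173, ref => 'G', alt => 'C', reads => 14, qual => 60 },
    { pos => 2800000, ref => 'A', alt => 'T', reads => 9,  qual => 45 },
    { pos => 2900000, ref => 'C', alt => 'G', reads => 1,  qual => 10 },
);
print "$_\n" for label_calls(\@calls, \%known);
```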
Because of restrictions at my content provider, I run BAMs on my personal
computer. I'd be happy to do that for anyone interested in such a
report. Click on the Contact link at the top of http://ysnp.info and
provide a link to your BAM on dropbox.com or a similar file-hosting
service. Please preserve the original file name.
Some caveats:
- Again, no phylogeny is included in the reports.
- It generally takes 3 to 7 minutes to run each report, but I can't
guarantee how quickly I'll get to it.
- Please submit only Y chromosome data.
- So far, only BAMs created by FTDNA have been used. If yours fails, I
hope we can work together to determine why.
- Bugs are likely present. For example, the very last entry in the
sample shouldn't be there. STRs, because of their volatility, are
eliminated from the report, but this one, from the infamous DYS19 region,
snuck through.
- Recurrent SNPs are included. FTDNA removes most of these from their
tree. Unless someone can convince me otherwise, I consider that action a
disservice. Recurrent SNPs can form a downstream branch that might not
otherwise be present.
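Spotting recurrent SNPs in a tree can be sketched as counting how many branches each mutation defines. This assumes each branch is listed with its defining mutations as pos:anc:der strings; the subroutine name and data are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given branch => [mutation keys], return a hash ref mapping each mutation
# that defines more than one branch to the sorted list of those branches.
sub find_recurrent {
    my ($branches) = @_;
    my %seen_on;
    for my $branch (sort keys %$branches) {
        push @{ $seen_on{$_} }, $branch for @{ $branches->{$branch} };
    }
    my %recurrent;
    for my $mut (keys %seen_on) {
        $recurrent{$mut} = $seen_on{$mut} if @{ $seen_on{$mut} } > 1;
    }
    return \%recurrent;
}

# Toy data (invented): mutation keys are "pos:anc:der".
my %branches = (
    A1 => ['100:G:A', '200:C:T'],
    B2 => ['300:T:C', '200:C:T'],
    C3 => ['400:A:G'],
);
my $rec = find_recurrent(\%branches);
print "$_ on @{ $rec->{$_} }\n" for sort keys %$rec;
```

Keeping such mutations, rather than deleting them, preserves each branch they help define.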
Again, send a message through the ysnp.info contact form with a link to
the zipped BAM on your file-hosting service. I'll download it and send a
timely response.
1 The file will unzip to about three times its compressed size.
2 The file can be downloaded from
http://ybrowse.org/gbrowse2/gff/hg38ChrY.fa
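The file in note 2 is, judging by its name, the hg38 Y chromosome reference in FASTA format. Assuming it holds a single sequence, a sketch like this could look up the reference base at a position, for instance to compare it with the anc and der values in the flat file. The subroutine names are hypothetical, and the whole sequence (roughly 57 million bases) is held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a single-sequence FASTA file into one string, skipping header lines.
sub read_fasta {
    my ($fh) = @_;
    my $seq = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;
        $line =~ s/\s+//g;
        $seq .= $line;
    }
    return $seq;
}

# Return the base at a 1-based position, upper-cased (UCSC-style files use
# lower case for soft-masked repeats); returns nothing if out of range.
sub base_at {
    my ($seq, $pos) = @_;
    return if $pos < 1 || $pos > length $seq;
    return uc substr($seq, $pos - 1, 1);
}

# Toy FASTA standing in for hg38ChrY.fa; for real use, e.g.:
#   open my $fh, '<', 'hg38ChrY.fa' or die "hg38ChrY.fa: $!";
my $toy = ">chrY\nACGTacgt\nNNGG\n";
open my $fh, '<', \$toy or die "cannot open in-memory file: $!";
my $seq = read_fasta($fh);
close $fh;
print base_at($seq, 6), "\n";    # prints C
```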