I was born in 1950. By every standard (except my own), I'm an old man. And, invariably, old men have accumulated a number of biases. With respect to this project, these are mine:
- I have a bias toward the Linux operating system.
- I have a bias toward open source.
- I have a bias toward human-readable, plain text data.
- I have a bias toward staying close to HTML and CSS.
- I have a bias toward K.I.S.S.
I look forward to accumulating additional universally-renowned biases.
Genetic genealogists who wish to report on any number of aspects of the data first need access to the data, not just pictures of the data. Indeed, the little I'm presenting here so far is openly available. I've merely reformatted it according to the above principles. More to come.
GENERIC Ybrowse.org DATA
This Y-SNP database was created from several files at isogg.org and ybrowse.org, including BYSNPindex.xlsx, FTSNPindex.xlsx, snps_hg38.vcf.gz, and others. I've reformatted them (my only contribution) into an easy-to-parse, space-delimited flat file having four fields: pos(ition), anc(estral value), der(ived value) and SNP names. Multiple names are separated by a comma with no following space. This is a sample line:
2789173 G C F4532,SK1916,Y525
Here's a very simple perl script for parsing that line of data. If you have perl installed on your computer, save the following to a file name of your choice and chmod 755.
    #!/usr/bin/perl
    use strict;
    use warnings;

    # One record from the flat file: position, ancestral, derived, names
    my $data = "2789173 G C F4532,SK1916,Y525";
    my ($pos, $anc, $der, $list) = split / /, $data;
    my @names = split /,/, $list;    # multiple names are comma-separated

    print "Position is $pos\n";
    print "Ancestral data is $anc\n";
    print "Derived value is $der\n";
    print "SNP names: ";
    foreach my $name (@names) { print "$name " }
    print "\n";

It's likely that no one other than a programmer will be interested in this. And coders working with Y-SNPs probably already have the data in one form or another. Nevertheless, here it sits. The file will unzip with the extension .db. Don't let that fool you. It's a simple, readable text file. Change the extension to anything you want (.txt, .csv, etc.): Download ZIP file.1
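For anyone who wants to go a step past the one-line example, here is a minimal sketch of indexing records by position and by SNP name. It assumes only the four-field, space-delimited format described above. The subroutine name parse_snp_line and the hashes %by_pos and %by_name are illustrative choices, not part of the download, and the only record used is the sample line from this page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one space-delimited record into its four fields.
# The last field holds one or more comma-separated SNP names.
sub parse_snp_line {
    my ($line) = @_;
    chomp $line;
    my ($pos, $anc, $der, $list) = split / /, $line;
    return {
        pos   => $pos,
        anc   => $anc,
        der   => $der,
        names => [ split /,/, $list ],
    };
}

# Stand-in for the lines of the unzipped file (sample line only).
my @lines = ("2789173 G C F4532,SK1916,Y525\n");

my (%by_pos, %by_name);
for my $line (@lines) {
    next unless $line =~ /\S/;              # skip blank lines
    my $rec = parse_snp_line($line);
    $by_pos{ $rec->{pos} } = $rec;          # index by position
    for my $name (@{ $rec->{names} }) {     # index by every name
        $by_name{$name} = $rec;
    }
}

my $hit = $by_name{'SK1916'};
print "SK1916 is at position $hit->{pos}\n";
print "Ancestral value at 2789173 is $by_pos{2789173}{anc}\n";
```

To run this against the real file, replace @lines with a loop over a filehandle opened on whatever name you gave the unzipped file. Holding every record in two hashes trades memory for instant lookups, which seems a reasonable bargain for a flat file of this kind.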
OPEN SOURCE Y-DNA PROJECT (Open Y)
The project's Facebook presence began during the summer of 2024. Work began on the database in earnest in March 2025 and started with combining the Y-DNA trees of FTDNA.com and YFull.com. But please note that both companies retain all rights to their data. I present here only the resultant database.
The purpose of the project is to provide a more complete Y chromosomal tree. For example, as of this writing, FTDNA has 91,000 branches or haplogroups in their Y tree. Open Y has 105,000. Because work isn't completed, that's expected to rise by another 5,000 or more. Likewise, the SNP count is increasing. And because merging isn't easy, standards have been adopted to ease the transition, standards that I hope will propagate throughout the community. They will be fully detailed in an upcoming article.
There are several advantages to the database. Although most SNP/haplogroup lineages back to the root of the tree are largely unchanged, a sizeable percentage have gained haplogroups through the merge. Most have originated from FTDNA but many from YFull. And because of the added SNPs, etc., the timeline (once developed) will be more accurate.
Such merging will always create contradictions. Most have been worked out, but some remain and will join the Open Y database once fixed over the coming weeks.
• Get the Open Y path to the root for a terminal haplogroup, e.g., R-YP4491.
OTHER TOOLS
These presently work only with the FTDNA database. A future upgrade will let you choose among the three databases: FTDNA, YFull, and Open Y.
• Search the SNP database for any named SNP or position.
• Get a SNP's parent.
• Retrieve ancestral value for any Y position:2
• Enter the lead SNP for a haplogroup to get a list of all subclades immediately below it.
• Get full haplotree for any given terminal haplogroup.
• Find the in-common haplogroup for several terminal haplogroups.
• Compare haplotrees between two differing terminal groups.
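Several of the tools listed above reduce to walking a parent map. What follows is a minimal sketch of that idea, not the code behind the tools. It assumes the tree is held as a hash mapping each haplogroup to its parent. The haplogroup names (ROOT, HG-A, and so on) are invented placeholders rather than real branches, and the subroutine names are likewise hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tree, child => parent. All names are placeholders.
my %parent = (
    'HG-A'   => 'ROOT',
    'HG-B'   => 'ROOT',
    'HG-A1'  => 'HG-A',
    'HG-A2'  => 'HG-A',
    'HG-A1a' => 'HG-A1',
);

# Path from a haplogroup back to the root, terminal haplogroup first.
sub path_to_root {
    my ($hg) = @_;
    my @path = ($hg);
    while (defined $parent{$hg}) {
        $hg = $parent{$hg};
        push @path, $hg;
    }
    return @path;
}

# Subclades immediately below a haplogroup, sorted by name.
sub children_of {
    my ($hg) = @_;
    my @kids = grep { $parent{$_} eq $hg } keys %parent;
    return sort { $a cmp $b } @kids;
}

# Most recent haplogroup shared by every listed terminal haplogroup.
sub common_haplogroup {
    my @terminals = @_;
    my %seen;
    for my $t (@terminals) {
        $seen{$_}++ for path_to_root($t);
    }
    # Walking up from the first terminal, the first node that lies
    # on every path is the deepest one they all share.
    for my $hg (path_to_root($terminals[0])) {
        return $hg if $seen{$hg} == @terminals;
    }
    return;
}

print join(' > ', path_to_root('HG-A1a')), "\n";
print join(', ', children_of('HG-A')), "\n";
print common_haplogroup('HG-A1a', 'HG-A2'), "\n";
```

A child-to-parent hash keeps the walk toward the root trivial. Listing subclades scans every key, which is fine for a sketch, though a tree with 100,000 branches would probably want a second parent-to-children hash.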
SNP TREE BUILDER
I've been developing this code over the course of several years. As of 15 March 2025, it's open for testing and use by project administrators. It's available now for alpha testing and presently produces HTML output. Testers are welcome to alter the output. For now, however, please leave the URL in the upper left so as to invite other visitors. Access will always be free and the code will eventually be available as open source. More development is first needed.
Please study the sample input and the notes before proceeding. Problems? Please contact me at michael.cooley@ysnp.info. — Click on SNP Tree Builder.
Feedback is needed and suggestions are welcome. However, before employing suggestions, bugs need to be found and fixed and then a rewrite is required for speed and efficiency.
BAM REPORTS
Except for the junkiest of reads, all SNP mutations are reported by quantity and quality, along with all variant names for those that are named. Otherwise, they're marked as novel variants. This report can be used to check the validity of a SNP without the bother of clunky chromosome browsers and the learning curve sometimes required to understand them.
An example can be found at http://ysnp.info/public/1142-SNP-Report.txt. There's no phylogeny included. The arrangement of these markers into a tree is a separate study.
Because of restrictions at my content provider, I run BAMs on my personal computer. I'd be happy to do that for anyone interested in having such a report. Click on the Contact link at the top of http://ysnp.info and provide a link to your BAM at dropbox.com, etc. Please preserve the original file name.
Some caveats:
Again, send a message through the ysnp.info contact form and provide the link for the file hosting service to the zipped BAM. I'll download it and send a timely response.