I was born in 1950. By every standard (except my own), I'm an old man. And, invariably, old men have accumulated a number of biases. With respect to this project, these are mine:
- I have a bias toward the Linux operating system.
- I have a bias toward open source.
- I have a bias toward human-readable, plain text data.
- I have a bias toward staying close to HTML and CSS.
- I have a bias toward K.I.S.S.
I look forward to accumulating additional universally-renowned biases.
Genetic genealogists who wish to report on any number of aspects of the data first need access to the data, not just pictures of the data. Indeed, the little I'm presenting here so far is openly available. I've merely reformatted it according to the above principles. More to come.
GENERIC Ybrowse.org DATA
This Y-SNP database was created from several files at isogg.org and ybrowse.org, including BYSNPindex.xlsx, FTSNPindex.xlsx, snps_hg38.vcf.gz, and others. I've reformatted them (my only contribution) into an easy-to-parse, space-delimited flat file having four fields: pos(ition), anc(estral value), der(ived value) and SNP names. Multiple names are separated by a comma with no following space. This is a sample line.
2789173 G C F4532,SK1916,Y525
Here's a very simple Perl script for parsing that line of data. If you have Perl installed on your computer, save the following to a file with a name of your choice and make it executable with chmod 755.
#!/usr/bin/perl
use strict;
use warnings;

# One record: position, ancestral value, derived value, comma-separated names
my $data = "2789173 G C F4532,SK1916,Y525";
my ($pos, $anc, $der, $list) = split / /, $data;
my @names = split /,/, $list;
print "Position is $pos\n";
print "Ancestral data is $anc\n";
print "Derived value is $der\n";
print "SNP names: ";
foreach my $name (@names) { print "$name " }
print "\n";

It's likely that no one other than a programmer will be interested in this. And coders working with Y-SNPs probably already have the data in one form or another. Nevertheless, here it sits. Feel free to contact me if the file has gotten to be too old.
The file will unzip with the extension .db. Don't let that fool you. It's a simple, readable text file. Change the extension to anything you want — .txt, .csv, etc.: Download ZIP file.1
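For anyone who wants to go from one line to the whole file, here is a minimal sketch of reading every record into lookup tables keyed by position and by SNP name. It assumes the four-field layout described above. The sub name load_snps and the second sample record (EXAMPLE1) are made up for illustration, and the script reads from an in-memory string so it runs without the download.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The first record is the sample line from the text. The second is an
# invented placeholder, present only so there is more than one record.
my $sample = <<'END';
2789173 G C F4532,SK1916,Y525
1000 A T EXAMPLE1
END

# Read four-field records from a filehandle and build two lookup tables.
# load_snps is a hypothetical helper name, not part of any library.
sub load_snps {
    my ($fh) = @_;
    my (%by_pos, %by_name);
    while (my $line = <$fh>) {
        chomp $line;
        my ($pos, $anc, $der, $list) = split ' ', $line;
        next unless defined $list;            # skip blank or short lines
        my $rec = {
            pos   => $pos,
            anc   => $anc,
            der   => $der,
            names => [ split /,/, $list ],
        };
        push @{ $by_pos{$pos} }, $rec;        # a position may hold several records
        $by_name{$_} = $rec for @{ $rec->{names} };
    }
    return (\%by_pos, \%by_name);
}

# Open the in-memory sample; for the real file, open it by its name instead.
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my ($by_pos, $by_name) = load_snps($fh);
close $fh;

my $hit = $by_name->{SK1916};
print "SK1916 is at $hit->{pos}: $hit->{anc} to $hit->{der}\n";
print "Also known as: @{ $hit->{names} }\n";
```

Keying by name means any of a SNP's synonyms finds the same record, which suits a file where one position can carry several names.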
TOOLS
Get full haplotree for any given terminal haplogroup.
Find the in-common haplogroup for several terminal haplogroups.
Compare haplotrees between two differing terminal groups.
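The three operations above can be sketched over a simple child-to-parent table. This is only an illustration of the idea, not the code behind these tools: the haplogroup labels are invented, and the sub names lineage, common_haplogroup and compare_lineages are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy child => parent map. These labels are invented for illustration
# and do not come from any real haplotree.
my %parent = (
    'A1'   => 'A',
    'A2'   => 'A',
    'A1a'  => 'A1',
    'A1b'  => 'A1',
    'A1a1' => 'A1a',
);

# Return the path from the root down to the given haplogroup.
sub lineage {
    my ($group) = @_;
    my @path = ($group);
    while (exists $parent{ $path[0] }) {
        unshift @path, $parent{ $path[0] };
    }
    return @path;
}

# Return the most downstream haplogroup shared by all the given groups.
sub common_haplogroup {
    my @groups = @_;
    my @shared = lineage(shift @groups);
    for my $group (@groups) {
        my @path = lineage($group);
        my $i = 0;
        $i++ while $i < @shared && $i < @path && $shared[$i] eq $path[$i];
        splice @shared, $i;    # keep only the common prefix
    }
    return @shared ? $shared[-1] : undef;
}

# Split two lineages into the shared trunk and the two diverging branches.
sub compare_lineages {
    my ($x, $y) = @_;
    my @px = lineage($x);
    my @py = lineage($y);
    my @trunk;
    while (@px && @py && $px[0] eq $py[0]) {
        push @trunk, shift @px;
        shift @py;
    }
    return (\@trunk, \@px, \@py);
}

print join(' > ', lineage('A1a1')), "\n";
print "In common: ", common_haplogroup('A1a1', 'A1b', 'A1a'), "\n";
my ($trunk, $left, $right) = compare_lineages('A1a1', 'A2');
print "Trunk: @$trunk | A1a1 side: @$left | A2 side: @$right\n";
```

A flat child-to-parent table keeps the tree in the same plain text spirit as the SNP file: one pair per line would be enough to store it.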
BAM REPORTS
Except for the junkiest of reads, every SNP mutation is reported with its quantity and quality and, if it has been named, all of its variant names. Unnamed mutations are marked as novel variants. This report can be used to check the validity of a SNP without the bother of clunky chromosome browsers and the learning curve sometimes required to understand them.
An example can be found at http://ysnp.info/public/1142-SNP-Report.txt. There's no phylogeny included. The arrangement of these markers into a tree is a separate study.
Because of restrictions at my content provider, I run BAMs on my personal computer. I'd be happy to do that for anyone interested in having such a report. Click on the Contact link at the top of http://ysnp.info and provide a link to your BAM at dropbox.com, etc. Please preserve the original file name.
Some caveats:
- Again, no phylogeny is included in the reports.
- It generally takes 3 to 7 minutes to run each report. But I can't guarantee how quickly I'll get to it.
- Please submit only Y chromosome data.
- BAMs created only at FTDNA have been used so far. If yours fails, I hope we can work together to determine why.
- Bugs are likely present. For example, the very last entry in the sample shouldn't be there. STRs, because of their volatility, are eliminated from the report. But this one, from the infamous DYS19 region, snuck through.
- Recurrent SNPs are included. FTDNA removes most of these from their tree. Unless someone can convince me otherwise, I consider that action a disservice. Recurrent SNPs can form a downstream branch that might not otherwise be present.
Again, send a message through the ysnp.info contact form and provide a link to the zipped BAM at your file hosting service. I'll download it and send a timely response.
1 The file will unzip to about three times its compressed size.
2 The file can be downloaded from http://ybrowse.org/gbrowse2/gff/hg38ChrY.fa