<ysnp.info>

YFull Database Research

For the purpose of this project, I define a haplogroup class as the part of a haplogroup name before the hyphen (for example, the class of R-M269 is R). These classes are not all basal haplogroups; most are derived.
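As a minimal sketch, the class rule can be written as a one-line Python function. The function name `haplogroup_class` is hypothetical. It assumes the class ends at the first hyphen and that a name without a hyphen is its own class. The sample names are only illustrative.

```python
def haplogroup_class(name: str) -> str:
    """Return the class of a haplogroup: the text before the first hyphen.

    Assumption: a name without a hyphen is treated as its own class.
    """
    return name.split("-", 1)[0]


# Illustrative names only.
for name in ["R-M269", "E-V13", "A00"]:
    print(name, "->", haplogroup_class(name))
```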

The YFull database has 34,223 haplogroups and 416,067 SNPs. The SNP count does not include INDELs, which will be studied later. Of those SNPs, 13,024 are recurrent in up to six haplogroups; counted once per haplogroup, they make up 27,761 of the 416,067 entries (416,067 − 388,306). This leaves 388,306 non-recurrent SNPs. At some point in the near future, all recurrent SNPs will be marked as such in the databases.
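A minimal Python sketch of the recurrent/non-recurrent split. It assumes the SNP data is available as (SNP, haplogroup) placement pairs, which is my assumption and not YFull's actual export format. The function name `split_recurrent` and the toy SNP and haplogroup names are hypothetical. A SNP counts as recurrent when it is placed in two or more distinct haplogroups. The sketch also totals the entries the recurrent SNPs occupy, since a recurrent SNP is counted once per haplogroup.

```python
from collections import defaultdict


def split_recurrent(placements):
    """Split (snp, haplogroup) pairs into non-recurrent and recurrent SNPs.

    Returns two dicts mapping SNP name -> sorted list of haplogroups.
    A SNP is recurrent when it appears in two or more distinct haplogroups.
    """
    by_snp = defaultdict(set)
    for snp, haplogroup in placements:
        by_snp[snp].add(haplogroup)
    non_recurrent = {s: sorted(h) for s, h in by_snp.items() if len(h) == 1}
    recurrent = {s: sorted(h) for s, h in by_snp.items() if len(h) > 1}
    return non_recurrent, recurrent


# Hypothetical toy placements, not real YFull data.
placements = [
    ("SNP1", "R-X1"),
    ("SNP2", "R-X2"),
    ("SNP2", "E-Y1"),
    ("SNP3", "I-Z1"),
    ("SNP3", "I-Z2"),
    ("SNP3", "J-W1"),
]
non_recurrent, recurrent = split_recurrent(placements)
print(len(non_recurrent), len(recurrent))         # 1 2
print(sum(len(h) for h in recurrent.values()))    # 5 entries
```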

Each recurrent SNP's haplogroups are sorted by class, as defined above. Of those, 22,842 are singly recurrent in separate classes: no two of their haplogroups share a class. In other words, they recur in entirely different branches of the global tree. That is entirely plausible, since the same mutation can arise independently in unrelated lineages, so I've reclassified them as non-recurrent. There are presently no additional SNPs that are recurrent in path, that is, SNPs that appear in both a child haplogroup and its parent. Such SNPs can be fixed programmatically.
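A minimal Python sketch of the class-based triage, assuming each recurrent SNP maps to a list of haplogroup names. The function names `in_separate_classes` and `triage` and the toy data are hypothetical. A SNP whose haplogroups all fall in different classes is reclassified as non-recurrent. Any SNP with two or more haplogroups in the same class stays in the group that needs study.

```python
def haplogroup_class(name: str) -> str:
    # Class = text before the first hyphen, e.g. "R-M269" -> "R".
    return name.split("-", 1)[0]


def in_separate_classes(haplogroups) -> bool:
    """True when no two haplogroups of a recurrent SNP share a class."""
    classes = [haplogroup_class(h) for h in haplogroups]
    return len(classes) == len(set(classes))


def triage(recurrent):
    """Split recurrent SNPs into (reclassified, to_study).

    reclassified: one occurrence per class, treated as non-recurrent.
    to_study: at least two occurrences inside the same class.
    """
    reclassified, to_study = {}, {}
    for snp, haplogroups in recurrent.items():
        target = reclassified if in_separate_classes(haplogroups) else to_study
        target[snp] = haplogroups
    return reclassified, to_study


# Hypothetical toy data, not real YFull SNPs.
recurrent = {
    "SNP2": ["E-Y1", "R-X2"],          # classes E and R: reclassified
    "SNP3": ["I-Z1", "I-Z2", "J-W1"],  # two in class I: needs study
}
reclassified, to_study = triage(recurrent)
print(sorted(reclassified), sorted(to_study))  # ['SNP2'] ['SNP3']
```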

That gives a total of 411,148 SNPs (388,306 + 22,842) that presently require no further study.

We end up with 2,022 recurrent SNPs that still need to be studied. I have yet to complete a successful algorithm for them. While doing this work, though, I have found recurrent SNPs whose haplogroups share an MRCA only about three or four levels upstream. Feel free to look at them. Are any of them in path?
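One way to look for these cases is a small Python sketch, assuming the tree is available as a child-to-parent mapping. The names `ancestors`, `mrca`, `in_path` and `close_pairs`, the toy tree, and the `max_steps=4` cutoff (standing in for "about three or four levels") are all hypothetical. The sketch finds the MRCA of two haplogroups by walking up from each one. It counts how many steps each haplogroup sits below the MRCA. It treats a pair as in path when the MRCA is one of the two haplogroups, i.e. one is an ancestor of the other, which generalizes "child and parent" to any ancestor.

```python
def ancestors(node, parent):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def mrca(a, b, parent):
    """Return (mrca, steps_from_a, steps_from_b), or None if unrelated."""
    depth_a = {node: i for i, node in enumerate(ancestors(a, parent))}
    for j, node in enumerate(ancestors(b, parent)):
        if node in depth_a:
            return node, depth_a[node], j
    return None


def in_path(a, b, parent):
    """True if one haplogroup is an ancestor of the other."""
    result = mrca(a, b, parent)
    return result is not None and result[0] in (a, b)


def close_pairs(haplogroups, parent, max_steps=4):
    """Pairs whose MRCA lies at most max_steps above both haplogroups.

    Each entry is (a, b, mrca, in_path).
    """
    pairs = []
    for i, a in enumerate(haplogroups):
        for b in haplogroups[i + 1:]:
            result = mrca(a, b, parent)
            if result and max(result[1], result[2]) <= max_steps:
                pairs.append((a, b, result[0], result[0] in (a, b)))
    return pairs


# Hypothetical toy tree (child -> parent), not real YFull data.
parent = {"R-A": "R", "R-B": "R-A", "R-C": "R-A", "R-D": "R-B"}
for pair in close_pairs(["R-D", "R-C", "R-A"], parent):
    print(pair)
```

With this toy tree, R-D and R-C meet at R-A (two and one steps up) and are not in path, while any pair that includes R-A itself is flagged as in path.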
