What is DNA Overlap on GEDmatch?

Video Transcription

(00:00):
If you haven’t logged onto GEDmatch recently, then one thing you might see when you’re looking at your match list is overlap. And what is overlap? Why is it important and why is there so much confusion? We’re gonna try to tackle that today.

(00:21):
Howdy, I’m Andy Lee with Family History Fanatics and this is a segment of DNA. Be sure to subscribe to our channel and click on the bell if you’d like to be notified about upcoming episodes. Now, overlap has caused a lot of confusion among people, and it first started to be included in the match list when the GEDmatch Genesis program came out. Since that time, Genesis has been merged over to the regular GEDmatch website, so overlap is here to stay, and understanding it might help you understand why you’re seeing some of your matches and why maybe you’re not seeing some of your matches. So let’s get into it. To begin, DNA companies test your DNA and they look at somewhere between 500,000 and 900,000 individual locations out of 3 billion. Now, each company selects for itself which of those locations it wants to test.

(01:12):
And because of this, only some of those sites are the same sites between companies, and this is what we call overlap. So for instance, Family Tree DNA might overlap with 23andMe in about 200,000 of those locations. So it might be that 23andMe has 300,000 locations that are different from anything that Family Tree DNA has, and Family Tree DNA might have 300,000 locations that are different from anything that 23andMe has. Now because there are thresholds in matching, you have to match a certain number of centimorgans, but you also have to match a certain number of SNPs. Those are these individual locations that are being tested. The amount of overlap is important because while both 23andMe and Family Tree DNA each test about 500,000 locations, it’s only the same locations that they can look at. It’s only that overlap that can be looked at.

(02:08):
So for instance, instead of 500,000 locations spread out along your DNA, there’s really only 200,000 where they both test the exact same SNPs. In other words, the overlap is what you’re actually comparing. What this means for segments is that as you get into smaller and smaller segments, there are fewer and fewer SNPs, and at some point you may not have enough SNPs to meet that threshold. So you may lose a segment that really is a match, but just because of where the different people tested, you’re not seeing it. Let me put up a little chart here to help explain it more. This is a list of the different testing companies and their different chip versions. Each time they change a chip version, they change which of those SNPs they’re including on each one of those tests. Now I’m just using 10 because the screen’s not big enough for 700,000.
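To make the threshold point concrete, here is a rough sketch of how a segment can drop below a SNP threshold when two kits come from different companies. The 500,000 and 200,000 figures come from the example in the video; the segment size of 1,000 SNPs, the minimum of 500 SNPs, the function names, and the assumption that SNPs thin out evenly are all made up for illustration and are not GEDmatch's actual settings.

```python
# Illustrative value only: NOT GEDmatch's real SNP threshold.
HYPOTHETICAL_MIN_SNPS = 500


def shared_snps_in_segment(snps_in_segment: int, overlap: int, tested: int) -> int:
    """Estimate how many SNPs in a segment both kits tested.

    Assumes the overlapping SNPs are spread evenly, so a segment keeps
    the same fraction of SNPs as the whole chip does (a simplification).
    """
    return snps_in_segment * overlap // tested


def passes_snp_threshold(shared: int, minimum: int = HYPOTHETICAL_MIN_SNPS) -> bool:
    """Return True if the segment has enough shared SNPs to be reported."""
    return shared >= minimum


# Same company: every one of the 500,000 tested locations is shared.
same_company = shared_snps_in_segment(1_000, overlap=500_000, tested=500_000)

# Different companies: only 200,000 of the 500,000 locations are shared.
cross_company = shared_snps_in_segment(1_000, overlap=200_000, tested=500_000)

print(same_company, passes_snp_threshold(same_company))
print(cross_company, passes_snp_threshold(cross_company))
```

With these made-up numbers, the same segment clears the threshold when both kits come from the same chip and falls short when only the overlap can be compared.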

(03:03):
So we’re just gonna be looking at 10, and you can see, hey, Ancestry with their version one, they tested these ones of the SNPs, and then their version two tested a little bit different. Same with 23andMe, same with MyHeritage, Family Tree DNA, and Living DNA. So next, let me highlight a couple of tests. Let’s look at the Ancestry test and the 23andMe test, their most recent versions. You can see that Ancestry, they test on 2, 3, 5, 6, 9, and 10. 23andMe, they test on 2, 4, 5, 7, 9, and that’s it. And so the overlap between these two is really only positions two, five, and nine. Now if we look at a different test, if we look at Family Tree DNA in comparison to that Ancestry test, we see that Family Tree DNA, they test on 2, 3, 4, 7, 8, and 10, but the overlap with Ancestry is only two, three, and 10.
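The 10-position chart can be written out as a small sketch using sets, where the overlap between two tests is just the set intersection. The positions are the ones read out in the video; the dictionary and function names are only for illustration.

```python
# Positions (out of the 10 on the chart) that each test covers,
# as read out in the video.
chips = {
    "Ancestry": {2, 3, 5, 6, 9, 10},
    "23andMe": {2, 4, 5, 7, 9},
    "Family Tree DNA": {2, 3, 4, 7, 8, 10},
}


def overlap(test_a: str, test_b: str) -> list[int]:
    """Return the sorted positions that both tests cover."""
    return sorted(chips[test_a] & chips[test_b])


print(overlap("Ancestry", "23andMe"))          # [2, 5, 9]
print(overlap("Ancestry", "Family Tree DNA"))  # [2, 3, 10]
```

Each test covers five or six positions on its own, but only three are available for any comparison between two of them.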

(04:02):
So because of that, there are a lot of these gaps in the information, and if those gaps get big enough, then you might start to lose matches that really should be there, but they’re not showing up because of the thresholds. What does the overlap table then look like if we are comparing each one of the different tests? Now the ISOGG website has a table that shows you how much overlap there is between the tests. You can see on this table, and in the link below, how the different tests overlap. In most cases they overlap by at least a couple hundred thousand and sometimes as much as, you know, 600,000 SNPs. But the GEDmatch overlap is a little bit different. So based on the tests that I’ve taken with all the different companies at different times, I created my own table of overlap.

(04:58):
And you can quickly see that the overlap in the GEDmatch database is not the same as the overlap that’s listed on ISOGG. Now why is this? Well, GEDmatch doesn’t look at every single SNP that is tested. GEDmatch started back in 2010, and at that time Family Tree DNA and 23andMe were both offering tests. It was the 23andMe version three chip. Now the 23andMe version three chip had just over 900,000 SNPs on it, and you can see based on the overlap that GEDmatch is actually looking at the vast majority of those, almost 850,000 of them. The Family Tree DNA chip had about 700,000 SNPs on it, and GEDmatch decided to look at the majority of those also, about 620,000. And most of those overlapped with ones from 23andMe. So there’s really only a few thousand extra Family Tree DNA SNPs that they added in.

(05:55):
Now why they didn’t go with the entire group of 23andMe and Family Tree DNA SNPs, I don’t know, but they still got the majority of them. Now as time went on, 23andMe changed their chips, Ancestry added their chip, and MyHeritage added a DNA test, but theirs was using the Family Tree DNA chips, so there really wasn’t much change there. And then we also have Living DNA that added their chip, as well as some other companies. Now, because of the way that GEDmatch is storing their data, my guess is that they didn’t want to necessarily expand the overall set of SNPs that they were looking at, because that might then invalidate some of the past results. And so they kept this same base of about 870,000 SNPs. I know that that’s really what the limit is because I’ve created a super kit, a GEDmatch super kit, using their tool.

(06:50):
It comes up with 870,192 SNPs, and that is using all of the tests that I’ve taken from all the different companies. Now I’ve created my own super kit, which I showed you in a video before, and that kit has over 1.5 million SNPs in it. And those are all individual SNPs. So there are, you know, about 600,000 of those SNPs that GEDmatch is just ignoring. So they’ve got a specific selection of those SNPs, and it’s really based on that version three of the 23andMe chip and the original Family Tree DNA chip. Now what this means when it comes to comparisons is that if you have a 23andMe version three chip or a Family Tree DNA chip or a MyHeritage chip, then you are going to get the best matching results when you’re matching with other 23andMe, Family Tree DNA, and MyHeritage chips.

(07:43):
Now the results with the Ancestry chip are still pretty good because it still covers about 400,000 of the common SNPs between them. But once you get into the 23andMe version five chip, the latest version they have, or if you’re using the Living DNA chip, then what you’re gonna see is that we’re starting to get into much less DNA that’s being compared, and that’s where this overlap might be a problem. Now GEDmatch points this out when you’re looking at the columns on your match list. If the overlap column is highlighted in various shades of red or pink, that just means that the overlap is really small, and so the results may not be very reliable. In that case, the major downside is that you’re going to miss out on some matches that you would otherwise have seen. So one reason that I encourage everybody to test everywhere is because the information you get is different.

(08:39):
Now when you’re uploading that information to GEDmatch, there really is a priority as far as what the best information to upload is. So for instance, the 23andMe version three chip, that’s the best information that you can upload. The MyHeritage or Family Tree DNA chip, that is the second best information that you can upload. Then there are the Ancestry DNA chips, that’s the third best information to upload. And finally you have the other 23andMe chips and the Living DNA chip. So if you haven’t tested at Ancestry, Family Tree DNA, or MyHeritage, I encourage you to test at one of those places, make sure you upload that data to GEDmatch, and use that to help in your matching results. So next I wanted to show you this table. This is a table that basically combines that information from the other table.

(09:27):
I know that there is a maximum of around 870,000 SNPs that GEDmatch is looking at, based on my super kit. And this table is showing what percentage of that maximum each one of those chip combinations has, so how much you’re actually going to be matching with. So for instance, if you are matching a 23andMe version four kit to a 23andMe version five kit, you’re only using 11% of the overall SNPs that GEDmatch looks at. And so there’s gonna be a lot of information that you miss. On the other hand, if you have a 23andMe version four kit and you’re comparing it to an Ancestry version two kit, you get about 33% of the overall SNPs. So it’s three times the amount of information that you’re looking at between those two kits. Now, like I said, the best kit is going to be the 23andMe version three kit. If you’re comparing it to another 23andMe version three kit, 97% of all the SNPs that they’re looking at are included in there.
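As a rough sketch of what those percentages mean in SNP counts, the percentages quoted above can be turned back into approximate numbers of compared SNPs. This takes the super kit total of 870,192 as the base, which is an assumption about how the table was built; the constant and function names are made up for illustration.

```python
# Super kit total quoted in the video, used here as the assumed base
# for the percentages in the table.
GEDMATCH_BASE_SNPS = 870_192


def approx_snps_compared(percent: float, base: int = GEDMATCH_BASE_SNPS) -> int:
    """Convert a percentage of the base into an approximate SNP count."""
    return round(base * percent / 100)


# Percentages quoted in the video for three kit combinations.
combinations = {
    "23andMe v4 vs 23andMe v5": 11,
    "23andMe v4 vs Ancestry v2": 33,
    "23andMe v3 vs 23andMe v3": 97,
}

for pair, percent in combinations.items():
    print(f"{pair}: about {approx_snps_compared(percent):,} SNPs")
```

Under that assumption, 11% works out to fewer than 100,000 SNPs, while 97% is roughly 844,000, which lines up with the "almost 850,000" figure for the 23andMe version three chip.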

(10:29):
And so there are very few matches that you’re going to be missing out on, because you’re not missing out on very much information. Now, if there was a way for GEDmatch to retrofit their website and be able to bring in all this other SNP data from these new chips, then comparing a Living DNA chip to a 23andMe chip would be as accurate as comparing an Ancestry chip to an Ancestry chip. You’d still have some issues when you’re comparing, you know, a 23andMe chip to an Ancestry chip, but those issues would be minimized because we’d be looking at a whole lot more SNPs. Overall, there is nothing that you can do to overcome the problem of low overlap. The only thing that can be done is for people to test with other companies and upload the results to GEDmatch. So when you’re looking at your match list, if you’re comparing it to one of the match lists on one of the other websites and you’re not seeing some of the same people who you know have uploaded to GEDmatch, it might just be that there’s not enough overlap in the SNPs to be able to actually identify that match.

(11:37):
Of course, it’s always important to remember that each website has its own algorithms that it uses in order to determine what a match is. And so there might be some cases where there’s plenty of overlap, but one site’s algorithm just doesn’t call that a match, and another site’s does. Ideally you’re gonna find multiple matches that go back to a common ancestor, and in this case, overlap’s not such a big deal because you have multiple matches that are all leading back in the same line. So that’s what you should focus on. If you’re dealing with just one match, make sure you’re paying attention to whether or not you have good overlap, whether that box is highlighted in red or not. And if you have any questions about overlap that you would like me to answer, put them in the comments below. If you like this video, give it a thumbs up and make sure you share it with all your friends.