Workpaper on autosomal test estimates of ancestral admixture

Findings to date

Autosomal testing is useful for estimating my ancestral admixture at the continental level and for showing where in the world there are people with whom I share DNA. Even so, I learned that companies differ in:

1. Genealogical timeframes used
2. Reference populations sampled
3. Disproportionate ancestry sampling
4. Accuracy of estimating ancestry
5. Delineating populations within a continent.


1. Genealogical ancestral timeframes.

Companies approach ancestry timeframes from different perspectives. For the most part, the 23andMe and Ancestry autosomal tests are oriented toward recent genealogical timeframes, around 500 years ago. 23andMe analyzes what it refers to as my "Ancestry Composition." It is the percentage of my ancestry, as inherited through each parent's DNA, attributed to where their ancestors lived some 500 years ago, as reflected by its reference population groups. Along these lines, Ancestry provides me with an "Ethnicity Estimate." It compares my DNA to that of people with similar DNA and with extended family histories going back several hundred to over 1,000 years in a particular world region.

In contrast, Xcode Life strives to give ancestral information deeper than 500 years. Giving a geographical focus to deep ancestry is also the goal of the Family Tree company's ancestry estimates in its autosomal test, "My Origins."

Yet, regarding deep ancestry, geneticist Razib Khan points out, "Europeans, as we understand Europeans today, genetically did not exist 10,000 years ago and South Asians as we understand them did not exist 10,000 years ago." Expressing a consensus among scientists who share this view, Ker Than adds in his article in Science on popular DNA testing, "such tests cannot account for recent migrations of peoples from their ancient homelands. Present-day patterns of residence are rarely identical to what existed in the past, and social groups have changed over time, in name and composition."

Commenting on these views, noted geneticist Dr. Adam Rutherford provides even further support in Scientific American, "For deeper family roots, these tests do not really tell you where your ancestors came from. They say where DNA like yours can be found on Earth today."

That said, anthropological geneticist Dr. Deborah Bolnick suggests, "If a test-taker is just interested in finding out where there are some people in the world that share the same DNA as them, then these tests can certainly tell them that."

2. Reference population ancestry sampling

An ancestry estimate assigns an individual to one or more type populations, each tied to a given geographical location. But ancestry estimates differ because the reference populations sampled by each company differ. Each company relies on its proprietary database for DNA reference sampling, and there are no established guidelines for the representation of ancestry.

3. Disproportionate ancestry sampling.

Test populations are disproportionate to the global population distribution. People of European ancestry make up less than 25% of the global population. Yet, they represent the majority of the participants in the genetic research of companies mostly based in the United States. As a result, these companies have an extensive Eurocentric database but relatively small and limited Asian reference populations for estimating ancestry. However, as Figure 2 shows, these companies are continuously refining their estimates of ancestry by expanding their reference population databases and extending their global coverage to include Asian populations.

Three companies serve the people of Asia, which accounts for ~60% of the world's population.

Xcode Life is based in India and claims to cover the 23% of the world's population with South Asian heritage. This area includes the 1.75 billion people of Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka.

WeGene DNA and Zuyan DNA, both based in China, serve the 1.6 billion people, about 21% of the world's population, who live in East Asia. This region comprises China, Mongolia, North Korea, South Korea, Japan, Hong Kong, Taiwan, and Macau.

These two companies also serve the 9% of the world's population who live in Southeast Asia. This region comprises Brunei, Cambodia, Southern China, Indonesia, Laos, Malaysia, Myanmar, the Philippines, Singapore, Thailand, Timor-Leste, Vietnam, Christmas Island, and the Cocos Islands. Both companies claim in-depth coverage of the populations they serve, and WeGene boasts an extensive Asian reference population database and describes itself as "one of the rare companies that specialize in the genetic exploration of Asian heritage." WeGene's analytical algorithm uses machine learning and is based on the Admixture ancestry analysis tool developed at the University of California, Los Angeles (UCLA).
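As a rough consistency check on the regional figures quoted in this section, the short sketch below derives the world population implied by the South Asia figures (1.75 billion people, 23%) and then the share of that total represented by the 1.6 billion people of East Asia. It uses only numbers that appear in the text; the variable names are my own.

```python
# Figures quoted in the text: headcounts in people, share as a fraction.
SOUTH_ASIA_PEOPLE = 1.75e9
SOUTH_ASIA_SHARE = 0.23
EAST_ASIA_PEOPLE = 1.6e9

# World population implied by the South Asia headcount and share.
implied_world = SOUTH_ASIA_PEOPLE / SOUTH_ASIA_SHARE

# Share of that implied world population living in East Asia.
east_asia_share = EAST_ASIA_PEOPLE / implied_world

print(f"implied world population: {implied_world / 1e9:.2f} billion")
print(f"East Asia share: {east_asia_share:.0%}")
```

On these figures, East Asia comes out slightly smaller than South Asia, which is what the two headcounts alone would suggest.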

Figure 2. Measuring company databases.

4. Accuracy of estimating tester ancestry.

Each company relies on its reference population for estimating ancestry. Diahan Southard describes such estimates using the terms Precision and Recall, which measure how accurately a company estimates ancestry across the different populations it supports.

Ancestry defines Precision as how much of the reported ethnicity is true, and Recall as how much of the true ethnicity is called by the company's analysis. This is discussed in detail in a white paper.
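To make the two terms concrete, here is a minimal sketch for a single region. It assumes, for illustration only, that the correctly called portion is the overlap between the true share and the reported share; this is my simplification, not Ancestry's published procedure, and the tester's numbers are hypothetical.

```python
def precision_recall(true_share, reported_share):
    """Return (precision, recall) for one region.

    true_share: fraction of ancestry that really comes from the region.
    reported_share: fraction the test assigns to the region.
    """
    # Assumption: the correctly called part is the overlap of the two shares.
    correct = min(true_share, reported_share)
    # Precision: how much of what was reported is true.
    precision = correct / reported_share if reported_share > 0 else 0.0
    # Recall: how much of what is true was reported.
    recall = correct / true_share if true_share > 0 else 0.0
    return precision, recall


# Hypothetical tester: 40% true ancestry from a region, test reports 50%.
p, r = precision_recall(0.40, 0.50)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case the test over-reports the region, so Recall is perfect while Precision falls to 0.80. An under-reporting test would show the reverse pattern.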

From a review of Southard's analysis, it can be argued the reference population databases of the companies I turned to, such as 23andMe, Ancestry, FTDNA, and MyHeritage, all have shortfalls in achieving reference population genomic diversity.

Anticipating different results from companies that rely on diverse reference populations for their ancestry estimates, I submitted test kits to several companies. Yet, for the most part, I found a relative consensus in their estimates of my ancestry.


5. Delineation of within-continent populations

Ancestry and 23andMe serve a mostly European-oriented population, WeGene focuses on an East Asian population, and Xcode Life focuses on South Asian ancestry. Moreover, companies differ in how they delineate geographical and political boundaries and in the percentages of ancestry they assign to each. Note the variance among the companies in Table 2. Also, note how companies use their own geographical labels for the areas they cover. These choices affect the usefulness of the geographic and political names and of the estimated ancestry percentages.


In Table 2, 23andMe combines the ancestry population for Spain and Portugal, while Ancestry provides the ancestry population for each nation. Also, the use of the term Iberia differs between Family Tree and MyHeritage. Family Tree includes Spain and Portugal as representing Iberia. MyHeritage uses the term Iberian Peninsula to include Andorra, Spain, and Portugal.
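The differences in delineation described above can be made explicit by listing the countries behind each label and comparing the sets. The sketch below uses only the coverage described in this section. The label strings for 23andMe and Ancestry are placeholders of mine, not the companies' exact wording.

```python
# Countries behind each Iberia-related label, as described in the text.
# The 23andMe and Ancestry label strings are placeholders (assumptions).
LABELS = {
    ("23andMe", "Spain and Portugal (combined)"): {"Spain", "Portugal"},
    ("Ancestry", "Spain"): {"Spain"},
    ("Ancestry", "Portugal"): {"Portugal"},
    ("Family Tree", "Iberia"): {"Spain", "Portugal"},
    ("MyHeritage", "Iberian Peninsula"): {"Andorra", "Spain", "Portugal"},
}


def coverage_by_company(labels):
    """Union the countries covered by all of a company's labels."""
    coverage = {}
    for (company, _label), countries in labels.items():
        coverage.setdefault(company, set()).update(countries)
    return coverage


coverage = coverage_by_company(LABELS)
# Countries that every company covers: the basis for a fair comparison.
common = set.intersection(*coverage.values())
for company, countries in coverage.items():
    print(company, sorted(countries), "extra:", sorted(countries - common))
```

Summing each company's percentages over the shared countries only, where a company reports them separately, is one way to compare estimates on a like-for-like basis.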

6. Company differences in naming ancestry.

Regardless of the company chosen, genetic genealogist Kitty Cooper points out, "Current admixture composition algorithms have a long way to go. Sometimes East Asian ancestry is actually American Indian and South Asian might be Gypsy or Indian from India. Scandinavian might be British or North German and British and Irish might be Scandinavian."

Genetic genealogist Betty Bettinger adds, "the ethnicity estimate is good on the continental level; getting down to the country or region is much more problematic." Differences in geographical delineation need to be considered when evaluating company admixture estimates if they are to be meaningful. Note the labels used in Table 3 to report an ancestry estimate by all four companies. The term Native American, used by 23andMe, is the most inclusive. However, for localizing ancestry, it is less definitive both ancestrally and geographically, as it includes North, Central, and South American ancestry. Most definitive is the labeling used by Ancestry, which results in a high Precision/Recall rate, as discussed in Section 4 above.



Three phrases from noted genetic genealogist Roberta Estes about ancestral ethnicity sum it up for me.

1. "Ethnicity percentages are only estimates."

2. "Ethnicity really is only reliable at a continental level."

3. "In determining majority ethnicity at the continent level, these tests are quite accurate."

For the purpose of my research, I concur with Estes. I accept my autosomal test estimates of my ancestral admixture at the continental level. As for ancestral estimates within continents, I agree with Bolnick and rely on such tests only to tell me where there are people in the world with whom I share the same DNA.


