As Alex blogged earlier, the Personal Genome Project (PGP) is hoping to work with the National Institute of Standards and Technology (NIST) to use PGP materials (cell lines and DNA) for NIST's "Genome in a Bottle" reference material. One of the things NIST is looking for, and that we'd love to see more of, is diversity.
Seeking diversity
Because the PGP is self-recruiting, we don't have a very balanced set of participants. "Self-recruitment" means that all participants have enrolled in our project through word of mouth, finding our website and enrolling online. To put it bluntly, that means we mostly end up with young white men. Here are some graphs from our recent paper:
Researchers would love to see more diversity in PGP data. However, self-recruitment is ideally suited to the PGP: self-recruited participants are more likely to have a good understanding of the goals and risks involved. So we'll simply put out the word here to people already following us: participants from underrepresented groups are especially welcome. Fitting neatly into one or two racial/ethnic categories isn't necessarily a virtue; biracial and multiracial heritage may be even more interesting to some researchers and can open new areas for future research.
Why NIST wants PGP material
What does it mean for NIST to be considering the PGP for genome reference material? DNA sequencing and personal genomics are advancing rapidly in a competitive, fast-moving market. Instrument manufacturers need standard human genome material to calibrate their machines, and others would like to use it to compare different devices and create common quality metrics. For example, the Food and Drug Administration may use this material to certify sequencing instruments. It is possible this reference material will become ubiquitous -- spread far and wide, in a variety of commercial devices, with little ability to protect or regulate its uses.
[Figure: To visualize the potential widespread usage of reference material, I've made this informal sketch of the sequencing process. Not to scale; lasers may not be included in all models.]
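To make "common quality metrics" a little more concrete: once a reference sample comes with a trusted set of known variants, any device's calls on that same sample can be scored against it. The sketch below is a hypothetical illustration, not NIST's actual procedure. The `benchmark` function, the tuple format for variants, and the example data are all invented for this example, and real comparisons also have to deal with issues such as limiting the comparison to confidently characterized regions and reconciling different ways of writing the same variant.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a variant as (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def benchmark(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Score a device's variant calls against a trusted reference ("truth") set."""
    tp = len(calls & truth)  # called, and present in the reference set
    fp = len(calls - truth)  # called, but not in the reference set
    fn = len(truth - calls)  # in the reference set, but missed by the device
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"true_positives": tp, "false_positives": fp, "false_negatives": fn,
            "precision": precision, "recall": recall}


# Made-up example data for one reference sample.
truth = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
         ("chr2", 500, "G", "A"), ("chr2", 900, "T", "C")}
device_calls = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
                ("chr2", 500, "G", "A"), ("chr3", 42, "A", "C")}

print(benchmark(device_calls, truth))
# {'true_positives': 3, 'false_positives': 1, 'false_negatives': 1, 'precision': 0.75, 'recall': 0.75}
```

Because every manufacturer would be scoring against the same physical material, numbers like these become comparable across devices, which is the whole point of a shared reference.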
Why use PGP samples? Even though NIST's genome reference material will be manufactured using cell lines, those cell lines originally come from a person -- society is realizing that tissues and DNA are very personal things! In the wake of the experiences of Henrietta Lacks and HeLa cells (as documented in Rebecca Skloot's recent book), NIST wants to make sure the material it uses comes from people who understood and agreed to potentially widespread usage. The PGP's "open consent" is a gold standard for careful consent to broad usage: PGP participants acknowledge and agree to things other subjects have not, including the risk of re-identification and commercial uses of their material.[1]
Parents and children
In particular, NIST is looking for "trios": two parents and a child. Researchers like to use samples from trios because every piece of DNA in the child should come from one of the parents, so a variant called in the child that neither parent carries is either a rare new mutation or an error. This makes it easier to assess error rates -- and that sort of quality control is what NIST expects the genome material to be used for. We think all such family groups are valuable, but current trios in the PGP haven't been the most diverse...
Trio | Self-reported race/ethnicity |
---|---|
hu5D9DE3/huFE1569/huA8BCB0 | White |
hu91BD69/hu38168C/huCA017E | Asian |
hu16360E/hu28DA07/hu1A7894 | White |
huAA53E0/hu8E87A9/hu6E4515 | White |
huB4E01A/hu39790F/hu781C4E | White |
huCDC3B8/huFE01E1/hu1E8957/hu961968 | White |
huAA8CF9/hu7DB29E/hu2ED134 | White |
hu620F18/huD4BF17/huD62596 | White, American Indian / Alaska Native[2] |
hu1053CC/huFAF1FE/hu40D515 | Unknown |
huC434ED/huD44B2B/hu25DE85 | White |
hu36CDF1/hu210C97/huCFD87D | White |
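The trio-based error check described earlier can be sketched in a few lines. This is a simplified, hypothetical illustration: it assumes diploid sites with each genotype written as a pair of alleles, it ignores the sex chromosomes and mitochondrial DNA, and the function names and example genotypes are invented. A site is flagged when the child's genotype can't be assembled from one maternal allele plus one paternal allele.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a diploid genotype at one site as a pair of alleles.
Genotype = Tuple[str, str]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_error_rate(trio_calls: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of sites whose child genotype cannot be explained by the parents.

    Each flagged site is either a genuine new (de novo) mutation, which is rare,
    or an error in sequencing or variant calling in one of the three samples.
    """
    calls = list(trio_calls)
    if not calls:
        return 0.0
    inconsistent = sum(1 for c, m, f in calls if not is_mendelian_consistent(c, m, f))
    return inconsistent / len(calls)


# Made-up example: (child, mother, father) genotypes at four sites.
sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # A from mother, G from father
    (("A", "A"), ("A", "G"), ("A", "T")),  # A from each parent
    (("G", "G"), ("A", "A"), ("A", "G")),  # mother carries no G: inconsistent
    (("C", "T"), ("C", "C"), ("C", "T")),  # C from mother, T from father
]
print(mendelian_error_rate(sites))  # 0.25
```

A consistent site isn't proof of a correct call, since an error can happen to match an allele one parent carries, so a check like this tends to undercount errors rather than catch all of them.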
NIST's team has told us they would like to have samples representing the breadth of human genetic diversity -- various ethnicities and multiracial heritages. Our project would love to enable that, but we are sensitive to the history of minorities in human subjects research. Participation in the PGP has many acknowledged risks and no promised benefits -- it definitely isn't for everyone. I can't even promise that NIST will use your samples (and many would consider not being used a benefit rather than a risk). I'm simply writing about the diversity that NIST and other researchers wish they had, and about the lack of it in the PGP -- maybe, if we're lucky, it will inspire some new participants to self-recruit.
Footnotes
1: More specifically, the PGP promises not to seek financial gain or commercial profit from materials (although cost recovery is allowed), but may "permit your cell lines to be used for research, patient care, commercial or other purposes". We don't expect anyone's genome to be uniquely valuable, but blocking any and all commercial uses of shared material is often viewed as overly restrictive. If a company wants to include NIST's DNA standard in a commercial machine, that would be a commercial usage of the material.
2: I have observed a high rate of PGP participants reporting both White and Native American ancestry (it's the second most common category; see the table below). While not questioning any specific individual, some genealogists have cast doubt on the high frequency of such reports. Elizabeth Warren's experience may be a common one.
Self-reported race/ethnicity | # of PGP participants |
---|---|
White | 1285 |
American Indian / Alaska Native, White | 40 |
Asian | 38 |
Hispanic or Latino, White | 24 |
Hispanic or Latino | 21 |
Black or African American | 12 |
Asian, White | 11 |
Black or African American, White | 7 |
American Indian / Alaska Native, Black or African American, White | 4 |
Hispanic or Latino, Black or African American, White | 3 |
American Indian / Alaska Native, Hispanic or Latino, White | 3 |