A couple months ago I posted about our new trait surveys. Thank you to all the participants who completed these so far! I'm following up now with links to the data, a bit of Python code for interpreting them, and a little analysis.
The website has been updated to make the CSV files containing results from the Google surveys publicly available. Here are links to the PGP participant survey and the twelve trait surveys:
To help anyone interested in parsing the data, I've shared the Python code I've used on GitHub. The repository also has a copy of the survey data as of Feb 23, some demo code, and a README.
Finally, I did a bit of parsing of the trait survey data, combined with two features from the participant survey data (age and sex), to see if I could find anything interesting for you. The top 20 pairwise correlations below don't look terribly surprising, but I learned some new things. For example, I didn't know that TMJ disorder is much more common in women. (Of course, a quick web search confirms this is a well-known association.) This type of analysis isn't my forte -- maybe someone with more experience with machine learning can do cool stuff with this data!
(To read the table below: the first row indicates "60% of females reported a UTI, while only 12% of others reported one".)
Trait 1 | Trait 2 | p-value¹ | % 1 with 2 | % others with 2 |
---|---|---|---|---|
Female | Urinary tract infection (UTI) | 3.1e-23 | 60.3% | 12.0% |
High cholesterol (hypercholesterolemia) | High triglycerides (hypertriglyceridemia) | 2.6e-15 | 36.9% | 3.1% |
Female | Ovarian cysts | 1.9e-13 | 21.9% | 0.4%² |
60+ years | Age-related cataract | 9.1e-13 | 36.8% | 2.8% |
Female | Iron deficiency anemia | 2.4e-11 | 28.5% | 4.0% |
Myopia (Nearsightedness) | Astigmatism | 5.7e-11 | 59.2% | 25.5% |
High cholesterol (hypercholesterolemia) | Hypertension | 1.6e-09 | 42.9% | 11.6% |
Iron deficiency anemia | Urinary tract infection (UTI) | 1.9e-09 | 69.2% | 25.3% |
60+ years | Age-related hearing loss | 6.5e-09 | 31.6% | 4.1% |
Urinary tract infection (UTI) | Ovarian cysts | 1.7e-08 | 22.0% | 3.1% |
Polycystic ovary syndrome (PCOS) | Ovarian cysts | 2.1e-08 | 66.7% | 6.6% |
Hypothyroidism | Hashimoto's thyroiditis | 2.7e-08 | 23.1% | 0.6% |
Temporomandibular joint (TMJ) disorder | Fibrocystic breast disease | 5.3e-08 | 28.6% | 2.4% |
Nasal polyps | Chronic sinusitis | 7.0e-08 | 61.1% | 7.8% |
Osteoarthritis | Bone spurs | 1.1e-07 | 29.8% | 3.6% |
Female | Temporomandibular joint (TMJ) disorder | 1.4e-07 | 21.9% | 4.0% |
Female | Fibrocystic breast disease | 1.9e-07 | 12.6% | 0.4%² |
Male | Hair loss (includes female and male pattern baldness) | 3.0e-07 | 29.5% | 8.3% |
Carpal tunnel syndrome | Temporomandibular joint (TMJ) disorder | 4.4e-07 | 52.2% | 8.5% |
Urinary tract infection (UTI) | Fibrocystic breast disease | 5.4e-07 | 14.4% | 1.2% |
¹ As calculated using Fisher's exact test. Note that these p-values are not corrected for multiple hypothesis testing. I think a pessimistic Bonferroni correction would demand a p-value of around 1e-6 to meet the magic 'p = 0.05' cutoff.
² I didn't look closely, but I suspect these non-zero numbers arise because we have some transgender participants whose sex at birth differs from the gender they identify with (the latter is what we recorded on the participant survey).
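
For anyone who wants to try the kind of calculation behind the table and footnote 1, here is a minimal sketch using only the Python standard library. It is not the code shared on GitHub. The function names are hypothetical, and the counts in the usage lines are made up for illustration rather than taken from the survey. The 50,000 comparisons in the Bonferroni line is only the number implied by dividing 0.05 by the roughly 1e-6 threshold mentioned in footnote 1, not a count of the tests actually run.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a = has trait 1 and trait 2      b = has trait 1, lacks trait 2
    c = lacks trait 1, has trait 2   d = lacks both
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the
    # observed one (small tolerance guards against float rounding).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


def percent_with_trait2(a, b, c, d):
    """Return ('% 1 with 2', '% others with 2') for the same 2x2 table."""
    return 100 * a / (a + b), 100 * c / (c + d)


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff after a Bonferroni correction."""
    return alpha / n_tests


# Made-up counts, purely illustrative (not survey data).
example = (8, 2, 1, 5)
print(percent_with_trait2(*example))
print(fisher_exact_two_sided(*example))  # about 0.035
# 0.05 spread over an assumed 50,000 comparisons gives roughly 1e-6.
print(bonferroni_threshold(0.05, 50_000))
```

If SciPy is installed, `scipy.stats.fisher_exact` should give the same two-sided p-value and is the more practical choice for real data.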