Effect of Imbalanced Sampling and Missing Data on Associations between Gender Norms and Risk of Adolescent HIV

Stanford University School of Medicine (Gupta, Abdalla, Mejía-Guevara, Darmstadt); University of Minnesota School of Medicine (Gupta); University of Texas - Dallas (Vicas); University of Nevada (Weber); London School of Hygiene and Tropical Medicine (Cislaghi); Health Care Service Corporation, or HCSC (Meausoone)
"...findings emphasise that if the data is not from all of us, it may not be for any of us..."
Despite efforts globally to advance equality by addressing restrictive gender norms, gender inequalities persist and contribute to worsened health outcomes. Global health datasets that are ill-equipped for research at the gender-health nexus may be a factor impeding progress, because interventions derived from datasets built on gender bias and restrictive gender norms can perpetuate a cycle of gender inequalities and inadequate data collection. This study provides a framework to measure the effects of gender- and age-imbalanced and missing data on gender-health research. The framework is demonstrated using a previously studied pathway for effects of pre-marital sex norms among adults on adolescent HIV risk.
Previously, this group of researchers proposed strategies to operationalise gender norms measurement using existing data. In particular, prior analyses using Zambian Demographic and Health Survey (DHS) data demonstrated that as community-level discordance or incongruity between adults' professed attitudes and apparent behaviours regarding pre-marital sex widened, the risk for HIV acquisition increased for adolescent girls in that community. The researchers characterise this survey (2007 Zambia DHS) as balanced in that it asked normative questions on pre-marital sex of men about men and women and of women about men and women, providing quantitative insights on the effect of this gender norm. The present paper extends this analysis, asking: Do balanced, complete data truly matter when measuring gender norms' effects?
After identifying gender-age-imbalanced DHS datasets, the researchers resampled responses and restricted covariate data from the relatively complete, balanced dataset derived from the 2007 Zambian DHS to replicate imbalanced gender-age sampling and covariate missingness (i.e., covariates with no available data). (Example of covariate missingness: In a review of covariates missing from gender-age-imbalanced datasets that were required for female HIV risk models, seven surveys did not measure experiences of intimate partner violence, one survey did not measure the age difference of sexual partners, and three surveys did not include either.) Differences in model outcomes due to sampling were measured using tests for interaction. Missing covariate effects were measured by comparing fully-adjusted and reduced model fitness.
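To make the resampling idea concrete, here is a minimal sketch in Python. It is not the researchers' code: the function names (`resample_to_male_share`, `interaction_z`), the toy records, and the restriction to gender alone (the study also considered age) are assumptions made for illustration. The first function draws records with replacement from a balanced pool until a target male share is reached; the second shows one common large-sample way to test whether a coefficient differs between two model fits, which may not be the exact interaction test used in the paper.

```python
import math
import random


def resample_to_male_share(records, male_share, n, seed=0):
    """Draw n records with replacement so that about `male_share` are male.

    Hypothetical helper: mimics an imbalanced survey by resampling a
    balanced one. Only gender is handled here, not age.
    """
    rng = random.Random(seed)
    males = [r for r in records if r["sex"] == "male"]
    females = [r for r in records if r["sex"] == "female"]
    n_male = round(n * male_share)
    sample = rng.choices(males, k=n_male) + rng.choices(females, k=n - n_male)
    rng.shuffle(sample)
    return sample


def interaction_z(b1, se1, b2, se2):
    """Two-sided z-test for the difference between two coefficients.

    Assumes independent, approximately normal estimates; this is a generic
    approach, not necessarily the test used in the study.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Toy "balanced" pool: 45.3% male, the share reported for the balanced dataset.
balanced = [{"id": i, "sex": "male" if i < 453 else "female"} for i in range(1000)]

# Resample to 29.6% male, the average share reported for imbalanced surveys.
imbalanced = resample_to_male_share(balanced, male_share=0.296, n=1000)
observed_share = sum(r["sex"] == "male" for r in imbalanced) / len(imbalanced)
print(observed_share)
```

In a fuller analysis, the same regression model would be fitted to the balanced and the resampled data, and the two coefficients and their standard errors passed to something like `interaction_z` to judge whether sampling changed the estimate.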
The researchers simulated data from 25 DHS surveys across 20 countries from 2005-2014 on four sex-stratified models for pathways of adult attitude-behaviour discordance regarding pre-marital sex and adolescent risk of HIV. On average, across gender-age-imbalanced surveys, males comprised 29.6% of responses, compared to 45.3% in the gender-balanced dataset. Gender-age-imbalanced sampling significantly affected regression coefficients in 40% of model-scenarios (N = 40 of 100), and it biased relative-risk estimates away from gender-age-balanced sampling outcomes in 46% (N = 46) of model-scenarios.
Broadly, imbalanced sampling on the basis of gender and age affected model results variably and unpredictably. The most significant effects were seen in male adolescent HIV risk models, possibly due to under-sampling, whereas female adolescent HIV risk models were more robust to sampling variability. The fit of all models was generally robust to covariate missingness.
Overall, the findings indicate that, although the statistical effects of imbalanced sampling may go unnoticed when drawing conclusions, in some cases, imbalanced sampling can affect the reliability and interpretation of findings related to how gendered norms influence health outcomes.
Among the directions for future research proposed here is the study of the effects of under-sampling subgroups (e.g., adolescent males and older females), which could highlight how the reliability of cross-gender data (data from men about women and vice-versa) plays a role in pathways concerned with gender norms.
In conclusion, the researchers assert that their "study and framework provide novel and important nuance to the global conversation on how to better collect gender-health data and may inform future survey sampling methodologies to promote gender-health research reliability....Data with built-in gender-age imbalance poses risks for deriving inaccurate conclusions, misinforming program and policy design, and recapitulating inequalities....[B]alanced sampling across and inclusion of all genders and ages can improve the reliability of gender-health research, including work on cross-gender normative influences."
eClinicalMedicine 2022;50: 101513. https://doi.org/10.1016/j.eclinm.2022.101513. Image credit: UN Women/Urjasi Rudra via Flickr (CC BY-NC-ND 2.0)