PIK3R3 is a candidate regulator of platelet count in people of Bangladeshi ancestry

Background: Blood platelets are mediators of atherothrombotic disease and are regulated by complex sets of genes. Association studies in European ancestry populations have already detected informative platelet regulatory loci. Studies in other ancestries can potentially reveal new associations because of different allele frequencies, linkage structures, and variant effects.

Objectives: To reveal new regulatory genes for platelet count (PLT).

Methods: Genome-wide association studies (GWAS) were performed in 20,218 Bangladeshi and 9198 Pakistani individuals from the Genes & Health study. Loci significantly associated with PLT underwent fine-mapping to identify candidate genes.

Results: Of 1588 significantly associated variants (P < 5 × 10⁻⁸) at 20 loci in the Bangladeshi analysis, most replicated findings in prior transancestry GWAS and in the Pakistani analysis. However, the Bangladeshi locus defined by rs946528 (chr1:46019890) did not associate with PLT in the Pakistani analysis but was in the same linkage disequilibrium block (r² ≥ 0.5) as PLT-associated variants in prior East Asian GWAS. The single independent association signal was refined to a 95% credible set of 343 variants spanning 8 coding genes. Functional annotation, mapping to megakaryocyte regulatory regions, and colocalization with blood expression quantitative trait loci identified the likely mediator of the PLT phenotype to be PIK3R3, which encodes a regulator of phosphoinositide 3-kinase (PI3K).

Conclusion: Abnormal PI3K activity in the vessel wall is already implicated in the pathogenesis of atherothrombosis. Our identification of a new association between PIK3R3 and PLT provides further mechanistic insights into the contribution of the PI3K pathway to platelet biology.


INTRODUCTION
Circulating platelet count (PLT) is an independent predictor of morbidity and mortality from multiple cardiovascular and inflammatory disorders, including atherothrombosis [1-3], but is influenced by complex sets of interacting genes (h² > 0.3-0.8) [4,5] that may be different between ancestries [1,6]. Understanding the genetic basis of PLT offers important insights into the pathophysiology of platelet-mediated cardiovascular disease and disparities in health outcomes between populations [7,8].
Large genetic association studies for PLT and other blood cell traits have historically been restricted to European (EUR) populations [9]. However, utilization of non-EUR populations can reveal novel genetic associations because of differences in allele frequency, linkage disequilibrium (LD) structure, and variant effects driven by environmental selection pressures and genetic drift [10]. Transancestry and ancestry-specific genome-wide association studies (GWAS) have already exploited this to reveal multiple new loci for PLT [1,6]. Here, we extend this approach by performing a GWAS in a UK collection of individuals from Bangladesh alongside a comparator population of individuals from Pakistan.

METHODS
Analyses were performed on the July 2021 data release of the Genes & Health study [11] in accordance with the study's ethical approval. Phenotype data were derived from linked electronic health records, which included blood cell counts measured using a Sysmex XE-2100 analyzer (Sysmex, Kobe, Japan). PLT for each individual was defined as the mean of all PLT values recorded in the electronic health records. PLT values were adjusted for sex, age, height, and weight, and a rank-based inverse normal transformation was applied. Associations between PLT and variants imputed using the TOPMed-r2 Minimac4 1.5.7 Imputation Server were calculated using BOLT-LMM v2.3.6 with the first 10 genetic principal components as covariates. Index variants were defined as those with the lowest P value within a genome-wide significant (P < 5 × 10⁻⁸) locus. Conditional analyses were performed at each significantly associated locus using the index variant as a covariate to detect the presence of secondary association signals. The contributions of associated variants from each locus to PLT were evaluated further by comparing allele frequencies and effect sizes between the Bangladeshi and Pakistani populations. We tested for colocalization of variants in genomic regions associated with PLT in previous transancestry GWAS and in relevant subpopulations [1].
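The rank-based inverse normal transformation mentioned above can be illustrated with a minimal sketch. The Methods do not state which rank offset or tie-handling rule was used, so the Blom offset (0.375) and average ranks for ties are assumptions, as is the function name `rank_inverse_normal`. The covariate adjustment step is not shown; the input is assumed to be the already adjusted PLT values (or residuals).

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Map values to normal quantiles by rank.

    offset=0.375 is the Blom offset (an assumption; the source does not
    specify one). Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    # (rank - c) / (n - 2c + 1) lies strictly between 0 and 1.
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# Hypothetical adjusted platelet counts, for illustration only.
transformed = rank_inverse_normal([250.0, 310.0, 180.0, 250.0, 420.0])
```

The transformation keeps only the ordering of the values, so the resulting trait is approximately standard normal regardless of skew in the raw counts.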

RESULTS AND DISCUSSION
The characteristics of the 20,218 Bangladeshi and 9198 Pakistani individuals in the analysis populations are summarized in Supplementary Table S1. The mean PLT was lower in the Bangladeshi (mean, 266.4 × 10⁹/L) than in the Pakistani population (271.5 × 10⁹/L; P = 1.16 × 10⁻¹⁰; Supplementary Fig. S1). Although small case series have shown that the frequency of thrombocytopenia (PLT < 150 × 10⁹/L) is higher in residents of the Eastern Indian subcontinent than in other regions [12], we were unable to detect an increase in thrombocytopenia in British Bangladeshis (Supplementary Table S1). Single nucleotide polymorphism (SNP)-based heritability for PLT was 26.9% in the Bangladeshis and 25.3% in the Pakistanis. In the Bangladeshi analysis, 1588 variants at 20 loci were significantly associated with PLT (P < 5 × 10⁻⁸; Supplementary Table S2). Conditional analysis of the variants at each locus revealed that 2 loci had secondary signals of association (P < 5 × 10⁻⁸). These were rs3846855 and rs653178, which mapped to GGNBP1 and ATXN2, respectively (Supplementary Fig. S2A), both of which were associated with PLT in the transancestry or ancestry-specific GWAS of Chen et al. [1], and rs653178 was in LD with the index variant for that region.
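The index-variant definition used in this study (the variant with the lowest P value within each genome-wide significant locus) can be sketched as follows. The source does not say how locus boundaries were drawn, so grouping significant variants that lie within 500 kb of the previous significant variant on the same chromosome is an assumption made only for illustration, and the function name and tuple layout are hypothetical.

```python
GENOME_WIDE_P = 5e-8


def index_variants(variants, window=500_000, threshold=GENOME_WIDE_P):
    """Return one index variant (lowest P) per locus.

    variants: iterable of (variant_id, chromosome, position, p_value).
    window: assumed maximum gap between neighbouring significant variants
            in the same locus (not taken from the source).
    """
    hits = sorted(
        (v for v in variants if v[3] < threshold),
        key=lambda v: (v[1], v[2]),
    )
    loci = []
    for v in hits:
        previous = loci[-1][-1] if loci else None
        if previous and previous[1] == v[1] and v[2] - previous[2] <= window:
            loci[-1].append(v)  # same locus as the previous hit
        else:
            loci.append([v])  # start a new locus
    return [min(locus, key=lambda v: v[3]) for locus in loci]


# Hypothetical summary statistics, for illustration only.
example = [
    ("varA", "1", 1_000_000, 1e-9),
    ("varB", "1", 1_200_000, 1e-12),
    ("varC", "1", 5_000_000, 3e-8),
    ("varD", "2", 1_000_000, 1e-3),
    ("varE", "2", 2_000_000, 4e-10),
]
leads = index_variants(example)
```

A conditional analysis would then refit the association at each locus with the index variant as a covariate; that step depends on individual-level data and is not shown here.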

Essentials
• Understanding which genes control platelet count provides insights into atherothrombosis.
• To identify new platelet genes, we analyzed data from Bangladeshi and Pakistani populations.
• Most platelet genes identified in our analysis overlapped with those in other populations.
• A new region containing the PIK3R3 gene was linked to platelet count in Bangladeshis.
In the smaller Pakistani population, there were 68 PLT-associated […] (Table, Supplementary Table S6). For 11 of the Bangladeshi index variants, there was a high posterior probability (PP) that the association signal with PLT in the transancestry GWAS was driven by the same variant (PP of H4 > 0.8). A further 5 variants had a similarly high likelihood of colocalization with PLT-associated variants in either the SAS or East Asian (EAS) ancestry-specific GWAS populations [1] (Table and Supplementary Table S6).
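The posterior probability of H4 reported above comes from a colocalization test on summary statistics. The sketch below shows the general approximate Bayes factor calculation behind such a test; it is not the authors' pipeline. The prior standard deviation of 0.15 and the priors p1 = p2 = 1e-4 and p12 = 1e-5 are assumed values in line with commonly used defaults, and all function names are hypothetical. The off-diagonal sum for H3 is computed directly over variant pairs for numerical robustness, which is quadratic in the number of variants.

```python
import math


def log_sum(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def wakefield_labf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one variant."""
    z = beta / se
    r = prior_sd**2 / (prior_sd**2 + se**2)
    return 0.5 * (math.log(1 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits over shared variants.

    H3: both traits associated, different causal variants.
    H4: both traits associated, one shared causal variant.
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need the same variants (at least 2) for both traits")
    shared = log_sum([a + b for a, b in zip(labf1, labf2)])
    distinct = log_sum(
        [a + b for i, a in enumerate(labf1) for j, b in enumerate(labf2) if i != j]
    )
    log_h = [
        0.0,
        math.log(p1) + log_sum(labf1),
        math.log(p2) + log_sum(labf2),
        math.log(p1) + math.log(p2) + distinct,
        math.log(p12) + shared,
    ]
    total = log_sum(log_h)
    return [math.exp(x - total) for x in log_h]


def toy_locus(peak, n=5):
    """Hypothetical locus with a single strongly associated variant."""
    return [wakefield_labf(0.08 if i == peak else 0.0, 0.01) for i in range(n)]


pp_shared = coloc_posteriors(toy_locus(2), toy_locus(2))
pp_distinct = coloc_posteriors(toy_locus(1), toy_locus(3))
```

In the toy data, the two traits peak at the same variant in the first call and at different variants in the second, so the probability mass moves from H4 to H3. A threshold such as PP of H4 > 0.8 is then applied to call a shared signal.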
Of the remaining 4 Bangladeshi index variants that did not colocalize with previous PLT-associated variants, rs149810016 and rs59596869 mapped by VEP to GFI1B and to MAST4, respectively, which are PLT-associated genes identified from other index variants in the meta-GWAS by Chen et al. [1]. The variant rs1877194 was annotated by VEP to an intergenic region but is adjacent to the PLT-associated LY75-CD302. These findings suggest that these association signals potentially […] (Table, Supplementary Table S6) […] (Table S4) [13,14]. All of these variants were >200 kB from rs946528, and since none colocalized with variants in the rs946528 region, this likely represents a novel PLT-associated signal.
Variants in LD (r² > 0.5) with rs946528 span 8 UCSC Genome […]

TABLE Platelet count-associated index variants identified in Bangladeshi individuals (n = 20,218) and colocalization with prior transancestry and ancestry-specific genome-wide association studies [1].

Column groups: Bangladeshi index variants (chromosomal position, rsID); posterior probability of a common shared association signal (H4).

Bangladeshi index variants were defined as those with the lowest P value within each associated locus. Chromosomal positions are expressed relative to the GRCh38 genome assembly with the coded/alternate alleles on the positive strand. Genes were assigned to each index variant by annotating with VEP and selecting the gene with the most severe functional consequence. Effect size (beta), SE, and probability of association are shown for each Bangladeshi index variant. Data are also presented for a colocalization analysis generated using the coloc R package in which the platelet count associations for all variants within 500 kB of the Bangladeshi index variants were compared in the transancestry genome-wide association study population. If there was a low posterior probability of colocalization (<80%), then colocalization was tested against the SAS and EAS genome-wide association study analysis population [1].
[…] overlapped with a prominent area of epigenetic activity, suggesting that this variant lies within a regulatory region (Figure 2). At least part of this epigenetic activity may be accounted for by the immediate adjacency of rs1707303 to a consensus binding site for RUNX1, a critical transcription factor necessary for differentiation and maturation of platelet-forming megakaryocytes [16]. The rs946528 association signal colocalized with cis-eQTLs in whole blood for MAST2 and IPP with posterior probabilities of 45.2% and 31.4%, respectively, below the threshold of 80% usually considered indicative of a shared causal variant (Supplementary Fig. S5, Supplementary Table S5).
Neither MAST2 nor IPP is expressed in platelets, and neither has a plausible biological role in platelet production [15].
In this first reported GWAS for a hematological trait in a Bangladeshi population, several lines of evidence suggested that the Bangladeshi rs946528 association window also included variants that replicated findings from other GWAS for PLT in previous EAS populations but which have not been previously annotated [1,13,14].
By contrast, the other PLT-associated loci in the Bangladeshi and Pakistani GWAS replicated prior transancestry GWAS, and in most cases, they were linked to genes already implicated in platelet biology [1]. Replication of the rs946528 association interval as an apparently ancestral EAS PLT-associated locus was unsurprising given that modern Bangladeshi populations are predominantly of SAS ancestry but with significant EAS and South-East Asian admixture [17]. This confirmatory discovery in the Bangladeshi population highlights the value of ancestry-specific GWAS in providing additional insights into the architecture of complex loci associated with population traits that complement transancestry or large EUR population GWAS [1,9].

FIGURE 2 Detailed view of the Bangladeshi rs946528 association interval. LocusZoom plot of associations in the 562 kB association interval for the chromosome 1 index variant rs946528, defined as variants within r² ≥ 0.5 calculated using linkage disequilibrium (LD) reference data from the Bengali from Bangladesh 1000 Genomes dataset. Variants with P > .05 are excluded to enable visualization of the recombination peaks. Indicated variants are the index rs946528 and the 3 variants associated with platelet count in prior East Asian ancestry-specific genome-wide association studies. Beneath and aligned with this are the 8 UCSC Genome Browser-annotated genes within the interval; positions of the 95% credible set of 343 variants and annotated regions of epigenetic activity from CD34−, CD41+, and CD42+ megakaryocytes, the progenitor cell for circulating platelets, which were provided by the BLUEPRINT Epigenomics Project [25]. Megakaryocyte chromatin immunoprecipitation sequencing (ChIP-seq) data are also shown indicating relevant transcription factor binding sites [16]. The expanded view shows the relationship between the significantly associated variant rs1707303 and a putative regulatory region surrounding the first exon of the candidate gene PIK3R3.
One significant challenge with the rs946528 locus is that the single independent association signal for PLT was attributable to a 95% credible set of 343 variants in a haplotype block containing 8 annotated genes. Considering orthogonal evidence from several sources, PIK3R3 was identified as the most likely candidate mediator of the PLT phenotype. This was primarily because the PIK3R3 intron 1 variant rs1707303 was unique within the rs946528 haplotype in that it mapped to an area with multiple epigenetic signals indicating a PIK3R3 regulatory region, most likely related to a consensus binding site for RUNX1, a critical megakaryocyte transcription factor [16].
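To make the notion of a 95% credible set concrete, the following minimal sketch builds one from per-variant log approximate Bayes factors. It assumes a single causal variant at the locus with equal prior probability for every variant, which is one common way to derive a credible set for a single independent signal; the source does not state which fine-mapping method was used, and the function name and input values are hypothetical.

```python
import math


def credible_set(log_abfs, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to coverage.

    log_abfs: dict mapping variant id -> log approximate Bayes factor.
    Assumes one causal variant and equal priors across variants.
    Returns (list of variant ids in the set, dict of posterior probabilities).
    """
    peak = max(log_abfs.values())
    weights = {k: math.exp(v - peak) for k, v in log_abfs.items()}
    total = sum(weights.values())
    posterior = {k: w / total for k, w in weights.items()}

    chosen, cumulative = [], 0.0
    for variant in sorted(posterior, key=posterior.get, reverse=True):
        chosen.append(variant)
        cumulative += posterior[variant]
        if cumulative >= coverage:
            break
    return chosen, posterior


# Hypothetical log Bayes factors, for illustration only.
members, posterior = credible_set({"v1": 5.0, "v2": 4.0, "v3": 0.0, "v4": -1.0})
```

When many variants are in strong LD, their Bayes factors are similar and the posterior is spread thinly across them, which is how a single signal can yield a credible set containing hundreds of variants.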
PIK3R3 is further supported as a candidate mediator of the PLT phenotype because it was the only gene at this locus to be expressed within platelets and because it encodes phosphatidylinositol 3-kinase regulatory subunit gamma (PIK3R3; UniProt Q92569). In an interactome analysis using the STRING database [18], PIK3R3 has 10 first-order interactions with high confidence (score > 0.7), all with proteins that are also represented in the platelet proteome (Figure 3).

ETHICS STATEMENT
The Genes and Health study was approved by the London South-East

RELATIONSHIP DISCLOSURE
There are no competing interests to disclose.