Missing Data: Every effort was made by the SAFETALK team to obtain complete data on all study participants. For each variable of interest in the study, missing values were identified, and their distributions and patterns were evaluated carefully. Participants who refused to answer questions regarding disclosure or sexual behavior, or who were otherwise missing information on whether they disclosed, were classified as missing. I chose not to employ any imputation methods for missing stand-alone items or for missing items that were part of scales. Missing values accounted for less than 5% of responses for every variable of interest, with the exception of one control variable (relationship status had 5.6% missing values).
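The missing-value screen described above can be sketched as follows. This is a minimal illustration, not the study's Stata code: the function name, the use of None to mark a missing response, and the toy records are all assumptions, and only the 5% threshold comes from the text.

```python
def percent_missing(values):
    """Return the percentage of entries that are missing (coded here as None)."""
    if not values:
        raise ValueError("values must not be empty")
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


# Hypothetical toy data for illustration only; these are not SAFETALK records.
toy_data = {
    "relationship_status": ["main", None, "casual", "main"],
    "age": [34, 41, 29, 52],
}

# Flag every variable whose share of missing responses reaches 5%.
flagged = {
    name: percent_missing(column)
    for name, column in toy_data.items()
    if percent_missing(column) >= 5.0
}
```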
Data Editing and Cleaning: Study staff noticed several baseline ACASIs in which data appeared to have been entered incorrectly, possibly because participants clicked the wrong button on the computer or clicked a button too many times. These included participants who reported an unusually large number of sexual partners (e.g., 333) in the past three months. We received IRB approval to speak with these participants to double-check answers that appeared to be erroneous. When key-in errors were identified, ACASI responses were corrected by re-asking the participant the questions associated with the erroneous initial responses. These updated responses were added to my revised dataset.
Data Screening: I conducted my data analyses using Stata 10.0. To screen the data, I examined the dataset for reasonableness using univariate descriptive statistics to produce frequencies, distributions, measures of central tendency, and measures of dispersion for all variables of interest. I also checked for outliers. When outliers were found, separate analyses were performed including and excluding them. I also analyzed scales to assess their psychometric properties (DeVellis, 2003). I estimated the internal consistency reliability of each scale (reporting Cronbach’s alpha) and conducted an item-by-item analysis to identify any items that should be dropped from the scales. Overall scale scores were calculated by averaging item responses. I chose not to impute any missing responses. Scale items had, on average, fewer than 5% missing values.
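For readers who want to see the reliability computation spelled out, the sketch below implements the standard formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), together with a scale score formed by averaging items. The function names are hypothetical. Restricting alpha to complete cases and averaging only the answered items are assumptions; the text states only that item responses were averaged and that nothing was imputed.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: list of respondents, each a list of k numeric item responses.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def scale_score(row):
    """Average the answered items; None marks a missing response (assumption)."""
    answered = [x for x in row if x is not None]
    return mean(answered) if answered else None
```

Dropping one item at a time and recomputing alpha on the remaining columns gives a simple version of an item-by-item check.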
Assessing for Multicollinearity: Multicollinearity occurs when independent variables are highly correlated with one another, making it difficult to assess the independent importance of an individual predictor variable because each accounts for similar variance in the dependent variable. When two variables are highly correlated, they are largely measuring the same phenomenon. When multicollinearity between variables is present, p-values can be misleading and the confidence intervals of the regression coefficients will be very wide. This can lead to incorrect conclusions about the relationships between the independent and dependent variables of interest. Since my research questions seek to estimate the contributions of individual correlates of serostatus disclosure (see Aim 1 below), I assessed for multicollinearity by producing a Pearson correlation matrix in Stata to examine the bivariate correlations between my independent variables. The analysis did not detect a high level of association among any of the predictor variables. Therefore, no variables were deleted from the model as a result of high multicollinearity.
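The bivariate screen described above can be illustrated with a small Pearson correlation matrix. This is a sketch rather than the Stata procedure used in the study: the function names are hypothetical, and the 0.8 cutoff for a "high" correlation is an assumption, since the text does not state the threshold that was applied.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict mapping variable name to its list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def high_pairs(matrix, cutoff=0.8):
    """List each pair of variables whose |r| meets the (assumed) cutoff."""
    names = list(matrix)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= cutoff
    ]
```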
Stata output also displays collinearity diagnostics, including tolerance and Variance Inflation Factor (VIF) values for each predictor variable. Tolerance indicates the proportion of variance in the predictor that cannot be accounted for by the other predictors. As such, very small values indicate that a predictor is redundant, and values less than .10 merit further investigation. If the VIF, calculated as 1 divided by tolerance, is greater than 10, there is cause for concern about multicollinearity. Based on these criteria, no variables demonstrated high collinearity. Thus, all were included in subsequent analyses.
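The relationship between tolerance and VIF can be made concrete with a short sketch. Here r_squared stands for the R-squared obtained when one predictor is regressed on all the other predictors, so tolerance is 1 minus that value and VIF is its reciprocal. The function names are hypothetical; the .10 and 10 cutoffs are the ones stated in the text.

```python
def tolerance(r_squared):
    """Share of a predictor's variance not explained by the other predictors."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie between 0 and 1")
    return 1.0 - r_squared


def vif(r_squared):
    """Variance Inflation Factor, defined as 1 divided by tolerance."""
    tol = tolerance(r_squared)
    if tol == 0.0:
        raise ValueError("predictor is perfectly collinear; VIF is undefined")
    return 1.0 / tol


def merits_investigation(r_squared):
    """Apply the cutoffs from the text: tolerance below .10 or VIF above 10."""
    return tolerance(r_squared) < 0.10 or vif(r_squared) > 10
```

Because VIF is the reciprocal of tolerance, the two cutoffs flag the same predictors; a predictor with R-squared of 0.95 has tolerance 0.05 and VIF 20.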
4.4.2 Analysis for Aim 1
To describe and examine the correlates of serostatus disclosure to sexual partners among HIV-positive patients
The dependent variable for Aim 1 was serostatus disclosure to sexual partners. My conceptual model includes eight correlates listed below. Note that the first two variables relate to characteristics of the participant's partner, and the remaining six variables pertain to the participant directly.
1) Partner relationship type (main versus casual)
2) Partner serostatus (HIV-positive, HIV-negative, unknown serostatus)
3) Disclosure stigma
4) Beliefs in the seriousness of transmission risk in the presence of HAART
5) Beliefs in transmission risk based on viral load
6) Subjective norms regarding serostatus disclosure
7) Urbanicity
To address Aim 1, I examined bivariate relationships between serostatus disclosure to sex partners as the dependent variable and each of the independent variables of interest. Those independent variables that were significantly associated with serostatus disclosure at an alpha of 0.20, the standard threshold for achieving a parsimonious model, were retained for the subsequent multivariate analysis. I then entered all independent variables into a logistic regression model simultaneously based on my conceptual model.
I used multinomial logistic regression to determine the independent associations between serostatus disclosure and the proposed independent variables and to determine the combination of variables that best predicts serostatus disclosure. I used a p-value of .05 as the criterion for statistical significance of factors in the final regression models. In order to compare each of the categories of disclosure (to none, some, or all partners), I ran two separate multinomial logistic regression models. In the first model, disclosure to all partners was entered as the base category for a comparison with disclosure to no partners. In the second model, disclosure to all partners served as the base category for a comparison with disclosure to some partners. Adjusted odds ratios and 95% confidence intervals are presented in the final model for the variables.
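As a reminder of how an adjusted odds ratio and its 95% confidence interval follow from a fitted logistic coefficient, the sketch below exponentiates the coefficient and the Wald interval around it. The function name is hypothetical and the inputs are placeholders, not estimates from this study.

```python
from math import exp
from statistics import NormalDist


def odds_ratio_ci(coef, se, level=0.95):
    """Return (odds ratio, lower bound, upper bound) for a logit coefficient.

    coef: estimated log-odds coefficient; se: its standard error.
    The interval is the Wald interval coef +/- z * se, exponentiated.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return exp(coef), exp(coef - z * se), exp(coef + z * se)
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at the matching significance level.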
Multinomial logistic regression extends binary logistic regression, a commonly used technique for determining the probability of a dichotomous outcome variable, to outcomes with more than two categories, given a set of independent variables that may be continuous, discrete, dichotomous, or a mixture of types. Logistic regression allows for flexibility compared to other techniques, as the correlates do not have to be linearly related, normally distributed, or of equal variance in each group (Tabachnick & Fidell, 2001).
Participants who reported more than 10 partners were excluded from my analysis, as the validity of reports of more than 10 partners is low (Kissinger et al., 2003; O'Brien et al., 2003). Since results for those cases with multiple partners might be correlated, I also ran a sensitivity analysis including and excluding those with more than 10 partners. Since the sensitivity analysis yielded the same conclusions as my original analysis, I present results only from the original analysis, which reports those with 10 or fewer partners.
For the first two independent variables (partner relationship type and partner serostatus), I used multinomial logistic regression to examine HIV serostatus disclosure to sexual partners and to determine the partners to whom participants disclosed. More specifically, I examined disclosure across two groups: a) participants who reported one sex partner and thus only one opportunity to disclose, and b) participants who reported more than one partner and thus more than one opportunity to disclose. I then combined these two groups to examine overall disclosure across all study participants who reported one or more sexual partners. I also ran separate analyses for MSW, MSM, and WSM. These analyses demonstrate how partner relationship type and partner serostatus affect serostatus disclosure. I conducted a correlated data analysis since results for participants with multiple partners are likely to be correlated.
In order to assess the association between relationship type, partner serostatus type, serostatus disclosure, and disclosure stigma, several steps were taken to recode the data. First, I trichotomized the continuous relationship type variable, which represents relationship type proportionally, into three groups for participants who have: a) all primary partners, b) all casual partners, and c) mixed relationship type, which represents a combination of both primary and casual partners. I also recoded partner serostatus type from a continuous variable that represents serostatus proportionally to a categorical variable with the following four groups among participants who report sexual partner(s) who are: a) all HIV-positive partners, b) all HIV-negative partners, c) all unknown serostatus partners, and d) mixed serostatus partners.
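The recoding described above can be sketched as two small functions. The names and string labels are hypothetical. The first function assumes the continuous variable is the proportion of a participant's partners who are primary, which is one reading of "represents relationship type proportionally"; the second works from a list of per-partner serostatus labels rather than from proportions, purely to keep the example short.

```python
def relationship_group(prop_primary):
    """Trichotomize the (assumed) proportion of partners who are primary."""
    if not 0.0 <= prop_primary <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    if prop_primary == 1.0:
        return "all primary"
    if prop_primary == 0.0:
        return "all casual"
    return "mixed"


def serostatus_group(partner_statuses):
    """Collapse per-partner labels into one of the four groups.

    partner_statuses: non-empty list of 'HIV-positive', 'HIV-negative',
    or 'unknown' (hypothetical labels), one entry per partner.
    """
    kinds = set(partner_statuses)
    if not kinds:
        raise ValueError("participant has no partners to classify")
    if len(kinds) > 1:
        return "mixed"
    return "all " + kinds.pop()
```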
4.4.3 Analysis for Aim 2
To assess the role of moderating variables in the relationship between serostatus disclosure and sexual transmission risk behaviors
I first used a contingency table analysis to determine the rate of transmission risk behaviors. In each of the four cells in Table 10 below, I examined how many times vaginal and anal sex occurred in the past three months with HIV-negative partners and unknown serostatus partners. I also examined the number of times these sexual acts were protected with condoms. This allowed me to determine whether the rate of unprotected sexual activity was the same across partner relationship type and partner serostatus. I conducted a similar analysis in each of the four cells in Table 11 below for primary partners and casual partners.
Table 10: Contingency Table for WSM to Examine Rate of TRB by Partner Serostatus
TRB     Partner serostatus
        HIV-negative    HIV status unknown
Yes
No
Table 11: Contingency Table for WSM to Examine Rate of TRB by Relationship Type
TRB     Relationship Type
        Primary         Casual
Yes
No
I examined transmission risk behavior using logistic regression with the two categories listed below.
a) Sexually active and 100% protected sexual activity
b) Sexually active and less than 100% protected sexual activity
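The two outcome categories above can be derived from counts of sexual acts and condom-protected acts, as in the following sketch. The function names and category labels are hypothetical, and returning None for participants with no reported acts is an assumption made so that only sexually active participants are classified.

```python
def trb_category(total_acts, protected_acts):
    """Classify a sexually active participant by condom use.

    Returns None when no acts were reported (not sexually active).
    """
    if total_acts < 0 or protected_acts < 0 or protected_acts > total_acts:
        raise ValueError("counts must satisfy 0 <= protected_acts <= total_acts")
    if total_acts == 0:
        return None
    if protected_acts == total_acts:
        return "100% protected"
    return "less than 100% protected"


def unprotected_rate(total_acts, protected_acts):
    """Proportion of reported acts that were not protected by a condom."""
    if total_acts <= 0:
        raise ValueError("rate is undefined without any reported acts")
    return (total_acts - protected_acts) / total_acts
```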
I used logistic regression to examine the moderator effects in my conceptual model since my independent (serostatus disclosure) and moderator (partner relationship type and partner serostatus) variables were categorical. When the strength of the relationship between two variables depends on a third variable, moderation is occurring. The third variable, or moderator (W), interacts with X in predicting Y if the regression weight of Y on X varies as a function of W. The relationship between the independent variable (serostatus disclosure) and the outcome variable (transmission risk behavior) was explored at each level of each moderator variable. The moderating variable of relationship type was categorized into two levels: a) all primary partners and b) all casual partners. The moderating variable of partner serostatus was categorized into two levels: a) all HIV-negative partners and b) all unknown serostatus partners. When there was evidence of a qualitative (different direction of the effect) or quantitative (different strength of association) difference, I assessed the effect of the moderation (Frazier, Tix, & Barron, 2004). The analysis involved the following three steps:
1. I recoded variables so that categorical variables had dummy codes.
2. I created an interaction term that was the product of the predictor and moderator variables.
3. I ran stepwise logistic regression by entering the predictor variable, the moderator variables, and finally the interaction terms.
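The three steps above can be illustrated with a sketch of the dummy coding, the product term, and the way the predictor's odds ratio differs across moderator levels once an interaction term is in the model. The function names, the choice of casual partner as the level coded 1, and the coefficient arguments are all hypothetical; no estimates from the study are used.

```python
from math import exp


def design_row(disclosed, casual_partner):
    """Build [intercept, X, W, X*W] for one observation.

    X = 1 if the participant disclosed, W = 1 if the moderator is at its
    second level (here, casual partner); X*W is the interaction term.
    """
    x = 1 if disclosed else 0
    w = 1 if casual_partner else 0
    return [1, x, w, x * w]


def odds_ratio_by_moderator(b_x, b_xw):
    """Odds ratio for X at each level of W in a logit model with X, W, X*W.

    b_x: coefficient on X; b_xw: coefficient on the interaction term.
    """
    return {0: exp(b_x), 1: exp(b_x + b_xw)}
```

When the interaction coefficient is zero, the two odds ratios coincide, which corresponds to no moderation.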
4.4 Power calculation
An accepted guideline for estimating the sample size needed to have sufficient power to detect differences in multivariate logistic regression is to have approximately 20 cases for each independent variable in the regression model (Kleinbaum, Kupper et al. 1982). In my dissertation, I have eight independent variables of interest.
In logistic regression, effect sizes are stated in terms of the probability at the mean of the predictor variable and the probability at the mean plus one standard deviation. Using PASS, a statistical and power analysis software program, I set alpha at 0.05 and my sample size at 490 (the pre-set participant enrollment number in the SAFETALK study). P0 is the response probability at the mean of X and P1 is the response probability when X is increased to one standard deviation above the mean. With six independent variables, 490 cases provide adequate power for logistic regression (Hsieh, Bloch, & Larsen, 1998). Table 12 below illustrates the calculations with 490 as my sample size with power set at 0.8 and power set at 0.9.
Table 12: Power Calculation
Power   N     P0    P1      OR      R-Squared   Alpha   Beta
0.9     490   0.7   0.774   1.465   0.3         0.05    0.1
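The quantities in Table 12 can be related to one another with a short sketch. It assumes the simple large-sample approximation for a continuous covariate associated with Hsieh, Bloch, and Larsen (1998), n = (z for 1 - alpha/2 plus z for power) squared, divided by P0(1 - P0) times the squared log odds ratio, and then divided by 1 minus R-squared to allow for correlation with other covariates. The function names are hypothetical, and this is not a reproduction of the PASS routine.

```python
from math import log
from statistics import NormalDist


def odds_ratio(p0, p1):
    """Odds ratio comparing response probability p1 with p0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def required_n(p0, odds_ratio_value, alpha=0.05, power=0.9, r_squared=0.0):
    """Approximate sample size for one continuous covariate (assumed formula).

    p0: response probability at the covariate mean.
    odds_ratio_value: odds ratio for a one standard deviation increase.
    r_squared: R-squared of the covariate regressed on the other covariates.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    n_simple = z_sum ** 2 / (p0 * (1 - p0) * log(odds_ratio_value) ** 2)
    return n_simple / (1 - r_squared)
```

Under these assumptions, the Table 12 row (P0 = 0.7, OR = 1.465, alpha = 0.05, power = 0.9, R-squared = 0.3) gives a required sample size of approximately 490, and the odds ratio computed from the rounded P0 and P1 is close to the tabled 1.465.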
CHAPTER FIVE: RESULTS
In this chapter, I present the results of my dissertation research. First, I present sample characteristics, including frequencies of demographic characteristics and variables of interest. Next, I present the results of the analysis of each research question stated in Chapter Three.