Case Study Application One

(All students need to complete this Case Study Application One)

(Total: 35 marks)

A researcher would like to identify factors associated with breastfeeding status at 6 months postpartum. Using a random sample (n=608) from a study, the researcher collected some relevant variables (including antenatal class attendance, delivery method, pacifier use, infant formula use and the age at which infants were given solid food) and saved them in the dataset CSA DatasetForQ1 BF623 Sem2 2023.dta. The variables in the dataset are described below:

Variable Name Description

BFStatus Breastfeeding status at 6th month (0 = NotBF, 1 = YesBF)

MumAge Mother’s age (1 = <25 yrs, 2 = 25 to 29 yrs, 3 = 30+ yrs)

BWT Birth weight (1 = 2500 to 2999 grams, 2 = 3000 to 3499 grams, 3 = 3500+ grams)

AntenatalClass Attended any classes about breastfeeding during pregnancy (1=Yes, 2=No)

DeliveryMethod Delivery Method (1= Normal delivery, 2 = C-section)

PacifierUse Pacifier use (0= No, 1 = Yes)

FormulaUse Infant formula used after delivery at hospital (0=No, 1 =Yes)

AgeSolids Infant age at which solid food was introduced (in weeks)

The main research question of this study is “whether the infant age at which solid food was introduced is significantly associated with breastfeeding status at 6 months postpartum?” In addition, the researcher would like to understand the association between pacifier use (or formula use) and breastfeeding status at 6 months postpartum.

Furthermore, the researcher would like to predict the probability of breastfeeding for infants with different personal characteristics. Use a 5% significance level for all statistical tests and conclusions. Use evidence (e.g., p values) from Stata outputs to support your answers.

Hint: You may find it helpful to follow the analysis strategy given in computing labs Logistic Regression I & II.

1. (3 marks) Given this data, to answer the research questions, you need to help the researcher identify:

1.1. Which variable is the dependent variable (DV)?

1.2. Which variables are the independent variables (IV)?

1.3. Does the researcher have a primary study variable of interest? If yes, which one?

1.4. Which kind of regression analysis should the researcher use? (Briefly justify your suggestion.)

Your Answer:

__________________________________________________________________________

__________________________________________________________________________

__________________________________________________________________________

2. (4 marks) The researcher would like to assess whether any one of the following variables (i.e., AntenatalClass, DeliveryMethod, PacifierUse, FormulaUse) may confound the association between AgeSolids and breastfeeding status at 6 months postpartum. Help the researcher assess the possible confounding effect using the steps covered in our lectures/labs and fill in the table below. [Hint: assess the confounding effect of each variable separately. You can choose to answer any two of the questions, e.g., a) and d)].

Attach relevant Stata outputs here.

Question: whether any one of the following variables is a confounder Your conclusion/comments and supporting evidence based on your outputs

a) Is AntenatalClass a confounder?

b) Is DeliveryMethod a confounder?

c) Is PacifierUse a confounder?

d) Is FormulaUse a confounder?
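
One common way to operationalise a confounding check is the change-in-estimate rule: compare the crude odds ratio for AgeSolids with the odds ratio adjusted for the candidate variable, and flag a relative change above a chosen threshold (10% is a frequent convention). This may differ from the exact steps taught in the lectures/labs, so treat the Python sketch below as an illustration of the arithmetic only. The function names, the 10% threshold and the odds ratios are assumptions, not results from the dataset.

```python
def percent_change_in_or(crude_or: float, adjusted_or: float) -> float:
    """Relative change (%) in the odds ratio after adjusting for a candidate confounder.

    Assumption: the crude odds ratio is used as the denominator; some texts
    use the adjusted estimate instead.
    """
    if crude_or <= 0 or adjusted_or <= 0:
        raise ValueError("odds ratios must be positive")
    return abs(crude_or - adjusted_or) / crude_or * 100.0


def exceeds_threshold(crude_or: float, adjusted_or: float,
                      threshold_pct: float = 10.0) -> bool:
    """True when the change in the odds ratio is larger than the threshold."""
    return percent_change_in_or(crude_or, adjusted_or) > threshold_pct


# Hypothetical odds ratios for AgeSolids (not taken from any Stata output).
crude, adjusted = 1.20, 1.05
print(f"change = {percent_change_in_or(crude, adjusted):.1f}%")
print("suggests confounding:", exceeds_threshold(crude, adjusted))
```

The crude and adjusted odds ratios themselves would still come from the Stata models fitted with and without the candidate variable.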

3. (3 marks) The researcher would like to know whether the association between AgeSolids and the breastfeeding status at 6 months postpartum is modified by any one of the following variables (i.e., AntenatalClass, DeliveryMethod, PacifierUse, FormulaUse) independently. Help the researcher answer the following questions with evidence from your analysis [Hint: assess the effect modification of each variable separately. You can choose to answer any two of the questions, e.g., f) and g)].

Attach relevant Stata outputs here.

Question: whether the effect of AgeSolids on breastfeeding status at 6 months is modified by any one of the given IVs? Your conclusion/comments and supporting evidence based on your output

e) Do AntenatalClass and AgeSolids interact with each other?

f) Do DeliveryMethod and AgeSolids interact with each other?

g) Do PacifierUse and AgeSolids interact with each other?

h) Do FormulaUse and AgeSolids interact with each other?

4. (10 marks) To answer the main research question, “whether the infant age at which solid food was introduced is significantly associated with breastfeeding status at 6 months postpartum?”, the researcher asks for your help to build an appropriate parsimonious regression model.

4.1 Perform the multiple regression analysis that you recommended (in Question 1) without consideration of any interactions. List the factors/predictors that are significantly associated with breastfeeding status at 6 months in your final parsimonious regression model in the table below. (4 marks)

Attach Stata outputs here (Hint: report a table with Odds Ratios) to show, step by step, how you arrived at the final parsimonious model.

Factor name in your final model Adjusted odds ratio 95% Confidence Interval p-value

Note: round your figures to 3 decimal places and clearly indicate the reference group.
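
For orientation, an odds ratio is the exponential of the corresponding logistic regression coefficient, and a Wald-type 95% confidence interval is obtained by exponentiating the coefficient plus or minus about 1.96 standard errors. The Python sketch below shows that relationship with rounding to 3 decimal places. The function name, the coefficient and the standard error are hypothetical and are not estimates from the dataset.

```python
import math


def odds_ratio_with_ci(coef: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower limit, upper limit), each rounded to 3 decimals.

    coef: logistic regression coefficient on the log-odds scale
    se:   standard error of that coefficient
    z:    normal quantile for a two-sided 95% interval (approximately 1.96)
    """
    point = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return tuple(round(v, 3) for v in (point, lower, upper))


# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_with_ci(coef=0.100, se=0.030)
print(f"OR = {or_}, 95% CI = ({lo}, {hi})")
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from 0 at the 5% level.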

4.2 (6 Marks) Interpret the adjusted odds ratio and corresponding 95% CI for each relevant factor you listed in the above table and answer the following research questions. Use evidence from the Stata output to support your answers.

Research question 1): “whether the infant age at which solid food was introduced is significantly associated with breastfeeding status at 6 months postpartum?”

Your Answer:

___________________________________________________________

___________________________________________________________

___________________________________________________________

___________________________________________________________

Research question 2): “whether pacifier use is significantly associated with breastfeeding status at 6 months postpartum?”

Your Answer:

___________________________________________________________

___________________________________________________________

___________________________________________________________

___________________________________________________________

Research question 3): “whether feeding infant formula after delivery at hospital is significantly associated with breastfeeding status at 6 months postpartum?”

Your Answer:

___________________________________________________________

___________________________________________________________

___________________________________________________________

___________________________________________________________

5. (7 marks) Using your final parsimonious regression model from Question 4.1, the researcher would like to predict the probability of being breastfed at 6 months postpartum for some infants based on their specific information. He asks for your help to build an appropriate regression model for this purpose.

Attach Stata output (Hint: reporting a table with Coefficients) here.

5.1 (2 marks) Develop the multiple regression equation (with coefficients rounded to 3 decimal places) based on your Stata output. P is the probability of breastfeeding at 6 months postpartum.

=______________________________________________________________

5.2 (4 marks) Now based on the above model, help the researcher calculate the predicted probability P of being breastfed at 6 months postpartum.

a) For an infant who was not given any infant formula after delivery at hospital but used a pacifier and was given solid food in week 26. Show the steps of your prediction and comment briefly on it. (2 Marks)

Your answer:

___________________________________________________________________

___________________________________________________________________

b) For an infant who was given infant formula after delivery at hospital but never used a pacifier and was first given solid food late, in week 30. Show the steps of your prediction and comment briefly on it. (2 Marks)

Your answer:

___________________________________________________________________

___________________________________________________________________
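
As a reminder of the mechanics, a logistic regression model gives the log-odds as a linear combination of the predictors, and the predicted probability is recovered with P = 1 / (1 + exp(-log-odds)). The Python sketch below walks through that calculation. The intercept, the coefficients and the example infant are invented for illustration and do not come from the dataset or from the two scenarios above; the choice of predictors in the dictionary is also an assumption.

```python
import math


def predicted_probability(intercept: float, coefs: dict, values: dict) -> float:
    """Predicted probability from a logistic model.

    Step 1: log-odds = intercept + sum of coefficient * predictor value
    Step 2: P = 1 / (1 + exp(-log-odds))
    """
    log_odds = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients on the log-odds scale (NOT estimates from the data).
intercept = -2.0
coefs = {"AgeSolids": 0.1, "PacifierUse": -0.5, "FormulaUse": -0.8}

# Hypothetical infant: solids at week 20, no pacifier, no formula.
infant = {"AgeSolids": 20, "PacifierUse": 0, "FormulaUse": 0}

p = predicted_probability(intercept, coefs, infant)
print(f"predicted P = {p:.3f}")
```

With real output, the intercept and coefficients would be read from the Stata table of coefficients rather than from the table of odds ratios.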

5.3 (1 Mark) Two friends of the researcher differ in age by 5 years, i.e., one is 25 years old and the other is 30 years old. However, their infants were first given solid food at the same week of age, and both infants had exactly the same status for formula and pacifier use. Using the model in Question 5.1, the researcher concluded that the probabilities of breastfeeding at 6 months postpartum for his two friends are different because of the 5-year gap in their ages and their babies’ conditions. Do you agree with the researcher?

Yes. I agree. Justify your agreement.

_______________________________________________________________________

_______________________________________________________________________

No. I disagree. Justify your disagreement.

_______________________________________________________________________

_______________________________________________________________________

6. (8 marks) The researcher further categorized the continuous variable AgeSolids into a 3-level categorical variable AgeSolidsCat (see table below). He asks for your help to build a new model by replacing only the continuous variable AgeSolids with the categorical variable AgeSolidsCat in your parsimonious regression model (see Question 4.1), keeping all other variables unchanged.

Attach Stata output (e.g., parameter estimation table with Odds Ratios) here.

Variable Name Description

AgeSolidsCat Categorized based on AgeSolids: (1 = ≤20 weeks, 2 = 21-25 weeks, 3 = ≥26 weeks)

6.1 (1 Mark) Do you think AgeSolidsCat is still a significant predictor of Breastfeeding status at 6th month? Justify your answer. Attach Stata output here.

Your answer:

__________________________________________________________________

__________________________________________________________________

6.2 (3 Marks) The researcher would like to know: “based on the current sample, from which infant age at the introduction of solid food (or later) are the odds of breastfeeding at 6 months significantly increased?” Help the researcher answer this question by interpreting the adjusted odds ratios (and 95% CIs) for AgeSolidsCat. Use evidence from the Stata output to support your answer.

Your answer:

__________________________________________________________________

__________________________________________________________________

6.3 (2 Marks) The researcher is unsure which of the two models is better: the one with continuous AgeSolids (see Question 4.1) or the one with categorical AgeSolidsCat (this Question 6). Convince the researcher which model you would use to explain the association between AgeSolids and breastfeeding status at 6 months postpartum. Justify your choice using evidence.

Your answer:

__________________________________________________________________

__________________________________________________________________

6.4 (2 Marks) The researcher is happy with your help on his data analysis. He would like your further advice on his new prospective cohort study with 6 months of follow-up. This study aims to identify the factors associated with the length of stay (LoS) at hospital among patients with a type of lung disease. Variables collected include main outcomes [LoS in days, and discharge status], demographic and socioeconomic characteristics [age (in years), gender, smoking status, BMI, education, marital status, job type, insurance status], and clinical information [symptoms of the disease, severity of the disease, comorbidities and medical history]. He needs your advice on

i. which methods he should use to describe the distribution of the LoS at hospital,

ii. which methods he should use to compare the LoS at hospital between groups (for example, between genders or between smoking groups),

iii. which of the regression models covered in EPID6001 is an appropriate model that he should use to identify the factors associated with the length of stay (LoS).

Briefly justify your answers.

Your answer:

__________________________________________________________________

__________________________________________________________________

Case Study Application Two

(Only students with a Random Allocation number “1” need to complete this Case Study Application Two using the following paper CSA PaperForQ2 Sem2 2023#1.pdf).

(Total: 20 marks)

This case study application uses information from a published paper “Ekholuenetale M, Wegbom AI, Tudeme G, Onikan A. Household factors associated with infant and under-five mortality in sub-Saharan Africa countries. Int J Child Care Educ Policy. (2020) 14:1–15. doi: 10.1186/s40723-020-00075-1” (It is attached with the assignment as CSA PaperForQ2 Sem2 2023#1.pdf in Blackboard).

1. (6 marks) Briefly describe the study to answer the following questions:

1.1. What were the study design and research aim?

1.2. How were the participants recruited: where, when, and how many?

1.3. List and briefly comment on two main strengths and two limitations of the study.

Your Answers:

___________________________________________________________

___________________________________________________________

2. (6 marks) Based on the paper information, answer the following questions:

2.1 What are the events of interest in this paper? How did the authors calculate the survival time?

2.2 List possible reasons for censored observations.

Your Answers:

___________________________________________________________

___________________________________________________________

2.3 Considering only “under-five mortality”, complete the table below for 10 children in sub-Saharan African countries with different conditions, where censoring status is coded “1” for event and “0” for censored. Use 30 days = 4 weeks = 1 month, 12 months = 1 year, 60 months = 5 years in the calculation.

id Conditions Survival time (month) Censoring status

1 Died 55.5 months after birth

2 Died 44 weeks after birth

3 Still alive at the last follow-up time

4 Alive at 300 days after birth, but mother refused follow-up

5 Lost to follow-up because family moved to another city, but still alive at 1.5 years after birth

6 Died at 2.25 years after birth

7 Still alive at 54 weeks after birth, but contact was lost

8 Dropped out (follow-up discontinued) but alive at 69 weeks after birth

9 Died at 28 days after birth

10 Alive and celebrated fifth birthday
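
The conversion rule stated in Question 2.3 (30 days = 4 weeks = 1 month, 12 months = 1 year) can be written as a small helper that expresses any follow-up time in months. The Python sketch below assumes exactly that rule; the function name and the example values are hypothetical and are deliberately not rows from the table above.

```python
def to_months(value: float, unit: str) -> float:
    """Convert a follow-up time to months using the assignment's convention:
    30 days = 4 weeks = 1 month and 12 months = 1 year."""
    if unit == "days":
        return value / 30
    if unit == "weeks":
        return value / 4
    if unit == "months":
        return float(value)
    if unit == "years":
        return value * 12
    raise ValueError(f"unsupported unit: {unit}")


# Hypothetical follow-up times, for illustration only.
for value, unit in [(90, "days"), (10, "weeks"), (0.5, "years")]:
    print(f"{value} {unit} = {to_months(value, unit)} months")
```

Note that this convention is a simplification chosen by the assignment: 4 weeks is 28 calendar days, not 30, so the helper should only be used where the question asks for it.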

3. (8 marks) Based on the paper information, answer the following questions:

3.1 Which statistical regression analysis was used in this paper to achieve its research objective? Do you think the authors used a correct model? And why?

Your Answers

___________________________________________________________

___________________________________________________________

3.2 Did the authors assess the assumption associated with the regression model used? List at least two methods you learnt from our unit.

Your Answers

___________________________________________________________

___________________________________________________________

3.3 Do you think “Household wealth quintiles” in Model II (see Table 3 and Table 4) is a significant factor? If you were the authors, which Stata syntax/command could you use to obtain an overall p value for this factor?

Your Answers

___________________________________________________________

___________________________________________________________

3.4 Referring only to Table 3 of this paper, choose one of the factors in Model II and interpret its effect (along with the corresponding 95% CI) on infant mortality in your own words.

Your Answers

___________________________________________________________

___________________________________________________________

3.5 Based on Table 3 and Table 4, do you think Model II and Model IV are parsimonious models? Justify your answer using evidence from the two tables. Name a regression method you learnt in our unit for building a parsimonious model.

Your Answers

___________________________________________________________

___________________________________________________________

Case Study Application Two

(Only students with a Random Allocation number “2” need to complete this Case Study Application Two using the following paper CSA PaperForQ2 Sem2 2023#2.pdf).

(Total: 20 marks)

This case study application uses information from a published paper “David T Doku, Subas Neupane, Survival analysis of the association between antenatal care attendance and neonatal mortality in 57 low- and middle-income countries, International Journal of Epidemiology, Volume 46, Issue 5, October 2017, Pages 1668–1677, https://doi.org/10.1093/ije/dyx125” (It is attached with the assignment as CSA PaperForQ2 Sem2 2023#2.pdf in Blackboard).

1. (6 marks) Briefly describe the study to answer the following questions:

1.1 What were the study design and research aim?

1.2 How were the participants recruited: where, when, and how many?

1.3 List and briefly comment on two main strengths and two limitations of the study.

Your Answers:

___________________________________________________________

___________________________________________________________

2. (6 marks) Based on the paper information, answer the following questions:

2.1 What are the events of interest in this paper? How did the authors calculate the survival time?

2.2 List possible reasons for censored observations.

Your Answers:

___________________________________________________________

___________________________________________________________

2.3 Complete the table below for 10 neonates in one of the 57 low- and middle-income countries with different conditions, where censoring status is coded “1” for event and “0” for censored. Use 30 days = 4 weeks = 1 month, 12 months =1 year, 60 months = 5 years in the calculation.

id Conditions Survival time (days) Censoring status

1 Died 2.5 days after birth

2 Died 6 hours after birth

3 Still alive at the last follow-up time

4 Alive at 120 hours after birth, but mother refused follow-up

5 Lost to follow-up because family moved to another city, but still alive on the 7th day after birth

6 Died at 12 hours after birth

7 Still alive at the last follow-up time

8 Dropped out (follow-up discontinued) but alive on the 4th day after birth

9 Died at 18 hours after birth

10 Died 108 hours after birth
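
Several conditions in this table are given in hours while the survival time is requested in days. The Python sketch below shows one way to pair a converted time with the censoring code from Question 2.3 (1 = event, 0 = censored). It assumes 24 hours = 1 day, which the assignment's conversion rule does not state explicitly. The function names and the two example neonates are hypothetical and are not rows from the table.

```python
HOURS_PER_DAY = 24  # assumption: not part of the assignment's stated conversion rule


def to_days(value: float, unit: str) -> float:
    """Convert a follow-up time given in hours or days to days."""
    if unit == "hours":
        return value / HOURS_PER_DAY
    if unit == "days":
        return float(value)
    raise ValueError(f"unsupported unit: {unit}")


def survival_record(value: float, unit: str, died: bool):
    """Return (survival time in days, status), with status 1 = event, 0 = censored."""
    return (to_days(value, unit), 1 if died else 0)


# Hypothetical neonates, for illustration only.
print(survival_record(36, "hours", died=True))   # death observed: event
print(survival_record(10, "days", died=False))   # alive when last seen: censored
```

A neonate described only as alive at the last follow-up time has no time value to convert, so that time has to be taken from the follow-up window described in the paper.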

3. (8 marks) Based on the paper information, answer the following questions:

3.1 Which statistical regression analysis was used in this paper to achieve its research objective? Do you think the authors used a correct model? And why?

Your Answers

___________________________________________________________

___________________________________________________________

3.2 Did the authors assess the assumption associated with the regression model used? List at least two methods you learnt from our unit.

Your Answers

___________________________________________________________

___________________________________________________________

3.3 Do you think “Number of ANC visits” (see Table 2) is a significant factor? If you were the authors, which Stata syntax/command could you use to obtain an overall p value for this factor?

Your Answers

___________________________________________________________

___________________________________________________________

3.4 Referring only to Table 2 of this paper, choose one of the factors and interpret its effect (along with the corresponding 95% CI) on neonatal mortality in your own words.

Your Answers

___________________________________________________________

___________________________________________________________

3.5 Based on Figure 3, the authors concluded that “The Europe and Central Asia region experienced better survival, whereas the South Asia region had the worst survival”. Do you agree with their conclusion? Do you think Africa experienced better neonatal survival than the other three regions (East Asia & Pacific, Latin America & Caribbean, and Middle East & North Africa)? Justify your answer using evidence from Figure 3.

Your Answers