
Understanding Logistic regression

Logistic regression is a statistical method for finding an equation that predicts the outcome of a binary variable. Unlike linear regression, which models the mean of a continuous outcome directly as a linear function of the predictors, logistic regression models the log odds of the outcome as a linear function of the predictors. Working on the log-odds scale keeps every predicted probability between 0 and 1.
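
To make the log-odds idea concrete, here is a minimal Python sketch using only the standard library. The function names `logit` and `inv_logit` and the coefficient values `b0` and `b1` are illustrative assumptions; they are not taken from the study.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Map a log-odds value z back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a single predictor x (illustration only).
b0, b1 = -1.0, 0.5

for x in (0, 2, 4):
    log_odds = b0 + b1 * x       # linear on the log-odds scale
    prob = inv_logit(log_odds)   # always strictly between 0 and 1
    print(x, round(log_odds, 2), round(prob, 3))
```

The linear predictor can take any real value, but the implied probability never leaves the unit interval, which is why the log-odds scale suits a binary outcome.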

Descriptive statistics

The descriptive statistics of the diagnostic interval by year show that the average diagnostic interval is highest in 2015 (M=36.87, SD=42.41, N=1,548), followed by 2010 (M=34.56, SD=41.47, N=1,353), 2016 (M=33.85, SD=41.94, N=1,630), 2013 (M=33.40, SD=38.17, N=1,422), 2012 (M=32.52, SD=36.74, N=1,330), and 2014 (M=32.35, SD=39.03, N=1,356); it is lowest in 2011 (M=31.87, SD=39.19, N=1,270). By region, region 4 (M=42.36, SD=48.00, N=622) has the highest average diagnostic interval, followed by region 1 (M=34.78, SD=40.64, N=4,445) and region 3 (M=33.19, SD=38.47, N=1,760); the lowest is region 2 (M=30.73, SD=37.88, N=3,082).

Data research analysis

The regression model to be used is a multiple linear regression model. The dependent variable is the diagnostic interval, and the independent variables are year of diagnosis and health region. The control variables are age group, community size, cancer stage, neighborhood income, and detection method. All independent and control variables will be coded as dummy variables, and we will then regress the diagnostic interval on them. Since all independent variables are dummy variables, we will take the first level of each as the base to avoid perfect multicollinearity. We will estimate the adjusted average diagnostic interval for each other level by adding the constant to the respective coefficient, and then test the hypothesis that the adjusted average equals 49 days. If p<0.05, we conclude that the adjusted average differs from the 49-day target; the guideline is not adhered to when the adjusted average is significantly above 49 days. The final hypothesized model is given as
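$$\mathrm{diag\_int} = \beta_0 + \beta_{1\text{-}6}\,\mathrm{Year\ dummies} + \beta_{7\text{-}9}\,\mathrm{Region\ dummies} + \beta_{10}\,\mathrm{csize\ dummy} + \beta_{11\text{-}14}\,\mathrm{stage\ dummies} + \beta_{15}\,\mathrm{Income\ dummy} + \beta_{16\text{-}18}\,\mathrm{age\_group\ dummies} + \beta_{19}\,\mathrm{det\ dummy} + \varepsilon$$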
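
The adjusted-average calculation described above can be sketched in Python. The numbers are copied from Table 2 of this report. The helper names `adjusted_mean` and `two_sided_p` are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable here only because the residual degrees of freedom are large. Testing a non-base level against 49 also needs the covariance between the constant and the coefficient, which Table 2 does not report, so the sketch tests only the base level.

```python
import math

# Values copied from Table 2 of this report.
CONS, CONS_SE = 51.38636, 2.534833
TARGET = 49.0  # 7-week guideline, in days

# A few coefficients from Table 2 (non-base levels).
coefs = {"2011": -2.78388, "2015": 2.609912, "Region 4": 7.243908}

def adjusted_mean(coef, cons=CONS):
    """Adjusted average for a non-base level: constant plus its coefficient."""
    return cons + coef

def two_sided_p(estimate, se, target=TARGET):
    """Two-sided p-value for H0: estimate == target (normal approximation)."""
    z = (estimate - target) / se
    return math.erfc(abs(z) / math.sqrt(2.0))

for level, b in coefs.items():
    print(level, round(adjusted_mean(b), 2))

# Base levels: the adjusted average is the constant itself.
print("base", round(two_sided_p(CONS, CONS_SE), 4))
```

For the base level this reproduces, to within rounding, the p-value of 0.3465 reported for the constant in Table 2.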

Data research output model

Study Objective: The objective of the study is to investigate whether the guideline of a 7-week (49-day) target for the diagnostic interval is adhered to in every year and every health region in Alberta.

Method: The data are simulated records of all first-ever primary breast cancers in women in Alberta. The data set consists of 9,909 observations and 10 variables: id, diagnostic interval, region, year, detection method, age, age group, cancer stage, community size, and neighborhood income. The method of analysis is a multiple linear regression model, fitted with STATA 14.

Result: The regression result is presented in Table 2. Our interest is in the last two columns, which give the adjusted average for each level other than the base and the p-value for the null hypothesis that this average equals 49. For the base levels, the adjusted average diagnostic interval is the constant (M=51.39), which is not significantly different from 49 (p=0.3465). For year, all p-values are greater than 0.05 except for 2015 (M=54.00, p=0.0469). For health region, all p-values are greater than 0.05 except for region 4 (M=58.63, p=0.0007). For the control variables, all p-values for income and age group are greater than 0.05, which means their adjusted averages are not significantly different from 49 days. However, for urban community size the adjusted mean is significantly greater than 49 (M=53.87, p=0.0034). For every cancer stage above the base (all p<0.001) and for screen-detected cases (M=41.58, p=0.0045), the adjusted average diagnostic interval is significantly less than 49 days.
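
The decision rule behind these statements can be written as a short Python sketch. The adjusted estimates and p-values are copied from Table 2; the function name `classify` and its three labels are hypothetical, chosen only for illustration.

```python
TARGET = 49.0   # 7-week guideline, in days
ALPHA = 0.05    # significance level used in this report

# (adjusted estimate, p-value for H0: adjusted mean = 49), copied from Table 2.
results = {
    "year 2015": (53.99627, 0.0469),
    "year 2016": (50.92393, 0.4390),
    "Region 2": (46.91974, 0.4271),
    "Region 4": (58.63027, 0.0007),
    "csize Urban": (53.87063, 0.0034),
    "detn Yes": (41.57968, 0.0045),
}

def classify(mean, p, target=TARGET, alpha=ALPHA):
    """Label an adjusted mean relative to the target, given its p-value."""
    if p >= alpha:
        return "not different from target"
    return "above target" if mean > target else "below target"

for level, (mean, p) in results.items():
    print(f"{level}: {classify(mean, p)}")
```

Only a level labelled "above target" counts against the guideline; a level that is significantly below 49 days meets it.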

Conclusion: Given the result above, we conclude that the guideline is adhered to in all years except 2015 and in all health regions except region 4. Income and age group do not affect whether the guideline is met, while community size, detection method, and cancer stage do.
Appendix
Table 1: Summary statistics of the diagnostic interval by independent variables

| Variable | Levels | Obs | Mean | Std.Dev. | Min | Max |
|---|---|---|---|---|---|---|
| Year | 2010 | 1,353 | 34.5558 | 41.47153 | 0 | 295 |
| | 2011 | 1,270 | 31.86535 | 39.19425 | 0 | 280 |
| | 2012 | 1,330 | 32.51579 | 36.73751 | 0 | 241 |
| | 2013 | 1,422 | 33.39803 | 38.17135 | 0 | 268 |
| | 2014 | 1,356 | 32.35103 | 39.02998 | 0 | 310 |
| | 2015 | 1,548 | 36.8708 | 42.4108 | 0 | 285 |
| | 2016 | 1,630 | 33.85215 | 41.93524 | 0 | 281 |
| Region | Region 1 | 4,445 | 34.78313 | 40.64375 | 0 | 267 |
| | Region 2 | 3,082 | 30.72875 | 37.88113 | 0 | 310 |
| | Region 3 | 1,760 | 33.19375 | 38.46828 | 0 | 285 |
| | Region 4 | 622 | 42.35691 | 48.00052 | 0 | 268 |
| csize | Rural | 2,071 | 33.27764 | 38.8659 | 0 | 268 |
| | Urban | 7,838 | 33.83082 | 40.33199 | 0 | 310 |
| stage | 0 | 1,334 | 44.43853 | 46.42146 | 0 | 285 |
| | 1 | 3,990 | 32.50752 | 39.29039 | 0 | 295 |
| | 2 | 3,014 | 31.05209 | 37.88881 | 0 | 310 |
| | 3 | 1,210 | 33.27603 | 39.17303 | 0 | 263 |
| | 4 | 361 | 31.14404 | 36.56252 | 0 | 225 |
| Incomeq | High | 4,095 | 33.26935 | 39.37119 | 0 | 295 |
| | Low | 5,772 | 34.01421 | 40.50235 | 0 | 310 |
| Age group | 39- | 603 | 37.94859 | 44.98094 | 0 | 310 |
| | 40-49 | 1,721 | 32.49448 | 38.4354 | 0 | 295 |
| | 50-69 | 5,451 | 33.23115 | 39.31082 | 0 | 285 |
| | 70+ | 2,134 | 34.73993 | 41.52299 | 0 | 285 |
| Detection method | No | 5,860 | 36.64693 | 42.89485 | 0 | 310 |
| | Yes | 4,049 | 29.47222 | 35.04632 | 0 | 267 |


Table 2: regression result

| Source | SS | df | MS |
|---|---|---|---|
| Model | 499903.2 | 19 | 26310.69 |
| Residual | 15314479 | 9,847 | 1555.243 |
| Total | 15814382 | 9,866 | 1602.917 |

Number of obs = 9,867
F(19, 9847) = 16.92
Prob > F = 0
R-squared = 0.0316
Adj R-squared = 0.0297
Root MSE = 39.437

| diag_int | Coef. | Std. Err. | t | P>t | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) | Adjusted estimate | p (H0: adjusted estimate = 49) |
|---|---|---|---|---|---|---|---|---|
| year | | | | | | | | |
| 2011 | -2.78388 | 1.544251 | -1.8 | 0.071 | -5.81092 | 0.243174 | 48.60249 | 0.8764 |
| 2012 | -1.86133 | 1.52701 | -1.22 | 0.223 | -4.85458 | 1.131927 | 49.52503 | 0.8371 |
| 2013 | -1.05957 | 1.501508 | -0.71 | 0.48 | -4.00284 | 1.883691 | 50.32679 | 0.6017 |
| 2014 | -1.49114 | 1.520678 | -0.98 | 0.327 | -4.47198 | 1.489703 | 49.89522 | 0.7248 |
| 2015 | 2.609912 | 1.471206 | 1.77 | 0.076 | -0.27395 | 5.493776 | 53.99627 | 0.0469 |
| 2016 | -0.46243 | 1.456265 | -0.32 | 0.751 | -3.31701 | 2.392143 | 50.92393 | 0.439 |
| stage | | | | | | | | |
| 1 | -12.6648 | 1.253695 | -10.1 | 0 | -15.1223 | -10.2073 | 38.72153 | <0.001 |
| 2 | -16.7358 | 1.336492 | -12.52 | 0 | -19.3556 | -14.116 | 34.65059 | <0.001 |
| 3 | -15.5823 | 1.616628 | -9.64 | 0 | -18.7512 | -12.4134 | 35.80404 | <0.001 |
| 4 | -17.8391 | 2.384647 | -7.48 | 0 | -22.5135 | -13.1647 | 33.5473 | <0.001 |
| rhan | | | | | | | | |
| Region 2 | -4.46662 | 0.940487 | -4.75 | 0 | -6.31017 | -2.62308 | 46.91974 | 0.4271 |
| Region 3 | -0.48025 | 1.362242 | -0.35 | 0.724 | -3.15052 | 2.190025 | 50.90611 | 0.427 |
| Region 4 | 7.243908 | 1.704822 | 4.25 | 0 | 3.902108 | 10.58571 | 58.63027 | 0.0007 |
| incomeqn | | | | | | | | |
| Low | 0.433228 | 0.806866 | 0.54 | 0.591 | -1.1484 | 2.014851 | 51.81959 | 0.2616 |
| age_groupn | | | | | | | | |
| 40-49 | -2.75159 | 1.896045 | -1.45 | 0.147 | -6.46823 | 0.965043 | 48.63477 | 0.8687 |
| 50-69 | -1.48077 | 1.742776 | -0.85 | 0.396 | -4.89697 | 1.935429 | 49.90559 | 0.6624 |
| 70+ | -0.92921 | 1.847533 | -0.5 | 0.615 | -4.55076 | 2.692331 | 50.45715 | 0.4982 |
| detn | | | | | | | | |
| Yes | -9.80668 | 0.883767 | -11.1 | 0 | -11.539 | -8.07432 | 41.57968 | 0.0045 |
| csize2 | | | | | | | | |
| Urban | 2.484267 | 1.270398 | 1.96 | 0.051 | -0.00597 | 4.974507 | 53.87063 | 0.0034 |
| _cons | 51.38636 | 2.534833 | 20.27 | 0 | 46.41757 | 56.35515 | | 0.3465 |
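
As a consistency check on the summary figures in Table 2, the following Python sketch recomputes the F statistic, R-squared, adjusted R-squared, and root MSE from the reported sums of squares and degrees of freedom. The variable names are illustrative assumptions; the numbers are copied from Table 2.

```python
# Sums of squares and degrees of freedom copied from Table 2.
ss_model, df_model = 499903.2, 19
ss_resid, df_resid = 15314479.0, 9847
ss_total, df_total = 15814382.0, 9866

# Mean squares: sum of squares divided by its degrees of freedom.
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

f_stat = ms_model / ms_resid                      # overall F test
r2 = ss_model / ss_total                          # share of variance explained
adj_r2 = 1.0 - ms_resid / (ss_total / df_total)   # penalised for 19 predictors
root_mse = ms_resid ** 0.5                        # residual standard deviation

print(round(f_stat, 2), round(r2, 4), round(adj_r2, 4), round(root_mse, 3))
```

The recomputed values agree with the reported F(19, 9847) = 16.92, R-squared = 0.0316, Adj R-squared = 0.0297, and Root MSE = 39.437. The low R-squared indicates that the dummies explain only about 3% of the variation in the diagnostic interval.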