*Simulations to Assess Power for Treatment Effect on a Vector of Binary Events MULTBINPOW 1.0 software assists researches in choosing the most powerful statistical test in the design of randomized and non-randomized studies in which the outcome is a composite binary-event endpoint MULTBINPOW calculates empirical power via simulations for 8 distinct statistical tests, each comparing 2 vectors of binary events (treatment versus control) under user-specified ranges of within-subject correlations, covariance structures, sample sizes and incidences. Details of the statistical methods are given in Mascha and Imrey (2010). Simulations are based on the method of Oman (2009). Power is computed for the following statistical tests: 1. Collapsed composite of any-versus-none of the events 2. Count of events within individual (Mann-Whitney and T-test) 3. Minimum P-value test 4. GEE common effect test (also called “global odds ratio” test). 5. GEE average relative effect (distinct effects test, removes influence of high incident components) 6. GEE K-df distinct effects test (analogous to Hotelling’s T2, not sensitive to direction) 7. GEE covariance-weighted distinct effects test (usually very similar power to GEE common effect) 8. GEE treatment-component interaction distinct effects test (whether treatment effect varies across vector) The user specifies a range of underlying within-subject correlations, covariance matrix, sample sizes, and response vectors for treatment and control groups; *All parameter option for MULTBINPOW are described in detail below; * AUTHOR: Edward J. Mascha, PhD, Cleveland Clinic. Address questions/comments to maschae@ccf.org; *Citing this program if using it to design a study is much appreciated; *This program is not guaranteed, although has been carefully tested; *Updates will be made periodically to enhance available options and outputs; *A SAMPLE design and CALL are as follows: Design: Calculate power (at alpha=0.05)to compare 2 vectors of 3 proportions each, Treatment: (pr_t=.18 .16 .09), Control: (pr_c= .20 .20 .15), Weight outcomes 1-3 as .2 .4 .4, respectively Assess power at within-subject correlations of .10, .30, .50 and N/group of 200, 400, 600. Number of simulations: B=1000 for each combination of within-subject correlation and N/group. Create simulations using exchangeable correlation (rmat=2) and use unstructured (use_rmat=3) to analyze data; * %LET myPATH=your directory to store results; * %inc "/YOUR DIRECTORY/multbinpow.sas"; * %multbinpow(path=&mypath, reset=0, sims=1, summary=1, makesum=1, outdata=results,imlwt=0, startsim=1, numsims=1000, pr_t= .18 .16 .09, pr_c= .20 .20 .15, rmin=10, rmax=50, rby=20, nmin=200, nmax=600,nby=200, rmat=2, use_rmat=3, wt=.2 .4 .4, spar_ck=0, varypc=0, varyor=0, where=,SIMSEED=1234, alpha=.05, listres=0); ******************************************************************************************************************** *PARAMETERS DEFINITIONS/OPTIONS path= User-defined path to which results will be sent outdata=multpower User-specified name of output SAS dataset containing power results sims=1 1= create simulations (RUNs loops.sas which calls simulation macro (must=1 to create simulations)), 0= No runanal=1 1= analyze simulated data (usually=1 when SIMS=1. can set to 0 with SIMS=1 to create datasets w/o analysis) summary=1 Print results (assumes makesum=1 was run at least once after simulations finished) makesum=1 Create summary SAS dataset &libr..&outdata._sum based on specified simulations (requires summary=1) startsim=1 Index number to begin current simulation runs (usually 1, but can be any number > 0) numsims=100 Number of simulated samples to create (eg, if startsim=1 and numsims=100, 100 runs will be made. To add 400 to initial 100 simulations, for example, use startsim=101 and numsims=500) pr_t= Vector of outcome proportions for components 1 through K in Treatment group (e.g., pr_t = .10 .15 .20) pr_c= Vector of outcome proportions for components 1 through K in Control group (e.g., pr_c = .15 .18 .25) Below: set parameters for within-subject correlation and sample size combinations. Total simulation strata are number of distinct correlations times number of distinct sample sizes. The specific R values below (from Rmin to Rmax by Rby) indicates the starting correlation parameter within a particular set of simulations. *SEE README FILE FOR MORE DETAILS and EXAMPLES. Rmin=10 Starting (i.e., lowest) value of within-subject correlation (x 100) to use in simulations. The specific R value indicates the correlation parameter used within a particular set of simulations. Rmax=40 Maximum (i.e., highest) value of within-subject correlation (x 100) to use in simulations. The specific R value indicates the correlation parameter used within a particular set of simulations. Rby=10 By value for within-subject correlations (x 100). Example: rmin=10, rmax=40, rby=10 creates 4 sets of simulations, corresponding to within-subject correlations equal to 0.1, 0.2, 0.3 and 0.4 Nmin=100 Lowest PER GROUP sample size to be used in sumulations Nmax=500 Highest PER GROUP sample size to be used in sumulations Nby=100 by value for PER GROUP sample sizes. Example, nmin=50, nmax=150, nby=50 creates simulations for N=50, 100 and 150 and thus Total sample sizes for N=100, 200 and 300 ** additional parameters listres=0 0= Do not show every SAS proc result for each simulation, 1= yes rmat=1 Simulation working corr 1= AR(1) first order autoregressive, 2= CS (compound symmetry) Example: AR(1) useful to specify high correlation for some components, lower for others use_rmat= rmat Analysis working correlation. Default=rmat. Otherwise, 1= AR(1), 2= CS (compound symmetry), 3=UN (unspecified) wt=1 Input either single weight of 1 to be used for all components, or vector (eg wt= .1 .2 .2 .3 .2) summing to 1 multtest_b=2000 Number of resamples for testing individual components using proc multtest. If individual tests not interesting, set this number low (e.g., 100) to speed up program reset=0 0=do not delete data in (outdata=), 1= delete bootstrap results for given dataset (outdata=) spar_ck=1 Check sparcity of simulated data in treatment-component crossings imlwt=0 RUN PROC IML to obtain test for covariance-matrix weighted average of estimated treatment effects -- if this is chosen, must have PROC IML module live. otherwise, PROC IML is not needed SIMSEED=1234 Simulation seed. Actual seed used for sims is function of bseed, ith boot, nper and rho alpha=.05 Alpha at which power calculated figpath=&path Physical path where figure should be put figname=powerfig Filename for JPEG figure figdata=figdata SAS filename for power results as presented in figure figtype=jpg Valid SAS figure types. Also used as extension to figure. FigHeight =800px Figure height FigWidth = 600px Figure width figDpi=150 Figure DPI where= Specify subset of data to be used in proc tabulate summary of results (eg, where= where rho= .20) libr=perm Library "perm" is the user-specified path (see path parameter). LIBR must be left as PERM for macros to work progname=mySAS Insert the name of your SAS program to be used as a label in output title clearlog=1 Clear log after each simulation run (0= do not clear logs) ********************************************************************************************************************************/ *PROGRAM BEGINS HERE ************************************************************************* PROGRAM BEGINS HERE; %macro multbinpow(path=, outdata=results, sims=1, runanal=1,numsims=100, startsim=1,summary=1, reset=1, pr_t=, pr_c=, rmin=10, rmax=40, rby=10, nmin=100, nmax=500,nby=100,listres=0, rmat=1, use_rmat=, wt=1, makesum=1, multtest_b=2000, spar_ck=1, imlwt=0, where=,SIMSEED=1234, alpha=.05, plot=1, figDpi=150,figdata=fig&outdata, figpath=&path, figname=powerfig,FigType =jpg, FigHeight =800px, FigWidth = 600px, progname=mySAS, libr=perm, clearlog=1); ** 1) multbinpow.sas calls loops.sas (simulation loops) 2) loops.sas calls simmultbin_Or.sas (create and analyze simulated data) 3) loops.sas and simmultbin_Or.sas are appended below; %global sparse_ct; %let sparse_ct=0; libname perm "&path"; proc format; value test 1='Average ' 2='Common ' 3='Collapsed ' 4='Count ' 5='K-DF ' 6='Min P-val ' 7='Interaction ' 8='COV-WT Avg' ; ******** RUN SIMULATIONS; proc datasets nolist; delete n max plotdata; run; %if &reset=1 %then %do; proc datasets lib=&libr nolist; delete &outdata &outdata._sum; %end; %if &sims=1 %then %do; %do bigrho= &rmin*1 %to &rmax*1 %by &rby*1; %do n_gp =&nmin*1 %to &nmax*1 %by &nby*1; %loops(analy=&runanal,lib=&libr,reset=1,rname=%quote("r&bigrho.n&n_gp."),outds=&outdata,nsims=&numsims,n=&n_gp, r=&bigrho, pt=&pr_t, pc=&pr_c, list=&listres, rstructure=&rmat, use_r=&use_rmat, sim1=&startsim, weight=&wt, sp_ck=&spar_ck, lseed=&simseed); %end; %end; %end; *END SIMS; ************ END RUN SIMULATIONS; ************ SUMMARIZE RESULTS ;****************************************************;;; %if &summary =1 %then %do; %macro countk(x); %* count the number of COMPONENTS; %let kout=0; %do %while(%scan(&x,&kout+1) ^= ); %let kout=%eval(&kout+1); %end; %*put *&kout*; %mend; %global kout; %countk(&pr_t) *CREATE POWER variables from simulations results; %macro makesum; %if &MAKESUM=1 %then %do; data &libr..&outdata._sum;; set &libr..&outdata; **resall; sparse_ct= &sparse_ct; alpha=&alpha*1; *alpha at which power calculated; array p p_b_coll pz_b_gc p_avg_wald p_distinct_kdf p_avg_sc p_chi_INTER p_w_count p_diff_ct min_pval min_bootp pwald_wtavg; array reject rp_b_coll rpz_b_gc rp_avg_wald rp_distinct_kdf rp_avg_sc rp_chi_INTER rp_w_count rp_diff_ct rmin_pval rmin_bootp rpwald_wtavg; do over p; if p ne . then do; if p LE alpha then reject=1; else reject=0; end; end; %macro labels; %do p=2 %to &kout; label wcorr&p="GEE working corr(1,&p)"; %end; %do p=1 %to &kout; label outcome&p="logit incidence &p"; %end; %do p=1 %to &kout; label pc&p="Pc&p" pt&p="Pt&p" ; %end; %mend; %labels label nper="N/group" wcorr2="GEE working corr(1,2)" k="Number of components" b_coll= "Beta collapsed composite" se_b_coll="SE(B) collapsed composite" b_gc= "Beta common effect" se_b_gc= "SE(B) common effect" b_avg_wald= "Beta average relative effect" se_avg_wald= "SE(B) avg rel effect" ; label rp_b_coll="Pow collapsed comp (Wald)" rpz_b_gc="Pow common effect (Wald)" rp_avg_wald="Pow avg rel effect (Wald)" rp_distinct_kdf="Pow K-df distinct(Wald)" rp_avg_sc="Pow avg rel effect(Score)" rp_chi_INTER="Pow TX-component interaction (Score)" rp_w_count="Pow Count (MW)" rp_diff_ct="Pow Count (pooled t)" rmin_pval="Pow minimum P-value ???" rmin_bootp="Pow min bootstrap P-val" rpwald_wtavg="Pow var-cov wt avg" /* same as previous */ ; run; *CALCULATE NUMBER OF REALIZED SIMULATIONS WITHIN NPER AND RHO LEVELS (should=numsims); proc sort data=&libr..&outdata._sum; by nper rho; proc means data=&libr..&outdata._sum n noprint; by nper rho; var SIMi; output out=n n=nsims; run; data &libr..&outdata._sum; merge &libr..&outdata._sum n; by nper rho; run; %end; *END MAKESUM; run; %mend makesum; %makesum; *TABULATE SIMULATION RESULTS; *CALCULATE MAX NUMBER of REALIZED SIMULATIONS ACROSS NPER AND RHO LEVELS; proc means data=&libr..&outdata._sum max noprint; var nsims; output out=max max=maxsims; run; data _null_; set max; call symput ("maxsims", maxsims); run; %LET MAXSIMS=%EVAL(&MAXSIMS+0); %macro dolist; %do p=1 %to &kout; pc&p %end; %do p=1 %to &kout; pt&p %end; %do p=2 %to &kout; wcorr&p %end; %mend; %macro IMLwt; %if &imlwt=1 %then %do; /*rpz_beta_wtavg*/ rpwald_wtavg %end; %mend; proc sort data= &libr..&outdata._sum; by name; options ls=120; run; %let vlist= k %dolist b_coll se_b_coll b_gc se_b_gc b_avg_wald se_avg_wald rp_b_coll rp_w_count rp_diff_ct rpz_b_gc rp_avg_wald rp_avg_sc rp_distinct_kdf rmin_bootp rp_chi_INTER %imlwt; %let rmat_txt=; %let use_rmat_txt=; %if &rmat=1 %then %do; %let rmat_txt=AR(1); %end; %*put rmat_txt= &rmat_txt; %if &rmat=2 %then %do; %let rmat_txt=exchangeable; %end; %*put rmat_txt= &rmat_txt; %if &use_rmat=1 %then %do; %let use_rmat_txt = AR(1); %end; %*put use_rmat_txt= &Use_rmat_txt; %if &use_rmat=2 %then %do; %let use_rmat_txt = exchangeable; %end; %*put use_rmat_txt= &use_rmat_txt; %if &use_rmat =3 %then %do; %let use_rmat_txt=unstructured; %end; %*put use_rmat_txt= &use_rmat_txt; run; title1 "Power calculations for multivariate binary outcomes based on &maxsims simulations "; title2 "PT= &pr_t, PC= &pr_c Component weights= &wt SASprogram =&progname"; title3 "Covariance: simulation=&rmat_txt, analysis=&use_rmat_txt"; proc contents data= &libr..&outdata._sum short position; run; proc tabulate data= &libr..&outdata._sum fc='2020202020202020202020'x noseps; &where; class name rho nper; var &vlist nsims k; table &vlist, name*(n*f=5.0 mean*f=5.2)/ rts=32; table (&vlist), (nper*rho)*(mean*f=5.2)/ rts=32; run; ******** PLOT RESULTS ************** **************** *************** ******************; %if &plot=1 %then %do; %let v= rp_avg_wald rpz_b_gc rp_b_coll rp_w_count rp_distinct_kdf rmin_bootp rp_chi_INTER %imlwt;; proc sort data= &libr..&outdata._sum;; by nper rho; proc means data=&libr..&outdata._sum n mean noprint ; by nper rho; var &v; output out=means mean=&v n=nrecs; run; data uni; set means; array vars &v; do i=1 to dim(vars); test=i; meanvar=vars(i); nrec=nrec; output uni; end; run; proc sort data=uni; by nper; * value test 1='Average ' 2='Common ' 3='Collapsed ' 4='Count ' 5='K-DF ' 6='Min P-val ' 7='Interaction ' 8='COV-WT Avg' ; data &libr..&figdata plotdata; set uni; %if &imlwt=1 %then %do; where test in(1,2,3,4,5,6,7,8); %end; %else %do; where test in(1,2,3,4,5,6,7); %end; meanvar=meanvar*100; format test test.; r=rho*100; run; %macro outfig; proc sort data= plotdata; by nper; run; proc sql noprint; select count(distinct nper) into :nfig from plotdata; quit; ods path work.grafttemp(update) sashelp.tmplmst(read); proc template ; define style styles.mystyle; parent = styles.listing; style GraphFonts from GraphFonts / 'GraphDataFont' = ("SWISSB", 11pt) 'GraphUnicodeFont' = ("SWISSB", 11pt) 'GraphValueFont' = ("SWISSB", 11pt) 'GraphLabelFont' = ("SWISSB", 13pt, bold) 'GraphFootnoteFont' = ("SWISSB", 11pt) 'GraphTitleFont' = ("SWISSB", 11pt) ; style GraphBackground / color = white; style GraphAxisLines from GraphAxisLines / linethickness = 1px; style GraphWalls from GraphWalls / LineThickness = 1px FrameBorder = on; end; run; ods listing style = mystyle image_dpi= &figDpi gpath= "&FigPath"; ods graphics on/ reset border= off imagefmt = &Figtype. imagename = "&FigName" HEIGHT = &FigHeight width = &FigWidth scale = on; %let nrow = %sysevalf(&nfig./2,ceil); %let xlow = %sysevalf(&rmin. - 10); %let xhigh = %sysevalf(&rmax. + 10); proc sgpanel data=plotdata; %if &nfig. > 4 %then %do; panelby nper/LAYOUT= PANEL columns = 2 rows = &nrow. ; %end; %else %do; panelby nper/columns = 1 SPACING= 0; %end; series y = meanvar x = r /group = test name = 'mygplot' LINEATTRS= (pattern = 1 thickness = 2); ROWAXIS label = "Rejections" VALUES= (0 to 100 by 20); COLAXIS label = "Rho" VALUES= (&xlow. to &xhigh. by 10); KEYLEGEND 'mygplot' /NOBORDER POSITION=bottom title = 'Test'; title1 J=L h=12pt bold "Power for multivariate binary outcomes"; title2 J=C h=10pt "Response: PT= &pr_t, PC= &pr_c"; title3 J=C h=10pt "Weights= &wt, simulations =&maxsims"; title4 J=C h=10pt "Covariance: Simulations=&rmat_txt, Analysis=&use_rmat_txt"; run; title; footnote; ods graphics off; ods path reset; quit; %mend outfig; %outfig; goptions reset=all cback=white; ** hsize = 4in vsize = 4in; title1 h=2 f=swissb "Power for multivariate binary outcomes"; title1 h=1.7 f=swissb "Response: PT=&pr_t, PC=&pr_c"; title2 h=1.7 f=swissb "Weights= &wt, Sims =&maxsims"; title3 h=1.7 f=swissb "Covariance: Simulations=&rmat_txt, Analysis=&use_rmat_txt"; %let psize=1.9; %let lsize=1.4; *.25in; axis1 order=(0 to 100 by 20) value=(h=&psize font="swissb") label=(h=&psize font="swissb"); ** LENGTH=4in; axis2 order=(%eval(&rmin-10) to %eval(&rmax+10) by 10) value=(h=&psize font="swissb")label=(h=&psize font="swissb") LENGTH=6in; legend1 label=(h=&lsize font="swissb") value=(h=&lsize font="swissb"); proc sort data=plotdata; by nper; proc gplot data=plotdata; by nper; label meanvar="Rejections" r="Rho" test="Test" nper="N per group"; plot meanvar * r =test/ vaxis=axis1 haxis=axis2 hminor=1 vminor=1 legend=legend1; %do i=1 %to 8; symbol&i i=join repeat=1 w=2.5 l=&i; %end; run; quit; %end; * END PLOT; %end; *END SUMMARY; %mend multbinpow; *********************************************************************************************************; *********************************************************************************************************; %macro loops(lib=,reset=1,outds=, rname=,nsims=3, n=, r=, k=, p=, ortx=, or_outcome=,or_tx_outcome=, analy=1, pt=,pc= , list=0, rstructure=&rmat, use_r=, sim1=1, weight=1, sp_ck=1, iml=&imlwt,lseed=&bseed); %global max_rho; %do SIMI=&sim1 %to &nsims; %simmultbin_Or(name=&rname, anal=&analy, runiml=&iml, NPER=&n, rho=&r, pijt=&pt, pijc=&pc, rholist=0, listing=&list, rtype=&rstructure, wcorr=&use_r, wts=&weight, sparse_ck= &sp_ck, seed=&lseed); %if &clearlog=1 %then %do; dm 'out;clear; log; clear'; %end; %end; %mend; *********************************************************************************************************; *********************************************************************************************************; %MACRO SIMMULTBIN_Or(NPER=50,rho=,rtype=1,n_outcomes=,anal=1, rholist=1, pijt=, pijc=, name=, listing=0,seed=&lseed, wcorr=, wts=1, sparse_ck=1, runiml=1); ** rtype 1= ar(1) 2= cs; *rmatrix for simulation model; ** wcorr =1 ar(1) 2=cs 3=un; *rmatrix for analysis; *e.g., wcorr=3: unstructured in analyis, regardless of how simulated; %if &rho=0 %then %let rho=1; *IF ZERO SPECIFIED, USE R=.01; %let rho100= ρ %if &rtype=1 %then %let rmatrix=ar(1); %if &rtype=2 %then %let rmatrix=cs; %if &wcorr=' ' %then %let rfit= &rmatrix; %else %if &wcorr=1 %then %let rfit=ar(1); %else %if &wcorr=2 %then %let rfit=cs; %else %if &wcorr=3 %then %let rfit=un; proc datasets nolist; delete res gcbeta gctype3 INTTYPE3 ESTIMATE gdtype3 t_pvals coll ecorr temp temp2 min_max min_max2 beta tx1 tx2 ints outpi contrasts pij betac tstats ttests sparse freqs boot_p bootps bootp beta2 rcov_b rcov1 iml_wt_wald MEAN MEAN2 PC PT PCPT; run; %macro countvar(x); %* count the number of outcoms specified; %let k=0; %let ktemp= %scan(&x,&k+1,'.'); %put ktemp= "&ktemp"; %do %while(%scan(&x,&k+1,'.') ^= ); %let k=%eval(&k+1); %end; %put *&k*; %mend; %countvar(&pijt); %let n_outcomes=&k; %let k_inv= %sysevalf(&k.**(-1)); %put kinv= &k_inv; *COUNT NUMBER OF WEIGHTS SPECIFIED; %macro countvar(x); %* count the number of weights specified; %let w=0; %do %while(%scan(&x,&w+1) ^= ); %let w=%eval(&w+1); %end; %put *&w*; %mend; %global w; %countvar(&wts); data temp; rho=&rho/100; call symput('rho', rho); k= &k; *** &n_outcomes*1; SIMI=&SIMI; *specify the pij for treatment and outcomes; array pt{*} pt1 -pt&k (&pijt); array pc{*} pc1- pc&k (&pijc); array theta{*} theta1-theta&k; array eps{*} eps1-eps&k; array z{*} z1-z&k; array u{*} u1-u&k; array y{*} y1-y&k; DO I=1 TO &K; THETA{I}=.; EPS{I} =.; Z{I}=.; U{I} = .; Y{I} =.; END; *USE GIVNE SEED, but make sure seed not too big; seed= &SEED; *SETTING SEED; * this works fine; seed2= int( (&seed + &SIMI + &nper + &rho) * (&nper/&rho)*&SIMI); if seed2 > 2147483647 then seed2 = int(sqrt(seed2)); if seed2 > 2147483647 then seed2 = int(1/seed2); ***int(sqrt(seed2)); call symput("seed2", seed2); seed_init= seed2; ** based on Oman (2009). "Easily simulated multivariate binary distributions with given positive and negative correlations." Computational Statistics & Data Analysis 2009, 53(4): 999-1005; do treat=0 to 1 by 1; do id0=1 to &nper by 1; do i= 1 to k; if treat=1 then do; id= id0+ &nper; end; else if treat=0 then do; id=id0; **pi_ij= pijc{i}; end; ** CALL RANBIN(seed,n,p,x); *SIMULATE AUTOREGRESSIVE (1) DATA; if &rtype= 1 then do; *Ar(1); if treat=1 then theta{i} = probit(pt{i}); else if treat=0 then theta{i} = probit(pc{i}); call rannor(seed2,eps{i}); * *rannorm(seed); IF I=1 THEN z{i} = eps{i}; *ONLY RETAIN THE FIRST -- SET FIRST AND KEEP; call ranbin(seed2,1,abs(rho),u{i}); if i>1 then do; z{i} = sign(&rho) * u{i} * z{i-1} + (1-u{i}) * eps{i}; end; y{i} = (z{i} LE theta{i}); end; *SIMULATE EXCHANGEABLE DATA; if &rtype= 2 then do; *CS; if treat=1 then theta{i} = probit(pt{i}); else if treat=0 then theta{i} = probit(pc{i}); IF I=1 THEN call rannor(seed2, eps0); * DO ONLY AT START AND KEEP; call rannor(seed2,eps{i}); call ranbin(seed2,1, sqrt(rho), u{i}); z{i} = u{i}*eps0 + (1-u{i}) *eps{i}; y{i} = (z{i} LE theta{i}); end; end; *i= 1 to k; output temp; end; end; run; **ods listing; **title "TEST: Pijc= &pijc Pijt= &pijt PC_=&PC_ pc_u= &PC_U pc=&pc"; **proc print data=temp; run; *CREATE WEIGHT variable WT FOR EACH OBSERVATION based on specified weight vector or constant value for all (WTS=); data temp2; set temp; w=&w*1; *number elements in WTS macro variable; array ys{*} y1-y&k; array weights{*} w1-w&k (&wts); *add outcome-specific weights, in any; do i=1 to &k; outcome=i; y_ij=ys{i}; *If single value given, use that for all outcomes; if w=1 then wt=1; *If only single weight, makes no difference what it is, except for estimation of AVG effect -- for that needs to be 1/k if equal weights (see model 3 below); else wt= weights{i}; output temp2; end; run; *title 'temp2'; *proc print data=temp2; * VAR ID treat outcome y_ij W wt; run; %if &SIMI=1 %then %do; *get max,min PI and use for Frechet bounds; data pij; keep pij treat o; array pt{*} pt1 -pt&k (&pijt); array pc{*} pc1- pc&k (&pijc); do o=1 to &k; treat=1; pij= pt{o}; output pij; treat=0; pij= pc{o}; output pij; end; run; *proc print; run; proc sort data=pij; by treat; proc means noprint data=pij noprint; by treat; var pij; output out=min_max(drop=_type_ _freq_) min=min_pi max=max_pi; run; data min_max; set min_max; ** max_rho = min_pi*( 1 - max_pi )/max_pi/( 1 - min_pi ); * call symput('max_rho', max_rho); * if treat=1 then call symput('max_rho_treat', max_rho); * if treat=0 then call symput('max_rho_control', max_rho); pmin=max(0, max_pi + min_pi -1); *Banadur, 1961 - logical bounds for P(Yi x Yj =1); pmax= min(max_pi, min_pi); *Banadur, 1961 - logical bounds for P(Yi x Yj =1); *Frechet bounds; *f(u,v) = [u(1-v)/{v(1-u)}] 1/2; *here using widest range of pis within cluster, within treatment group; pi=min_pi; pj= max_pi; rmin = -1* min( sqrt(pi*(pj)/((1-pj)*(1-pi))), sqrt((1-pj)*(1-pi)/(pi*(pj)))); rmax = min( sqrt(pi*(1-pj)/((pj)*(1-pi))), sqrt(pj*(1-pi)/((pi)*(1-pj)))); label rmin="Frechet minimum Rho" rmax="Frechet maximum Rho"; run; *proc print data=min_max; run; *now get overall (collapsing treatments); proc means noprint data=pij noprint; var pij; output out=min_max2(drop=_type_ _freq_) min=min_pi max=max_pi; run; %global Max_Rho_Frechet; data min_max2; set min_max2; ** max_rho = min_pi*( 1 - max_pi )/max_pi/( 1 - min_pi ); /* max_rho="Max possible Rho given proportions" */ ** call symput("max_rho_all", max_rho); pmin=max(0, max_pi + min_pi -1); *Banadur, 1961 - logical bounds for P(Yi x Yj =1); pmax= min(max_pi, min_pi); *Banadur, 1961 - logical bounds for P(Yi x Yj =1); *Frechet bounds (1960) explained in Oman (2009); *f(u,v) = [u(1-v)/{v(1-u)}] 1/2; pi=min_pi; pj= max_pi; rmin = -1* min( sqrt(pi*(pj)/((1-pj)*(1-pi))), sqrt((1-pj)*(1-pi)/(pi*(pj)))); rmax = min( sqrt(pi*(1-pj)/((pj)*(1-pi))), sqrt(pj*(1-pi)/((pi)*(1-pj)))); label rmin="Frechet minimum Rho" rmax="Frechet maximum Rho"; call symput("Max_Rho_Frechet", rmax); ; run; %put Max_Rho_Frechet = &Max_Rho_Frechet; %**let max_rho_Frechet=%sysevalf(&Max_Rho_Frechet); *proc print data=min_max; run; *end min/max; title "Frechet bounds on possible pairwise correlations given specified marginals"; title2 "Nper=&nper rho=&rho Range of possible rho given proportions"; proc report data=min_max2 NOWINDOWS; column min_pi max_pi rmin rmax; run; /*proc means data=min_max n min max; var max_rho; class treat; run; */ %end; *end Frechet bounds; data temp2; set temp2; keep ID treat outcome y_ij wt; *keep booti rho ID treat outcome z1 z2 eps0 eps1 u1 u2 theta1 theta2 y_ij wt seed_init seed2 pt1 pt2 pc1 pc2; run; *CHECK FOR SPARSENESS in treat-outcome crossings; *ods trace on; ods listing close; proc freq data=temp2 ; table outcome*treat*y_ij; ods output CrossTabFreqs=freqs; run; proc means data=freqs min; var frequency; output out=sparse min= min_freq; run; data _null_; set sparse; if min_freq=0 then sparse=1; else sparse=0; call symput("sparse",sparse); call symput("min_freq", min_freq); run; %let sparse_ct= %sysevalf(&sparse_CT*1 +&sparse); ods listing; /*title "sparse= &sparse sparse_ct= &sparse_ct Nper= &nper rho=&rho"; PROC PRINT data= sparse; proc print data= freqs; run; */ ods listing close; * var booti rho ID treat outcome z1 z2 eps0 eps1 u1 u2 theta1 theta2 y_ij wt seed_init seed2;* pt1 pt2 pc1 pc2; *RUN; %if &sparse_ck=1 %then %let sparsetext= and &sparse =0; %else %if &sparse_ck=0 %then %let sparsetext=; *END CREATION OF SIMULATED DATA; *BEGIN ANALYSIS OF SIMULATED DATA; %if &anal=1 &sparsetext %then %do; *RUN ANALYSIS ON I-TH DATASET ONLY IF NOT TOO SPARSE FOR ANALYSIS; ods listing; *LISTING=1 MEANS RESULTS OF EVERY PROC WILL BE PRINTED; %if &listing=0 %then %do; ods listing close; %end; *CREATE variables of VECTORS FOR OBSERVED PROPORTION SUCCESS BY TREATMENT AND OUTCOME; proc sort data=temp2; by treat outcome; proc means data=temp2 noprint mean; by treat outcome; var y_ij; output out=mean mean=mean; run; data mean2; set mean; keep treat outcome mean; run; proc transpose data=mean2 out=pc prefix=pc; where treat=0; run; data pc; set pc; where upcase(_name_)= "MEAN"; proc transpose data=mean2 out=pt prefix=pt; where treat=1; run; data pt; set pt; where upcase(_name_)= "MEAN"; data pcpt; merge pc pt; *proc print data=pcpt; run; *1. COMMON TREATMENT EFFECT GEE MODEL; *first get the PI for each outcome (use noint coding); title1 "1A. GEE common effect: simulation=&SIMi nper=&nper rho=&rho max_rho_Frechet=&Max_Rho_Frechet"; title2 " Estimate incidence for each component (no intercept model)"; title3 " pijt= &pijt pijc= &pijc"; proc genmod data=temp2 descending; class outcome id; * /param=ref ref=first; model y_ij= outcome treat/ dist=bin type3 noint; *no intercept to estimate incidence for each compponent; repeated subject=id/ type=&rfit corrw; weight wt; *SPECIFIED COMPONENT WEIGHTS (FROM WTS=) ARE APPLIED HERE FOR COMMON EFFECT MODEL; ods output GEEEmpPEst=gcbeta; ods output GEEWCorr=wcorr; data ints; set gcbeta; where parm='outcome'; outcome=estimate; keep level1 outcome pi; pi= exp(outcome)/(1+ exp(outcome)); run; proc print data=ints; run; proc print data=wcorr; run; *SAVE PORTIONS OF CORRELATION MATRIX R; *create variables corresponding to first row of working correlation maxtrix; data wcorr; set wcorr; keep wcorr2 /*%if &rtype=1 %then*/ %do kk=3 %to &k; wcorr&KK %end; ; where rowname='Row1'; *want to save more than 1 corr for analyses by Ar1(wcorr=1) and UN (wcorr=3); %do kk=2 %to &k; wcorr&KK= col&kk; %end; run; proc print data=wcorr; run; proc transpose data=ints out=tx1 prefix=outcome; var outcome; run; proc transpose data=ints out=tx2 prefix=pi_; var pi; run; data out_pi; merge tx1 tx2; run; title "1B. GEE common effect: Simulation=&SIMI nper=&nper rho=&rho max_rho_Frechet=&Max_Rho_Frechet"; title2 ' Estimate common effect: need reference cell coding here'; title3 " pijt= &pijt pijc= &pijc"; proc genmod data=temp2 descending; class outcome id; *param=ref ref=first; model y_ij= outcome treat/ dist=bin type3; * noint; repeated subject=id/ type=&rfit corrw; weight wt; ods output GEEEmpPEst=gcbeta; ods output type3= gctype3; run; data gctype3; set gctype3; if source='treat' ; chi_b_gc= chisq; p_chi_b_gc= probchisq; keep chi_b_gc p_chi_b_gc; run; *Obs Parm Level1 Estimate Stderr LowerCL UpperCL Z Prob; data gcbeta; set gcbeta; keep b_gc se_b_gc z_b_gc pz_b_gc; if parm="treat" ; b_gc=estimate; *common effect log-odds ratio; se_b_gc= stderr; z_b_gc = z; pz_b_gc= probz; *WALD CHI-SQUARE; format pz_b_gc 8.7; ods trace off; run; title 'gcbeta'; *2. TEST THE TREAT-OUTCOME INTERACTION; title "2.TREATMENT-OUTCOME INTERACTION ... using GEE outcome-specific effects"; title2 " nper=&nper rho=&rho max_rho_Frechet=&Max_Rho_Frechet"; title3 " pijt= &pijt pijc= &pijc"; TITLE4 " default coding (not reference cell)"; *DOES NOT MAKE A DIFFERENCE IF WEIGHTS APPLIED DIRECTLY TO DISTINCT OUTCOME TREATMENT EFFECTS vs each observation; proc genmod data=temp2 descending; class outcome id; **/param=ref ref=first; model y_ij= outcome TREAT treat*outcome / dist=bin type3 noint; repeated subject=id/ type=&rfit corrw; weight wt; ods output GEEEmpPEst=INTbeta; ods output type3= INTtype3; run; DATA INTTYPE3; SET INTTYPE3; keep df_INTER chi_INTER p_chi_INTER; if source= 'treat*outcome'; df_INTER=df; chi_INTER= chisq; p_chi_INTER= probchisq; *SCORE; run; *PROC PRINT; RUN; *3a. AVG relative EFFECT: average TREATMENT EFFECT (log-odds ratio) across K estimated distinct EFFECTS; * for this model only adjust for weights in constrast statement; %macro ones(paste=1); %do oo=1 %to &k %by 1; &paste %end; %mend; title "3A. GEE outcome-specific effects: nper=&nper rho=&rho max_rho_Frechet=&Max_Rho_Frechet"; title2 " pijt= &pijt pijc= &pijc"; title3 ' need default coding (no reference cell)'; proc genmod data=temp2 descending; class outcome id; model y_ij= outcome treat*outcome / dist=bin type3 noint; *no treatment main effect so estimate outcome-specific effects directly via interaction; repeated subject=id/ type=&rfit corrw ecovb; ods output GEEEmpPEst=gdbeta; ods output type3= gdtype3; ods output GEErcov= rcov1; *W is count of number of DISTINCT weights given in call; %if &w=1 %then %do; contrast "Average relative effect (Score)" treat*outcome %ones(paste=&wts); *ESTIMATE Following weights must sum to 1.0; estimate "Average relative effect (Wald)--wts sum to 1" treat*outcome %ones(paste=&k_inv); **&wts); %end; %else %if &w>1 %then %do; contrast "Average relative effect (Score)" treat*outcome &wts; *ESTIMATE: the following weights must sum to 1.0; estimate "Average relative effect (Wald)--weights sum to 1" treat*outcome &wts; %end; ods output contrasts=contrasts; ods output estimates=estimates; run; *save the K treatment effect p-values for individual testing; data gdbeta; set gdbeta; where parm='treat*outcome'; run; proc means data=gdbeta min max noprint; var probz; output out= minp min= min_p_effet max=max_p_effect; run; proc transpose data=gdbeta out=t_pvals prefix=pval_effect; where parm='treat*outcome'; var probz; run; *K-DF test; data gdtype3; set gdtype3; *similar to LEFKOPOLOU K-DF TEST (1995); keep df_distinct_kdf chi_distinct_kdf p_distinct_kdf; if source= 'treat*outcome'; df_distinct_kdf=df; chi_distinct_kdf = chisq; p_distinct_kdf = probchisq; run; *AVG relative effect estimate and test; data estimates; set estimates; keep b_avg_wald se_avg_wald p_avg_wald; b_avg_wald = lbetaestimate; *this is beta for AVG rel treatment effect model (1 DF test); se_avg_wald= stderr; p_avg_wald= probchisq; *AVG rel treatment effect p-value (Wald); run; data contrasts; set contrasts; chi_avg_sc = chisq*1; p_avg_sc = probchisq*1; run; *3b. AVERAGE EFFECT WEIGHTED BY VAR-COV MATRIX; data beta2; set gdbeta; *still includes intercepts and betas; where parm ne 'Intercept' and parm ne 'outcome' and estimate ne 0; run; data rcov_b; set rcov1; row= _N_; run; %let k2=%eval(&k+2); %let k3=%eval(2*&k+ 1); data rcov_b; set rcov_b; keep prm&k2-prm&k3; where row GE %eval(&k+1); run; **proc print data=rcov_b; run; %if &runiml=1 %then %do; proc iml; /* READ COV MATRIX FROM GENMOD; **reset print; */ use rcov_b; /* * nparms= nrow(estim); *overall # parameters; */ read all into covmat; /*READ COVARIANCE MATRIX; */ use beta2; read all var{estimate} into estim; **COLUMN vector; df_k=nrow(covmat); /* * print "Covariance Matrix:" covmat; * print "estimates" estim; */ L= t(J(&k,1)); /* *L={1 1 1 1 1 1}; */ /* calculate beta average weighted by inverse covariance matrix; */ i_k= I(df_k); *create identity matrix; *1) CHOOSE WEIGHT MATRIX -- choose INV(COVMAT) or Identity (2.3.5 stats med 2010); wt=inv(covmat); *if choose this, make all weights 1 1 1 1 1 1; beta_avg2_num = L*wt*estim; *L2 is 1/k; *weight betas by inverse covariance matrix; beta_avg2_den = L*wt*t(L); *L is ones; *divide by sum of inverse var-cov; beta_wtavg = beta_avg2_num/ beta_avg2_den; * weighted average beta -- by var-cov matrix; ** alternatively, beta_wtavg = (L*wt*estim) * inv(L*wt*t(L)); var_beta_avg2 = inv(L*(wt*covmat*wt)*t(L)); *variance of test of weighted average beta=0; se_beta_wtavg = sqrt(var_beta_avg2); z_beta_wtavg = beta_wtavg / se_beta_wtavg ; *z for weighted average beta; pz_beta_wtavg = 2*(1-probnorm(abs(z_beta_wtavg))); *long form to obtain same variance as above; nvar_beta_wavg = (L* wt)* covmat * t(L*wt); dvar_beta_wavg = (L*wt*t(L)) * t( L*wt*t(L) ); var_beta_wavg = nvar_beta_wavg / dvar_beta_wavg; wald_WTavg= beta_wtavg**2/ var_beta_wavg; pwald_wtavg= 1-probchi(abs(wald_WTavg),1); *wald test of var-cov matrix weighted average relative effect; outvars= beta_wtavg || se_beta_wtavg || z_beta_wtavg || pz_beta_wtavg || wald_WTavg || pwald_wtavg; cname ={"beta_wtavg" "se_beta_wtavg" "z_beta_wtavg" "pz_beta_wtavg" "wald_WT_avg" "pwald_wtavg"}; CREATE iml_wt_wald from outvars [colname= cname]; APPEND from outvars; run; quit; %end; *************************************************; *4. COLLAPSED YES/NO OUTCOME; *collapsed composite analysis; proc sort data=temp2; by treat id; proc means data=temp2 max sum noprint; by treat id; var y_ij; output out=collapse max= anyevent sum=nevents n= nrec; run; title "4. Collapsed composite (any-vs-none) nper=&nper rho=&rho max_rho_Frechet=&Max_Rho_Frechet"; title2 " pijt= &pijt pijc= &pijc"; *ods trace on; proc logistic data=collapse descending; class anyevent; *param=ref ref=first; model anyevent= treat/rl; ods output ParameterEstimates =beta; ; run; data coll; set beta; if variable='treat'; b_coll=estimate; se_b_coll= stderr; chi_b_coll = waldchisq; df_b_coll= df; p_b_coll= probchisq; keep b_coll se_b_coll chi_b_coll df_b_coll p_b_coll; run; *5. COUNT OF OUTCOMES PER INDIVIDUAL AS OUTCOME; title "5A. WRS on counts: nper=&nper rho=&rho max_rho_Frechet=&Max_Rho_Frechet"; title2 " pijt= &pijt pijc= &pijc"; PROC NPAR1WAY DATA= COLLAPSE WILCOXON; VAR NEVENTS; CLASS TREAT; ODS OUTPUT WilcoxonTest=WRS; RUN; *proc print; run; DATA WRS; SET WRS; keep p_w_count; IF NAME1='P2_WIL'; p_w_count=nValue1*1; RUN; *ods trace on; *ods listing; title "5B. t-test on counts: nper=&nper rho=&rho max_rho_Frechet=&Max_Rho_Frechet"; title2 " pijt= &pijt pijc= &pijc"; proc ttest data=collapse; class treat; var nevents; ods output ttests= ttests; ods output Statistics=tstats; run; **proc print data=tstats; run; data tstats; set tstats; keep diff_ct se_diff_ct; where class="Diff (1-2)"; diff_ct= mean; se_diff_ct= StdErr; run; data ttests; set ttests; keep p_diff_ct; where method="Pooled"; p_diff_ct= probt; run; proc print data=ttests; run; ods trace off; *6. Individual outcome comparisons; title "6. Individual component analyses -- Proc Multtest"; title2 " pijt= &pijt pijc= &pijc"; proc multtest data=temp stepboot nsample= &multtest_b seed=&seed2 out=boot_p ; test ca(y1- y&k); class treat; run; run; proc transpose data=boot_p prefix=bootp out=bootps; var stpbootp; run; data bootp; set bootps; array bootp bootp1 - bootp&k; nsig_boot=0; do over bootp; if bootp < 0.05 then nsig_boot+1; end; run; *** proc print data=bootp; run; *PROPORTIONAL ODDS MODEL ON COUNTS; /* proc logistic data=collapse descending; class nevents/param=ref ref=first; *** weight wt; model nevents= treat/rl; ods output ParameterEstimates =betac; ; data betac; set betac; b_prop_odds= estimate; se_prop_odds = stderr; p_b_prop_odds = probchisq; keep b_prop_odds se_prop_odds p_b_prop_odds; run; */ *POOL ALL RESULTS TOGETHER; data res; merge PCPT out_pi gcbeta gctype3 INTTYPE3 gdtype3 coll wcorr wrs contrasts estimates tstats ttests t_pvals bootp %if &runiml=1 %then %do; iml_wt_wald %end;; * betac; length name $ 15; NPER=&NPER*1; n_outcomes= &k*1; rho=&rho*1; k=&k; **n_outcomes; min_pval= min(of pval_effect1 - pval_effect&k); min_bootp= min(of bootp1 - bootp&k); *boot_sig is # significnat after BS resampling and Holm-Bonf; SIMi = &SIMi*1; *simulation number; min_freq= &min_freq; *see setup -- minimum frequency within outcome within treatment; maxrho=&Max_Rho_Frechet; name=&name; format diff_ct se_diff_ct p_diff_ct /*b_prop_odds se_prop_odds p_b_prop_odds*/ b_avg_wald se_avg_wald p_avg_wald b_gc se_b_gc z_b_gc pz_b_gc chi_b_gc p_chi_b_gc chi_INTER p_chi_INTER chi_distinct_kdf p_distinct_kdf p_w_count b_coll se_b_coll chi_b_coll p_b_coll NPER rho 6.3; format df_INTER df_distinct_kdf df_b_coll 5.1 n_outcomes k nper 8.0; drop _NAME_ Contrast DF ChiSq ProbChiSq Type n_outcomes ; run; ods trace off; ods listing; **proc print data=res; * var SIMi nper p0 or or_out or_tx_out k rho wcorr b_gc se_b_gc pz_b_gc p_chi_INTER p_chi_treat_out p_b_coll; run; proc append base=&lib..&outds new=res force; run; *CREATE DATASET WITH ALL SIMULATIONS COMBINED; %let rho=%sysevalf(&rho100*1+0); %let nper=%sysevalf(&nper*1+0); proc append base=&lib..&outds.r&rho100.n&nper new=res force ;; *CREATE DATASETS FOR EACH COMBINATION OF RHO AND NPER; run; run; %end; *END ANAL LOOP; %MEND SIMMULTBIN_Or; *********************************************************************************************************;