A Response to the NINDS Study Group's "Imbalances in Baseline Stroke Severity and Outcome" paper
-----------------------------------------------------------
Note: An addendum has been added at the end of this essay on September 24th 2006.
Background:
This short piece is my response to the NINDS Study Group's paper that was published in the April 2005 issue of the Annals of Emergency Medicine journal [1]. The NINDS Study Group felt a need to publish a paper in the Annals of Emergency Medicine to deal with criticism that baseline imbalances in stroke severity in the NINDS trial affected the correct interpretation of the trial's results. This problematic issue was first discussed in the official medical literature when my paper was published in the Western Journal of Medicine in May 2002 [2]. I implied that the NINDS trial's results for patients treated between 91-180 minutes were distorted by the fact that there were far fewer very severe (bNIHSS>20) stroke patients, and far more very mild (bNIHSS<5) stroke patients in the treated group compared to the placebo group, and I suggested that this happenstance could account for the fact that tPA was apparently more effective for patients treated between 91-180 minutes than 0-90 minutes, as was initially reported in the original NEJM report [3]. The original NEJM report stated that the OR was 2.4 for patients treated between 91-180 minutes and 1.7 for patients treated between 0-90 minutes (mRS<=1 stroke outcome scoring system). The NINDS Study Group also invented a new stroke outcome scale called the Global Statistic and they reported that the OR was 1.9 using the Global Statistic stroke outcome scoring system for patients treated between 0-90 minutes, and also 1.9 for patients treated between 91-180 minutes.
My initial criticism of the NINDS trial's analysis was handicapped by the fact that I didn't have access to the trial's raw data. I subsequently obtained a limited amount of raw data in October 2003, and my subsequent personal analysis of the raw data enabled me to estimate to what degree the "baseline stroke severity imbalance" problem could have affected the correct interpretation of the trial's results for stroke patients treated between 91-180 minutes [4]. I estimated that if one corrected for the "baseline stroke severity imbalance" problem and a chance event phenomenon affecting the NIHSS 11-15 placebo subgroup, the "corrected" absolute risk difference would be 9-11%, and not 21% as originally reported in the NEJM in 1995 (mRS<=1 stroke outcome scoring system). From a quantitative perspective, this pooled analysis "correction" is approximately 50% of the originally estimated absolute efficacy value.
How did the NINDS Study Group deal with this issue in their Annals of Emergency Medicine paper?
This is a copy of the Editor's Capsule Summary regarding the NINDS Study Group paper.
Note that the Editor implied that it was already known that the NINDS trial demonstrated that tPA produced clinically significant benefit, but he subsequently acknowledges that baseline imbalances in the severity of stroke between the tPA and placebo groups contributed to controversy about the validity of the results. The Editor then stated that this post hoc study demonstrated that there appears to be a benefit from tPA in patients with acute stroke, even accounting for differences in baseline stroke severity, and that this benefit occurs across the range of stroke severity. Note that the Editor does not quantify the degree of clinical benefit that remains after correcting for the imbalance in baseline stroke severity. Isn't that the critically important issue?
How does the NINDS Study Group present their data in their paper?
I think that figure 4 from their paper best represents their position.
Figure 4:
The NINDS Study Group obviously concedes that the imbalance in baseline stroke severity mainly affected the very mild (bNIHSS<5) and very severe (bNIHSS>20) stroke severity subgroups in the cohort of stroke patients treated between 91-180 minutes, which is why they calculated the unadjusted OR for bNIHSS>5,<20 stroke patients treated between 91-180 minutes -- note that the unadjusted* OR (line 2) is significantly less than the unadjusted OR value for all 91-180 minute stroke patients (line 8), which was originally reported as being larger than the unadjusted OR value for all 0-90 minute stroke patients (line 7).
(* Note that I prefer to deal with unadjusted OR values because the adjusted OR values use statistical techniques that are not transparent and that are also not regarded as being statistically acceptable by all biostatisticians -- see the critical comments by the NINDS Reanalysis Committee in their detailed re-analysis of the NINDS trial report [5].)
In their paper, the NINDS Study Group does not quantify to what degree the imbalance in baseline stroke severity problem affected the correct estimation of tPA's clinical efficacy. This is what the authors state in the results section of the paper-: "After adjustment for potential confounding variables for favorable outcome at 3 months, the ORs for a favorable outcome were greater than 1.0 for all 12 of the onset-to-treatment/NIH Stroke Scale subgroups (Figure 4), favoring rtPA in all subgroups." The authors also made the following statement in the discussion section of their paper-: "In addition, using all NINDS patients and after excluding patients with either the least severe or the most severe strokes, as defined on the NIH Stroke Scale, there is still a greater odds of a favorable outcome for tPA-treated patients (Figure 4)."
Stating that the OR for a favorable stroke outcome is greater than 1.0 across the stroke severity range (after correcting for baseline imbalances in stroke severity) does not prove that tPA provides clinically significant benefit. It only implies that tPA is more likely to produce benefit rather than harm across the stroke severity range.
What is the standard method of quantifying whether a tested drug's degree of measured efficacy is clinically significant?
A P value <0.05 only signifies a statistically significant result: it implies that, if the null hypothesis (that there is no difference between the tested drug and control drug/placebo) were true, an OR at least as extreme as the RCT's measured OR would be expected to occur by chance less than 5% of the time. A P value cannot quantify the degree of efficacy and it cannot help a trial interpreter assess whether the trial's result is clinically significant (rather than statistically significant).
Sackett presents a technique for estimating whether an individual RCT's OR result is clinically significant by using 95% CIs [6].
This is a diagram (figure 6.2) from Sackett's draft chapter 6:5 [6].
Note that Sackett uses the term MIB, which stands for "minimally important benefit". The MIB threshold for clinical significance is subjectively determined by clinicians/patients and is dependent on individual clinician and individual patient values. An RCT result that is greater than the MIB threshold represents a clinically significant result, and an RCT can be deemed to have produced a superiority result if both ends of the 95%CI range lie beyond the MIB threshold line (example D in the above diagram). Example C in Sackett's diagram represents a treatment result that suggests a clinically significant benefit, but note that one cannot incontestably establish that the benefit is definitely clinically significant (clinically important) because the 95%CI range line crosses the MIB threshold line. Example B in Sackett's diagram represents an indeterminate result -- the 95%CI range line not only crosses the MIB threshold line, it also crosses the zero line (OR value of 1.0).
Note that the unadjusted OR 95% CI result for stroke patients (bNIHSS>5, <20) treated between 91-180 minutes in the NINDS trial is an indeterminate result according to Sackett's diagram (see line 2 in figure 4 above). In other words, one can only rationally conclude from studying the NINDS trial's unadjusted OR results for bNIHSS>5,<20 stroke patients that tPA may produce a clinically significant benefit in stroke patients treated between 91-180 minutes, but one cannot automatically conclude that a superiority conclusion is scientifically proven.
Another point that is of great importance is the fact that a single RCT's 95%CI results only represent a single snapshot-representation of a sample population of stroke patients, and they don't necessarily reflect what would happen to stroke patients treated with tPA in another RCT (or in the general community population of stroke patients). All the other tPA-for-stroke RCTs did not demonstrate that tPA is efficacious for stroke patients treated between 91-180 minutes. What were the efficacy results from those other tPA-for-stroke RCTs, and what factor most likely caused the major difference in tPA efficacy between the NINDS and non-NINDS randomised trials?
Because I do not have access to the raw data from the non-NINDS tPA-for-stroke RCTs, I have to use a roundabout method of calculating the point estimate OR result and the 95%CI for those non-NINDS trials.
Consider the results of the pooled analysis from the NINDS/ECASS/ATLANTIS trials which were published in the Lancet [7].
Figure 4 from the Lancet paper
The number of patients who had an excellent stroke outcome (mRS=<1) can be derived from figure 4 for stroke patients treated between 91-180 minutes -- 30% of 315 = 95 placebo patients, and 43% of 302 = 130 tPA patients. Because I know how many patients came from the NINDS trial, I can calculate the number of patients derived from the pooled non-NINDS trials. It is then possible to fill-in the data in the following table.
Patients treated between 91-180 minutes:

| Trial | Placebo patients (n) | tPA patients (n) | Placebo patients with mRS<=1 | tPA patients with mRS<=1 |
|---|---|---|---|---|
| NINDS trial-all patients | 167 | 153 | 42 (25%) | 70 (45%) |
| NINDS trial-bNIHSS>5,<20 patients | 114 | 96 | 34 (30%) | 42 (44%) |
| Pooled non-NINDS trial patients | 148 | 149 | 53 (36%) | 60 (40%) |
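The subtraction used to obtain the pooled non-NINDS row can be sketched in a few lines of Python. This is only an illustrative sketch: the function and variable names are hypothetical, the inputs are the figures quoted in this essay, and the percentages read off the Lancet figure are rounded, so the derived counts are approximate.

```python
# Pooled NINDS/ECASS/ATLANTIS figures for the 91-180 minute stratum,
# as read off figure 4 of the Lancet paper (percentages are rounded).
pooled = {"placebo": {"n": 315, "good_pct": 30}, "tpa": {"n": 302, "good_pct": 43}}

# NINDS figures for the same stratum, as given in the table above.
ninds = {"placebo": {"n": 167, "good": 42}, "tpa": {"n": 153, "good": 70}}


def derive_non_ninds(pooled, ninds):
    """Subtract the NINDS patients from the pooled totals, arm by arm."""
    result = {}
    for arm in ("placebo", "tpa"):
        # Round half up: Python's built-in round() would turn 94.5 into 94.
        pooled_good = int(pooled[arm]["n"] * pooled[arm]["good_pct"] / 100 + 0.5)
        n = pooled[arm]["n"] - ninds[arm]["n"]
        good = pooled_good - ninds[arm]["good"]
        result[arm] = {"n": n, "good": good, "pct": round(100 * good / n)}
    return result


non_ninds = derive_non_ninds(pooled, ninds)
for arm, row in non_ninds.items():
    print(arm, row)
```

Because the starting percentages are rounded, the derived counts could be off by a patient or two in either direction.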
It is then possible to use these figures to calculate the *point estimate OR value and *95%CI for each trial using a CI calculator (for an mRS<=1 stroke outcome result).
NINDS trial-all patients -- OR 2.5 (1.5-4.0)
NINDS trial-bNIHSS>5,<20 patients -- OR 1.8 (1.0-3.2)
Pooled non-NINDS trial patients -- OR 1.2 (0.75-1.9)
(* these figures are not precisely accurate because they were derived from figure 4, which used rounding of the figures)
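For readers who want to check these figures without an online calculator, here is a minimal sketch in Python. It assumes the common log-odds (Woolf) approximation for the 95%CI; the CI calculator used for the figures above may use a different method, so the limits can differ slightly in the last digit. The function name and the dictionary layout are illustrative only.

```python
import math


def odds_ratio_ci(good_tpa, n_tpa, good_placebo, n_placebo, z=1.959964):
    """Odds ratio for an excellent outcome (tPA vs placebo) with a Woolf 95% CI."""
    a, b = good_tpa, n_tpa - good_tpa                  # tPA: excellent / not excellent
    c, d = good_placebo, n_placebo - good_placebo      # placebo: excellent / not excellent
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the table above: (tPA excellent, tPA n, placebo excellent, placebo n).
groups = {
    "NINDS trial-all patients": (70, 153, 42, 167),
    "NINDS trial-bNIHSS>5,<20 patients": (42, 96, 34, 114),
    "Pooled non-NINDS trial patients": (60, 149, 53, 148),
}

for name, counts in groups.items():
    odds_ratio, lower, upper = odds_ratio_ci(*counts)
    print(f"{name}: OR {odds_ratio:.2f} ({lower:.2f}-{upper:.2f})")
```

The point estimates come out at approximately 2.5, 1.8 and 1.2, in line with the values listed above.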
Let's presume, for argument's sake, that the MIB threshold value is an OR of 1.5. Then according to Sackett's diagram, the NINDS trial-all patients results would be classified as a superiority conclusion (falsely due to the baseline stroke severity imbalance problem), the NINDS trial-bNIHSS>5,<20 results would be classified as a "treatment shows benefit but cannot determine whether benefit is important" conclusion, while the pooled non-NINDS trial results would be classified as an indeterminate conclusion (and very close to a borderline equivalence conclusion).
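The classification logic can be written out explicitly. The following Python sketch is my own simplification of the three examples from Sackett's diagram discussed in this essay, not Sackett's full scheme; the function name, the category labels, and the treatment of a CI limit that sits exactly on a threshold (counted here as meeting it) are assumptions made for illustration.

```python
def classify(ci_lower, ci_upper, mib=1.5, no_effect=1.0):
    """Classify an OR 95% CI against an assumed MIB threshold (simplified)."""
    if ci_lower >= mib:
        # Whole interval lies at or beyond the MIB line (example D).
        return "superiority"
    if ci_lower >= no_effect and ci_upper > mib:
        # Interval excludes harm but straddles the MIB line (example C).
        return "benefit shown, importance uncertain"
    if ci_lower < no_effect and ci_upper > mib:
        # Interval straddles both the no-effect line and the MIB line (example B).
        return "indeterminate"
    return "not covered by the examples discussed here"


# 95% CIs as quoted in this essay, with the assumed MIB of OR 1.5.
quoted_cis = {
    "NINDS trial-all patients": (1.5, 4.0),
    "NINDS trial-bNIHSS>5,<20 patients": (1.0, 3.2),
    "Pooled non-NINDS trial patients": (0.75, 1.9),
}

for name, (lower, upper) in quoted_cis.items():
    print(f"{name}: {classify(lower, upper)}")
```

Two of the quoted lower limits (1.5 and 1.0) sit exactly on a threshold after rounding, so the resulting labels depend on the boundary convention chosen here and on the assumed MIB value.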
Why did the pooled non-NINDS trials produce indeterminate/equivalence results?
Note that the percentage of tPA-treated stroke patients who had an excellent stroke outcome in the pooled non-NINDS trials (40%) was similar to the NINDS trial (45%). However, the placebo response rate was far less in the NINDS trial (25%) compared to the pooled non-NINDS trials (36%). The above table also demonstrates that the NINDS trial's placebo response rate would be 30% if one only included bNIHSS>5,<20 stroke patients. Another distorting factor is that the NINDS bNIHSS11-15 subgroup had an inordinately low 14% rate of excellent stroke outcome instead of an "expected" ~30% excellent stroke outcome rate -- presumably due to chance events (see reference [8] for further details regarding this confounding factor). Correcting for that confounding factor would add another ~2%, so the "corrected" NINDS placebo response rate could be perceived to be ~32%.
I cannot estimate to what degree imbalances in baseline stroke severity and other chance event factors affected the pooled non-NINDS trials' results -- because I do not have access to the raw data. The pooled non-NINDS trials could have confounding variables that disfavor the tPA group, and the true efficacy of tPA may be intermediate between an OR point estimate value of 1.2-1.8. However, in the absence of public access to the raw data, I regard this theoretical possibility as being presently unknowable.
Concluding remarks:
For stroke patients who are treated with tPA between 91-180 minutes, there is no RCT evidence that unequivocally proves that tPA produces a clinically significant benefit (superiority conclusion). The NINDS trial (after correcting for baseline imbalances in stroke severity) only suggests that a clinically significant benefit is possible. Unfortunately, the non-NINDS randomised trials produced an indeterminate/equivalence result, and therefore there is no confirmatory evidence to solidify one's hope that tPA will regularly produce a clinically significant benefit if it is routinely administered to stroke patients 91-180 minutes after stroke onset.
There is substantial evidence to suggest that tPA is beneficial in individual acute ischemic stroke patients as a result of an increased vessel recanalisation rate [9]. However, there is not a direct, and linear, relationship between the rate/degree of recanalisation and the likelihood of a favorable stroke outcome. Many confounding factors may influence this relationship -- time-to-treatment and variable efficacy of the collateral circulation which has to sustain the ischemic penumbral tissue while awaiting recanalisation; location and type and extent of the occluding clot and whether partial recanalisation results in clot fragments that subsequently occlude smaller branch vessels; rate and degree of re-occlusion following initially successful recanalisation; rate and degree of reperfusion injury [9]. When comparing the results of different tPA-for-stroke RCTs, one may automatically presume that these confounding factors affect the different RCTs to the same degree, but this is unlikely to happen if the individual RCT's sample size is too small (few hundred patients). This heterogeneity problem is a major confounding variable in tPA-for-stroke RCTs that only recruit a few hundred patients who are very heterogeneous with respect to stroke location, stroke type, and stroke severity.
This problem also affects RCTs that test neuroprotective agents, and it is an inescapable fact-of-life that one cannot hope to obtain scientifically conclusive results from a stroke-RCT if the sample size is too small and the stroke heterogeneity problem too large.
In the same issue of the Annals of Emergency Medicine, the editorialists [10] state that "the efficacy of tPA (the observed effect of an intervention under ideal circumstances in a controlled clinical trial) may be better than its effectiveness (the outcomes seen in everyday practice)". That possibility would not be unexpected, because it is a common phenomenon that applies to all fields of clinical research. The editorialists therefore bring up the possibility of performing pragmatic/practical trials of tPA in a more diverse population of stroke patients in a wider variety of acute care settings. However, the editorialists acknowledge that trial interpretative problems will occur due to smaller effect sizes and increased variability in patient heterogeneity, different trial sites and variable use of co-interventions. I suspect that pragmatic/practical tPA-for-stroke trials could only produce scientifically conclusive results if they enrolled tens-of-thousands of patients, and I don't expect that the stroke research community will undertake such a gigantic task if they are presently unwilling to run traditional RCTs of adequate sample size (few thousand patients). I believe that the stroke research community first has to prove that tPA will produce a clinically significant benefit (superiority conclusion) in a homogeneous population of stroke patients under ideal conditions, and I believe that it is premature to explore the utility of pragmatic/practical tPA-for-stroke trials which will likely be plagued by a low signal/noise ratio and scientifically inconclusive results.
Jeff Mann, MD.
Retired physician.
Date of first draft: August 2005.
jmannemg@earthlink.net
Addendum added September 24th 2006.
Many people regard me as a maverick trial interpreter who doesn't publish his opinions in peer-reviewed journals, and many people therefore heavily discount the accuracy and scientific validity/rationality of my arguments. I believe that my arguments should primarily be judged on their scientific merits, and I believe that the scientific validity of my arguments should not depend on whether my opinions/conclusions are approved by the "high priests" of medicine (professional associations like the AHA, American Stroke Association and the ANA) and/or the general public. I have been criticised by many people, who frequently rely on emotional ad hominem attacks, and who avoid undertaking the difficult task of addressing the numbers/rationale underlying my arguments. Consider the following example of an ad hominem attack, which doesn't address the issues, and which merely relies on emotional side-tracking tactics that bypass the central issues.
I wrote a letter to the Editor of the Annals of Emergency Medicine, and my letter was published in the October 2006 issue of the Annals of Emergency Medicine (see between the horizontal lines). My letter was a response to a letter by Walsh and Fromm, which was originally published in the April 2006 issue of the Annals of Emergency Medicine.
To the Editor:
In their letter to Annals of Emergency Medicine the authors, Walsh and Fromm, lament the fact that a survey of emergency physicians demonstrated that 40% of respondents are unlikely to use tPA therapy for acute ischemic stroke because of a concern for a lack of benefit and a risk of symptomatic intracranial hemorrhage.1 Walsh and Fromm state that tPA is standard therapy and they question the legitimacy of any dissenting opinion when they state, “If such a therapy is so routine so as to fall under the doctrines of emergency treatment and implied consent, how can these emergency physicians deny what has become ordinary local standard of care? Are they waiting for a future study that will refute the National Institute for Neurological Disorders and Stroke (NINDS) data?”
What Walsh and Fromm do not acknowledge is that a study refuting the efficacy results of the NINDS study already exists - the pooled results from the European Cooperative Acute Stroke Study (ECASS)/European Cooperative Acute Stroke Study II (ECASS II)/Alteplase Thrombolysis for Acute Non-Interventional Therapy in Ischemic Stroke (ATLANTIS) trials. Walsh and Fromm state that the recently published post hoc analysis of the NINDS data in the April 2005 Annals of Emergency Medicine confirms the initial hypothesis of the NINDS trial.2 However, my personal analysis of that paper demonstrates that if one corrects for baseline imbalances in stroke severity, by only considering patients with a stroke severity NIHSS score of >5,<20, then the overall efficacy of tPA in patients treated between 91-180 minutes decreases from an OR value of 2.5 (1.5-4.0) to an OR value of 1.8 (1.0-3.2).3 More importantly, if one performs a pooled analysis of all the other major tPA-for-stroke randomized trials (ECASS/ECASS II/ATLANTIS) the OR value for patients treated between 91-180 minutes was only 1.2 (0.75-1.9) and the lower limit of the 95%CI extends into negative territory.3 Both the NINDS trial and non-NINDS trials (pooled results from the ECASS/ECASS II/ATLANTIS trials) had an equal sample size with respect to the number of patients treated between 91-180 minutes (NINDS trial - 320 patients; non-NINDS trials - 297 patients). Therefore, the burden of proof lies with tPA proponents, such as Walsh and Fromm, to explain why these differences exist and why the results from the NINDS trial should trump the results from the non-NINDS trials.
References:
1 M. Walsh and G. Fromm, Emergency physician survey: recombinant tissue plasminogen activator for stroke, Annals Emerg Med 47 (2006), pp. 296–297.
2 T. Kwiatkowski, R. Libman and B.C. Tilley et al., The impact of imbalances in baseline stroke severity on outcome in the National Institute of Neurologic Disorders and Stroke Recombinant Tissue Plasminogen Activator Stroke Study, Ann Emerg Med 45 (2005), pp. 377–384.
3 Mann J. A Response to the NINDS Study Group’s “Imbalances in Baseline Stroke Severity and Outcome”. Available at http://jeffmann.net/soapbox/NINDS-ResponsetoAnnalsPaper.htm. Accessed August 9, 2006.
The primary function of my letter was to address a simple fact. When one attempts to determine the efficacy of tPA in acute ischemic stroke, by reviewing all the available RCT-evidence, then one needs to look at all the RCT-evidence and not only the results of a single randomised trial. Walsh and Fromm stated in their original letter to the Annals of Emergency Medicine, that tPA represents the standard of care for the treatment of acute ischemic stroke, and they couldn't understand why so many Emergency Physicians were questioning the validity of tPA's efficacy. Walsh and Fromm rhetorically asked "if such a therapy is so routine so as to fall under the doctrines of emergency treatment and implied consent, how can these emergency physicians deny what has become ordinary local standard of care? Are they waiting for a future study that will refute the National Institute for Neurological Disorders and Stroke (NINDS) data?"
In my letter, I merely pointed out a simple fact -- that the NINDS trial is the only RCT that has demonstrated that tPA has clinically significant efficacy in acute ischemic stroke (for patients treated between 91-180 minutes) and that all the other non-NINDS trials had borderline negative/equivalence results (for patients treated between 91-180 minutes), and that it was incumbent on tPA proponents, like Walsh and Fromm, to explain why these differences exist and also explain why the results from the NINDS trial should trump the results from the non-NINDS trials.
Consider how Walsh and Fromm responded to my letter, by reading their reply letter that was also published in the October 2006 issue of the Annals of Emergency Medicine (see between the horizontal lines).
In reply:
Mark Walsh MDa, Gary Fromm MDa and Michael Donnino MDb
aDepartments of Critical Care and Emergency Medicine Memorial Hospital South Bend, IN
bDepartments of Critical Care and Emergency Medicine Beth Israel Deaconess Medical Center Harvard School of Medicine Boston, MA
Dr. Mann responds to our concern that 40 percent of surveyed ACEP physicians would not provide rtPA to patients who meet established criteria for treatment within three hours of stroke and that we have ignored the findings and comparisons of the pooled NINDS and non NINDS trials. He refers us to his personal analysis (his term) of data that he has pooled from different trials and published on his Web sites for confirmation of his opinions. Dr. Mann has made familiar and similar objections to the NINDS post hoc analysis in a letter to the editor which appeared in Stroke.1 The published response by the independent committee of scientists, statisticians and clinicians who reviewed the NINDS data describes the problem inherent in Dr. Mann’s selective mining of subgroup data from different studies: “subgroup analyses need to be interpreted cautiously with the burden of proof resting on making the case that the treatment effect is different between subgroups … (and) must be interpreted very cautiously because the erroneous identification of differential subgroup effects may lead to inappropriate provision, or withholding, of treatment.”2 This is an authoritative warning that the burden of proof lies with the 40% of ACEP survey responder skeptics who would deny therapy to patients meeting indications. Emergency physicians, as referenced in our letter, suffer the stigma as unable or unwilling to treat ischemic strokes with rtPA. The American Heart Association supports the use of rtPA. The American Stroke Association supports the use of rtPA. The Joint Commission on Accreditation of Healthcare Organizations (JCAHO) holds hospitals to set standards and rewards those with excellence based on, among other clinical benchmarks, the ability to give rtPA for ischemic stroke.
The data, as analyzed by the original NINDS investigators, support the use of rtPA. The data, as reanalyzed by a separate and independent group of expert clinicians and statisticians, support the use of rtPA. Thus, Dr. Mann’s objections, which are based on his own personal analysis, stand in contrast to expert clinicians, statisticians and medical societies spanning multiple specialties. We believe that the preponderance of evidence available at this time supports the use of rtPA and would caution against personal subgroup analysis of this type that has not withstood the scrutiny of peer review. The enemy of good is perfect. The independent review group which performed the post hoc NINDS analysis cautioned Dr. Mann in 2005: “Although no statistical analysis can definitively rule out subgroup differences, passionate belief in a personal perspective will not change the biology.”2 This group’s voice seems one of reason.
References:
1. J. Mann, NINDS reanalysis committee’s reanalysis of the NINDS trial [letter], Stroke 36 (2005), p. 230.
2. T.J. Ingall, W.M. O’Fallon and K. Asplund et al., NINDS reanalysis committee’s reanalysis of the NINDS trial [response], Stroke 36 (2005), pp. 230–231.
Consider the diversionary nature of the Walsh/Fromm/Donnino reply. They didn't address the central question of the vast difference in efficacy-results between the NINDS and non-NINDS trials (the primary problem-issue that I posed). Instead, they merely deflected the reader's attention away from my central argument to a letter that I wrote to the Stroke journal, which they claim (incorrectly) was related to a familiar argument based on similar grounds, and they specifically stated that I was guilty of the "selective mining of subgroup data from different trials". The truth is that my letter* to Stroke was primarily related to the NINDS trial and had nothing to do with the non-NINDS trials, and the authors are deliberately (or unknowingly) mixing up two different issues.
(* my letter to Stroke was primarily related to the topic of the different methods of analysing a RCT's results - a stratified versus non-stratified analysis -- and my detailed analysis of this particular issue is available in reference number 8)
Have I selectively mined sub-group data from different trials in order to foster a selectively distorted conclusion? Consider the facts.
In this essay, I dealt with the 91-180 minutes results from the NINDS and non-NINDS trials. Why didn't I include other time frames for comparison? The answer is simple. I didn't include the 0-90 minute time frame for two reasons -- i) the NINDS trial was the only trial that studied a significant number of patients treated between 0-90 minutes, and the total number of treated/placebo patients was small (<300 patients); ii) in clinical practice, it is unlikely that stroke patients can be treated in <90 minutes. I also didn't make comparisons for the time frame >180 minutes (181-360 minutes) for two reasons -- i) tPA is not approved for use beyond 180 minutes; ii) the NINDS trial did not include patients treated after 180 minutes.
In other words, I could only make comparisons for the clinically relevant 91-180 minute time frame.
Did I selectively use subgroup data in my comparison in order to produce a selectively distorted comparative result? I did not. Consider, yet again, the real facts.
I utilised three group results when making a comparison of the RCT results for stroke patients treated between 91-180 minutes -- i) all the NINDS patients (irrespective of stroke severity subgroup); ii) all the non-NINDS patients (irrespective of stroke severity subgroup), and iii) the NINDS intermediate stroke severity subgroup patients (bNIHSS stroke severity >5,<20). Did I selectively mine the subgroup data in order to produce a distorted comparative result? The answer is obviously negative, because it was the NINDS Study Group investigators who first used the NINDS intermediate stroke severity subgroup (bNIHSS stroke severity >5,<20) data in their paper [1] in an attempt to demonstrate that tPA is still clinically efficacious, even if one eliminates the two subgroups (bNIHSS<5, >20) that were primarily responsible for the stroke severity imbalance in the NINDS study. I merely took the NINDS Study Group Investigators' original comparison one step further by numerically calculating the efficacy of tPA for all patients (OR 2.5) compared to the intermediate stroke severity subgroup patients (OR 1.8) alone. The primary purpose of this numerical comparison was an attempt to semi-quantitatively estimate to what degree the "stroke severity imbalance" problem affected the accurate estimation of tPA's efficacy in the 91-180 minute arm of the NINDS study.
Most importantly -- even if one ignores the NINDS intermediate stroke severity subgroup (NINDS-bNIHSS>5,<20) results in my three group comparative analysis, one still needs to account for the magnitude of the difference between the NINDS trial's overall results (OR 2.5) and the non-NINDS trials' overall results (OR 1.2), and one also needs to explain why the NINDS trial's results should trump the results from the non-NINDS trials.
Walsh/Fromm/Donnino didn't address this central question in their reply and they merely attempted to perform a feat of one-upmanship by comparing the so-called "authoritative" beliefs of the NINDS Reanalysis Committee and other professional organisations, like the AHA/American Stroke Association, to the passionate, non-peer reviewed, beliefs of a solo trial interpreter. It reminds me of the comparison between the "authoritative" beliefs of the high priests of the Catholic Church (who believed that the earth was flat) and Galileo (who believed that the earth was round)! I think that tPA proponents, like Walsh and Fromm, are like FlatEarthers -- because they refuse to consider the totality of the scientific evidence, and they willfully, and blindly, adopt an incomplete version of the RCT evidence. It remains an incontestable, and lamentable, fact that tPA proponents, like Walsh and Fromm, are still not offering to explain why there is a vast difference in efficacy results between the NINDS and non-NINDS trials, and why it is scientifically acceptable to ignore the borderline negative/equivalence results of the non-NINDS trials and only trumpet the questionably accurate results of the NINDS trial. Also, the NINDS Study Group investigators obviously agree that the NINDS trial's 91-180 minutes efficacy results have to be adjusted for the "imbalance in baseline stroke severity" problem-issue, which is the primary reason why they wrote their paper, but they have still avoided making a semi-quantitative assessment of the magnitude of the problem. They merely concluded that tPA still has efficacy (OR >1.0) after taking the "imbalance in baseline stroke severity" problem-issue into account, and they did not attempt to definitively prove that the remaining degree of efficacy represents a clinically significant result (an OR result that definitely exceeds a minimally important benefit threshold).
Jeff Mann, MD.
First draft of the addendum: September 24th 2006.
References:
1. Thomas Kwiatkowski, Richard Libman, Barbara C. Tilley, Christopher Lewandowski, James C. Grotta, Patrick Lyden, Steven R. Levine, Thomas Brott and the National Institute of Neurological Disorders and Stroke Recombinant Tissue Plasminogen Activator Stroke Study Group. The Impact of Imbalances in Baseline Stroke Severity on Outcome in the National Institute of Neurological Disorders and Stroke Recombinant Tissue Plasminogen Activator Stroke Study. Annals of Emergency Medicine. Vol 45. Issue 4. April 2005. p377-384.
2. Mann J. Truths about the NINDS study: setting the record straight. West J Med. 2002;176:192-194.
3. The National Institute of Neurologic Disorders and Stroke rtPA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995;333:1581-1587.
4. Mann J. A personal analysis of the NINDS study using patient level data.
Available at http://jeffmann.net/soapbox/NINDSpersonalanalysis.html.
5. O’Fallon WM, Asplund K, Goldfrank LR, Hertzberg VS, Ingall TJ, Louis TA. Report of the t-PA Review Committee. National Institute of Neurologic Diseases and Stroke; 2004. Available at: http://www.ninds.nih.gov/t-PA_review_Committee
6. Sackett D. From chapter 6:5 (Superiority, Equivalence and Non-inferiority Trials) drafted for the third edition of "Clinical Epidemiology", which is due to be published in the near future.
Available at http://nsite.ca/index.cfm?page=221&CFID=35490&CFTOKEN=18728222
7. ATLANTIS-ECASS-NINDS rt-PA Study Group Investigators. Association of Early Stroke Outcome with Early Stroke Treatment: Pooled Analysis of ATLANTIS, ECASS, and NINDS rt-PA stroke trials. Lancet 2004;363:768-74.
8. Mann J. A Critique of the Re-analysis of the NINDS Trial.
Available at http://jeffmann.net/soapbox/IngallCrtitique.htm
9. Mann J. Determining the efficacy of thrombolytic therapy in acute ischemic stroke: An analysis of the recent stroke literature.
Available at http://jeffmann.net/soapbox/EfficacyofTPA.htm
10. David Magid, Nils Naviaux and Robert L. Wears. Stroking the data: Re-analysis of the NINDS trial. Annals of Emergency Medicine. Vol 45. Issue 4. April 2005. p 385-387.