An analysis of the WHS trial of aspirin therapy for the primary prevention of cardiovascular events in women


-------------------------------------------------------------------

An Adobe PDF version of this essay is available at http://jeffmann.net/soapbox/Aspirin-WHSanalysis.pdf


Introduction:


This analysis of the WHS trial is part of my ongoing series of personal analyses of RCTs that intentionally focuses on major weaknesses in the design and interpretation of large-scale RCTs.

The purpose of a RCT is to determine the scientific truth, and as I have repeatedly emphasised in my previous trial analyses [1] [2], a RCT can only determine the scientific truth with a high degree of confidence if the trial has a high signal/noise ratio. The WHS is yet another RCT that probably has a low signal/noise ratio, which means that it cannot hope to confidently generate scientifically valid results (trial results of a significant magnitude that are incontestably due to the tested drug and not due to noise factors).

In this analysis, I will dissect the strong and weak points of the WHS trial's design and official interpretation, and I will compare it to another recent trial that was also handicapped by having a low signal/noise ratio (but for different reasons) [1]. Before I demonstrate major weaknesses in the design and interpretation of the WHS trial, I will first tackle a side-issue: the effect of year-to-year chance events on a RCT's results.
 

Analysis of the WHS trial:
 

To quote directly from the WHS trial's official report [3]-:

"The Women’s Health Study was a large randomized, double-blind, placebo-controlled trial of low-dose aspirin in the primary prevention of cardiovascular disease among 39,876 apparently healthy women followed for a mean of 10 years for the major cardiovascular events of myocardial infarction, stroke, and death from cardiovascular causes. The primary end point was a combination of major cardiovascular events, including nonfatal myocardial infarction, nonfatal stroke, and death from cardiovascular causes, and the trial was initially designed to have a statistical power of 86 percent to detect a 25 percent reduction in this end point."


Issue number 1:  Year-to-year chance events in the WHS study.


There are two aspects of the WHS trial's design that were much better than average. First of all, the sample size was relatively large; there were ~19,000 patients in each group (aspirin group and placebo group). Secondly, the study duration was approximately 10 years, which is much longer than usual. The advantage of the combination of a large sample size and a long duration study period is considerable; it minimises noise due to random chance events that often plague smaller trials of shorter duration (eg. the APPROVe trial [1]).

To demonstrate the fact that year-to-year chance events played a smaller-than-average role in this trial, let's consider certain facts.

First of all, consider the baseline characteristics of the patients enrolled in the WHS trial.

Table 1 from reference number [3].
 


 

Note that the two groups were reasonably balanced at baseline in terms of prognostic variables, per the usual standards of general acceptability. However, the information in this table only represents a crude measure of the effectiveness of randomisation in creating two groups that have the same baseline risk of a future cardiovascular event. I think that examining the control event rate pattern over time sometimes offers a trial interpreter a much better idea of whether the two comparative groups were likely to be in a state of prognostic balance for the entire duration of the trial.

Consider the Kaplan Meier curves for cumulative cardiovascular events (from reference number [3]).

Figure 1: Cumulative incidence rates of the primary endpoint of major cardiovascular events [3].
 


 

I think that the placebo group's cumulative control event curve is a textbook example of a good quality cumulative control event incidence rate curve from a "minimisation of year-to-year chance event variability" perspective -- note that the curve is straight and linear, and note that there are no major step deformities (sudden increases or sudden decreases in slope angle of the placebo group curve) that would suggest that significant noise, due to random yearly chance events, is present. The patients enrolled in the WHS trial had a very low risk of a control APTC event (major cardiovascular event) and most of the enrolled patients had a <2% risk of a control event during the 10 year study period. There is no practical method of predicting when those control events would occur during the 10 year study period, and theoretically they should be evenly dispersed throughout the 10 year time period by *chance alone (if the trial's sample size is sufficiently large).

(* Theoretically, if all prognostic variables remain unchanged during the 10-year study period, then there should be a slightly higher incidence of APTC events in the last few years of the trial -- because the patients are aging by 10 years during the study period, and age is a known major risk factor for a future major cardiovascular event. However, this phenomenon didn't occur in the WHS trial, presumably because ~60% of the enrolled patients were between 45-55 years of age, and female patients in that age range have a very low 10-year risk of an APTC event that doesn't change very much from year-to-year in that age range)  

To demonstrate that the placebo group's control event rate was evenly distributed throughout the trial's study period, consider the following graphical display.

Figure 1: Cumulative incidence rates of the primary endpoint of major cardiovascular events [3].
 


Note that I have hand-drawn three lines (blue, red and green) at 2 year intervals.

"X" (on the Y axis) represents the cumulative control event incidence rate for the first two years of the trial. It is easy to see that the control event incidence rate is roughly the same for each two year time period (0-2 years, 2-4 years, 4-6 years), and that the control event incidence rate doesn't change significantly with time. This fact suggests that the WHS trial had a low noise level due to chance events. This is critically important because the aspirin group's cumulative event incidence rate is compared to the placebo group's cumulative event incidence rate, and if the placebo group had a large chance variation in the control event rate, then that chance variation would artefactually inflate (or deflate) the signal (difference in cumulative incidence of major cardiovascular events between the placebo and treatment groups) if a proportional chance variation of the same direction/magnitude didn't also occur in the treatment group. A large chance variation actually occurred in the APPROVe trial's placebo group [1] and this chance event phenomenon distorted the APPROVe trial's signal by markedly inflating the magnitude of the signal.

Consider the Kaplan Meier curves for the cumulative event incidence rate for the APPROVe trial (from reference number [4]).

Figure 2: Cumulative incidence of confirmed serious thrombotic events (from reference number [4]).
 


 

Note the marked flattening of the placebo group's cumulative event incidence rate curve after 18 months. There is no pathophysiological reason why that phenomenon should have occurred, because both the WHS and the APPROVe trials enrolled patients who had a low risk of a future major cardiovascular event, and theoretically the CV outcome events should be more-or-less evenly dispersed over the entire duration of the trial.

However, the APPROVe trial only enrolled ~1200 patients in each group, and the trial duration was only 3 years -- and it is a well known fact that a small sample-sized, short duration trial is much more susceptible to chance events that can significantly distort the trial's signal. The APPROVe trial was also plagued by a high drop-out rate of 40%, and a large differential drop-out rate could theoretically also aggravate a trial's chance event noise factor (see reference [1] for a detailed analysis of chance events in the APPROVe trial, and their potential effect on the trial's signal/noise ratio).

In contrast to my conclusion regarding the APPROVe trial, I am generally concluding that the WHS trial was apparently less affected by chance events from year-to-year, and I cannot readily identify significant noise due to year-to-year chance events (as apparently occurred to a marked degree in the APPROVe trial).

Does that mean that I believe the WHS trial had a high signal/noise ratio? Not at all! The WHS trial had such a small signal overall (for the entire trial period) that it wouldn't take much noise to drive the signal/noise ratio to a very low level.
 

Issue number 2:  Low risk patients who received low dose aspirin therapy.
 

In my introduction, I implied that the WHS trial had a low signal/noise ratio. However, since I cannot demonstrate that the WHS trial had a high noise level due to year-to-year chance event variability, I have to demonstrate that the magnitude of the signal was disproportionately small. My argument is that the trial's signal/noise ratio was low because a small amount of chance event noise was accompanied by an even smaller signal.

According to Sackett [5], there are four major determinants of the size of the signal in a RCT -- they are i) the "baseline" or control group's risk of an outcome event; ii) the responsiveness of experimental patients to that treatment; iii) the potency of the experimental treatment; and iv) the completeness with which outcome events are ascertained and included in the analysis.

Sackett stated that in order for a RCT to generate a sufficiently large signal, a trialist should i) selectively enroll "high risk" patients; ii) selectively enroll "highly responsive" patients who are more likely (than average) to respond to the treatment; and iii) use a potent experimental treatment and give it time to exert its effect.

The WHS trial failed to meet "Sackett's criteria" needed to generate a sufficiently large signal -- because the trialists enrolled very low risk patients, who were not very likely to be highly responsive to aspirin therapy, and because they used a low dose of aspirin. The trialists did give the drug time to exert its effect and they should be commended for running a long duration trial (10 years).

How low risk were the WHS trial's enrolled patients?

Here is a section of table 3 from the WHS trial's official report [3].
 


Note that 60% of the enrolled women were between the ages of 45 and 54 years (24,025 out of 39,876 patients). Note that there were only 324 major CV events in this major subgroup of patients (163 events in the aspirin group, and 161 events in the placebo group). That works out to 1.35% (324/24,025) over 10 years. That means that the "average" yearly event rate was roughly 0.13%, which means that approximately one-in-a-thousand women (aged 45-54 years) was likely to have a major cardiovascular event each year, and they constituted 60% of the enrolled patients.

A control event rate of 0.13 events/100 patient years is considerably less than the APPROVe trial's "average" control event rate value of 0.54 events/100 patient years.
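The arithmetic behind these rates can be reproduced with a short Python sketch. It assumes, as the text does, that every woman contributes the full 10 years of follow-up and that both arms can be pooled to approximate the control event rate. The function name `rate_per_100_patient_years` is purely illustrative.

```python
def rate_per_100_patient_years(events, patients, years):
    """Crude event rate per 100 patient-years, assuming every patient
    contributes the full follow-up period (an approximation)."""
    return 100.0 * events / (patients * years)

# Figures quoted above for the 45-54 year subgroup (both arms pooled)
events, patients, years = 324, 24025, 10

cumulative_pct = 100.0 * events / patients
yearly_rate = rate_per_100_patient_years(events, patients, years)

# APPROVe control event rate quoted above, in events/100 patient-years
approve_rate = 0.54

print(f"10-year cumulative incidence: {cumulative_pct:.2f}%")
print(f"WHS rate: {yearly_rate:.3f} events/100 patient-years")
print(f"APPROVe rate is {approve_rate / yearly_rate:.1f}x the WHS rate")
```

On these figures the APPROVe control event rate is roughly four times the rate in the WHS trial's 45-54 year subgroup.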

Think about the implications of a control event rate of 0.13 events/100 patient years (1.35% over 10 years). If only 1.35% of enrolled patients (aged 45-54 years) are likely to have an outcome event over a time period of 10 years, then aspirin (if it is efficacious) can only influence ~1% of the enrolled patients (aged 45-54 years) during the entire 10 year time period! If, by chance, the 1-out-of-every-100 patients (aged 45-54 years), who was doomed to be the one patient to have a major cardiovascular event, didn't take her *aspirin regularly or was aspirin-resistant, then a therapeutic opportunity would be lost, and the overall signal would be significantly smaller because there were very few opportunities for aspirin to be therapeutic in the entire 45-54 year old subgroup of patients.

(* Interestingly, when reviewing the WHS trial's official report [3], I could not find any information on "what percentage of enrolled aspirin patients actually took their medicine for the entire 10 year duration of the study", or any information on the drop-out rate)

Chance events obviously have a larger distorting effect when the control event rate is low and the magnitude of the anticipated signal is small. Note that in the 45-54 years placebo subgroup patients, 56 patients out of 12,005 placebo patients had a MI in the 10 year period. That works out to 0.46% of the placebo patients. The value for the 45-54 years aspirin subgroup patients was 69 out of 12,000 aspirin patients, which works out to 0.57% of the aspirin patients. Let's presume, for argument sake, that aspirin had no therapeutic/harmful effect in the patients aged 45-54 years in the WHS trial. That means that the difference in event rate between the placebo group and the aspirin group (0.46% versus 0.57%) over a 10 year time period must have been due to chance events (despite an optimum randomisation process, a large sample size, and a long duration study). That works out to 0.11% over 10 years, and this value represents *noise due to chance events.

(* I am presuming, for argument sake, that the aspirin group is equivalent to a placebo group in this example. Although not entirely accurate, I think that it is reasonable to presume that chance events could potentially cause chance event variations of this level of magnitude between two randomised groups in a WHS-type RCT consisting of ~12,000 patients, aged 45-54 years, studied for 10 years)
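To give a rough feel for whether chance alone could plausibly produce a gap of this size, the sketch below recomputes the two MI rates from the counts quoted above and compares their difference with the standard error of a difference between two proportions. Treating MI events as independent binomial outcomes is an assumption added here for illustration, not something taken from the WHS report, and the helper name `se_of_difference_pct` is hypothetical.

```python
from math import sqrt

def se_of_difference_pct(events_a, n_a, events_b, n_b):
    """Standard error (in percentage points) of the difference between two
    proportions, assuming independent binomial sampling in each group."""
    p_a, p_b = events_a / n_a, events_b / n_b
    variance = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return 100.0 * sqrt(variance)

# MI counts quoted above for the 45-54 year subgroup
placebo_mi, placebo_n = 56, 12005
aspirin_mi, aspirin_n = 69, 12000

placebo_pct = 100.0 * placebo_mi / placebo_n
aspirin_pct = 100.0 * aspirin_mi / aspirin_n
diff_pct = aspirin_pct - placebo_pct
se_pct = se_of_difference_pct(placebo_mi, placebo_n, aspirin_mi, aspirin_n)

print(f"Placebo MI rate: {placebo_pct:.3f}%")
print(f"Aspirin MI rate: {aspirin_pct:.3f}%")
print(f"Difference: {diff_pct:.3f} percentage points")
print(f"Standard error of the difference: {se_pct:.3f} percentage points")
print(f"Difference in standard errors: {diff_pct / se_pct:.2f}")
```

Under that binomial assumption the observed gap is only a little over one standard error, which is in line with the suggestion that chance could produce a difference of this magnitude.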

What was the size of the expected signal -- what did the trialists hope to demonstrate in the WHS trial? The trialists stated in their official report that the trial was designed to detect a difference of 25%. That means that the trialists hoped to demonstrate that *aspirin would reduce the risk of MI by at least 25%.  

(In primary prevention studies of aspirin in men, aspirin had been shown to reduce the incidence of MI by roughly 25-30%, with no beneficial effect on stroke, and one presumes that the WHS trialists expected to see a similar magnitude of aspirin efficacy in women in the WHS trial)

The average risk of a MI in the 45-54 year control subgroup patients was ~0.5% over a 10 year time period. If aspirin actually reduces that risk by 25%, then the anticipated absolute size of the signal would be 0.125% over 10 years (0.5% x 25%). That means that the absolute size of the anticipated signal would be of roughly the same magnitude as the *potential noise due to chance events. The WHS trial could therefore have been doomed to have a low signal/noise ratio even before the trial started, in which case it would produce scientifically inconclusive results in those low risk patients.

(* I use the term "potential" because a trialist doesn't know to what degree his particular RCT will be affected by chance events; the chance event noise factor could be significantly smaller/greater than this "illustrative" value derived from the WHS trial) 
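The anticipated signal follows from one multiplication: absolute risk reduction = control event rate x relative risk reduction. The sketch below uses the two figures from the text (a ~0.5% control MI risk and a 25% relative reduction). The number needed to treat is added purely as an illustration and does not appear in the WHS discussion above.

```python
def absolute_risk_reduction(control_risk_pct, relative_risk_reduction):
    """ARR in percentage points = control risk x relative risk reduction."""
    return control_risk_pct * relative_risk_reduction

control_mi_risk_pct = 0.5  # ~10-year MI risk in 45-54 year controls (from the text)
target_rrr = 0.25          # relative reduction the trial was designed to detect

arr_pct = absolute_risk_reduction(control_mi_risk_pct, target_rrr)
nnt = 100.0 / arr_pct      # women treated for 10 years per MI prevented

print(f"Anticipated ARR: {arr_pct:.3f}% over 10 years")
print(f"Number needed to treat: {nnt:.0f}")
```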

Think about the fundamental purpose of a RCT as a scientific testing tool in clinical research. A drug-RCT is primarily designed to prove that the signal (ARR) is entirely due to the tested drug, and not due to chance events or other confounding variables (noise). In other words, the trialists have to ensure, when designing a RCT, that the trial has a high signal/noise ratio if they want to confidently claim that their trial produced scientifically conclusive results. If the WHS trialists were anticipating a signal of a certain magnitude "x", then they had to ensure that the magnitude of the noise due to chance events (and other factors) is much smaller than "x" if they wanted to produce scientifically conclusive results. Is that practically feasible when enrolling low risk patients who only have a control outcome event rate of 0.5% over 10 years, and an anticipated signal magnitude of 0.125% over 10 years? For the WHS trialists to produce scientifically conclusive results, they would have to guarantee that the magnitude of the noise due to chance events would be much less than 0.125%. I don't think that the WHS trialists could guarantee that their trial would have such a low absolute chance event noise level, and one has to question the social value of performing a RCT in such low risk patients (who represented 60% of the study's enrolled patients) if the trialists couldn't guarantee a sufficiently low absolute chance event noise level.

I have argued that the WHS trial was possibly doomed to produce scientifically inconclusive results because the WHS trialists could not hope to guarantee that the chance event noise level in their very low risk patients would be sufficiently low (low enough to ensure that the trial's signal/noise ratio remained high). Surely, the signal/noise ratio problem is compounded if the trialists also do not ensure that the magnitude of the anticipated signal is maximised by using a potent experimental treatment (as recommended by Sackett). The WHS trialists used an aspirin dose of 100mg of aspirin every other day. Does that represent a potent dose of aspirin? Can the WHS trialists rationally argue that they used a sufficiently high dose of aspirin; a dose of aspirin sufficiently potent that it would maximise the size of the signal?

In their official report [3], the WHS trialists made the following surprising statement-: "The trial was designed to evaluate the lowest dose of aspirin that would have a cardioprotective effect ---". If the WHS trialists' primary intention was to determine the lowest dose of aspirin that could have a cardioprotective effect, then surely the trialists needed to perform a RCT that used variable doses of aspirin after first proving that a high dose of aspirin would have a cardioprotective effect. How can one rationally design a high signal/noise ratio RCT to determine the lowest dose of aspirin that could be cardioprotective in women if there is no prior RCT evidence that a high dose of aspirin is cardioprotective in women?
 

Issue number 3:  The clinical significance of the reduction in stroke events in the WHS trial


The WHS trial demonstrated that aspirin reduced the risk of stroke in women, and the results demonstrated that the beneficial effect occurred in all three age subgroups (45-54 years; 55-64 years; >65 years) to a relatively uniform degree (RR ~0.8).

Consider the WHS trial's results (from reference number [3])
 


Note that aspirin significantly reduced the risk of ischemic stroke -- RR was 0.76. Note that the P value was 0.009 and the 95% CI was 0.63-0.93. Those values suggest that one can have a high degree of confidence in the trial's results. However, consider this "confidence" issue from the perspective of the trial's potential signal/noise ratio.

The absolute size of the signal (reduction in the ischemic stroke event rate over 10 years) can be calculated as follows-:

Event rate in the placebo group (221/19,942) - event rate in the aspirin group (170/19,934) = 1.11% - 0.85% = 0.26%.

The absolute size of the signal was an absolute risk reduction of ischemic stroke of 0.26% over 10 years.

To calculate the WHS trial's signal/noise ratio, one needs to know the noise level due to chance events, which is unknowable. However, imagine a hypothetical situation whereby a WHS-type trial consisting of 100,000 patients was divided into five substudies and that each substudy was identical in design to the WHS study. One would naturally expect that there would be a slight chance variation in ischemic stroke control event rates between the five substudies. The degree of chance variation in ischemic stroke control event rates represents potential noise due to chance events. I could imagine that the magnitude of the potential chance event noise factor could be close to a value of 0.26% over 10 years. Therefore, the WHS trial can only be considered to have a high signal/noise ratio if the magnitude of the actual chance event noise level in the WHS trial was significantly less than 0.26% over 10 years. However, how can the WHS trialists prove that the WHS trial actually had a sufficiently low ischemic stroke chance event rate noise level that was significantly less than 0.26% over 10 years?
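The thought experiment can be made concrete with a small simulation. The sketch below first recomputes the absolute risk reduction from the ischemic stroke counts quoted above, and then simulates five substudies that all share the same true control event rate, so that any spread between them is chance alone. Several things here are assumptions rather than facts from the WHS report: each substudy is given 10,000 placebo patients (100,000 patients split into five substudies of two arms), events are treated as independent binomial outcomes, and the function name and seed are arbitrary.

```python
import random
from math import sqrt

# Ischemic stroke counts quoted above
placebo_events, placebo_n = 221, 19942
aspirin_events, aspirin_n = 170, 19934

placebo_rate = placebo_events / placebo_n
aspirin_rate = aspirin_events / aspirin_n
arr_pct = 100.0 * (placebo_rate - aspirin_rate)

def simulate_control_rates(true_rate, n_controls, n_substudies, seed=0):
    """Simulate the 10-year control event rate (%) in several substudies that
    share one true rate, so any spread between them is pure chance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rates = []
    for _ in range(n_substudies):
        events = sum(1 for _ in range(n_controls) if rng.random() < true_rate)
        rates.append(100.0 * events / n_controls)
    return rates

N_CONTROLS = 10_000  # assumed placebo patients per substudy (hypothetical)
rates = simulate_control_rates(placebo_rate, N_CONTROLS, n_substudies=5)
spread_pct = max(rates) - min(rates)

# Theoretical standard deviation of one substudy's control rate (binomial model)
binomial_sd_pct = 100.0 * sqrt(placebo_rate * (1 - placebo_rate) / N_CONTROLS)

print(f"Observed ARR: {arr_pct:.3f}% over 10 years")
print("Simulated control rates (%):", [round(r, 2) for r in rates])
print(f"Spread between substudies: {spread_pct:.2f} percentage points")
print(f"Binomial SD of a single control rate: {binomial_sd_pct:.3f} percentage points")
```

Changing the seed or `N_CONTROLS` shows how much the spread between substudies moves from run to run. The simulation only captures sampling variation and says nothing about other sources of noise.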

In a heartwire news report [6], Dr Noel Bairey-Merz (Women's Health Resource Center, Cedars Sinai Hospital, Los Angeles), who has worked as a consultant for Bayer, says: "What we saw was relatively young, relatively low-risk women producing results consistent with what we know about risk. But it's reassuring that we did see a benefit in older women.---"Also, I am personally encouraged by the fact that we saw a benefit in stroke, because women suffer disproportionately from stroke," she adds. "Stroke was significantly reduced for the entire population, and that's an effect you can count on. I think the stroke thing is kind of exciting. I'm so glad it wasn't a totally negative trial."

Note that Dr Noel Bairey-Merz stated that "stroke was significantly reduced for the entire population, and that's an effect you can count on". How can one rationally count on that effect if one doesn't first prove that the WHS trial had a high signal/noise ratio with respect to ischemic stroke events?


Issue number 4:  The clinical significance of the reduction in major cardiovascular events in elderly women


Consider the aspirin-induced reduction in major cardiovascular events according to age.



Note that aspirin reduced the risk of major cardiovascular events in elderly women (RR 0.74) and that it was similarly effective in reducing the risk of a MI or stroke.

This sub-analysis suggests that aspirin may be efficacious in elderly women, even when used at a low dose. The >65 years patient subgroup (placebo and aspirin combined) had an event rate of 7.5% over 10 years (306/4097). That control event rate of 7.5% over 10 years was >5x greater than the control event rate in the 45-54 year patient subgroup, and this makes the elderly patient subgroup's results less susceptible to chance event noise factors. However, the WHS trialists cannot confidently conclude that aspirin is definitely beneficial in elderly women who have a low/moderate risk of a future major cardiovascular event because the sample size was too small (4,097 patients) and because we still do not know whether the elderly female patient groups were balanced from the perspective of baseline prognostic variables for the entire duration of the trial.
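For reference, the comparison between the two age subgroups can be checked directly from the counts quoted in this essay. As in the text, both arms are pooled as a rough stand-in for the control event rate, which is an approximation rather than a true placebo-only rate.

```python
# Major cardiovascular events, both arms pooled (counts quoted in this essay)
elderly_events, elderly_n = 306, 4097    # women >65 years
younger_events, younger_n = 324, 24025   # women 45-54 years
total_enrolled = 39876

elderly_pct = 100.0 * elderly_events / elderly_n
younger_pct = 100.0 * younger_events / younger_n
rate_ratio = elderly_pct / younger_pct
elderly_share_pct = 100.0 * elderly_n / total_enrolled

print(f">65 years: {elderly_pct:.2f}% over 10 years")
print(f"45-54 years: {younger_pct:.2f}% over 10 years")
print(f"Ratio of event rates: {rate_ratio:.1f}x")
print(f"Share of enrolled patients aged >65 years: {elderly_share_pct:.1f}%")
```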

It is a pity that the elderly patient subgroups only represented 10% of the WHS trial's enrolled patients. If the WHS trialists had only enrolled patients >65 years of age into the trial (who were much more likely to have a control event than the younger patients), and obtained similar results, then the trialists could have more confidently concluded that aspirin is efficacious in the primary prevention of major cardiovascular events -- because the trial would have had a significantly higher signal/noise ratio.
 

Issue number 5:  Selective (subjective) interpretation of a trial's data.
 

I am frequently amazed by how often trialists selectively interpret their trial's raw data to support a particular position, and how much subjectivity, and not objectivity, is involved in their decisions. Theoretically, a RCT's raw data should speak for itself, and I think that trialists should limit the extent of their subjective interpretation of the raw data.

Consider two examples from the WHS trial that demonstrate this problem-issue.

Example 1:

Consider, yet again, the results for the reduction of MI events in patients aged 45-54 years.
 

 
Note that the RR value for MI reduction was 1.23 for patients between the age of 45-54 years.

I think that an objective interpretation of the RR value of 1.23 is that aspirin is associated with a slightly increased risk of MI, but that the results should be deemed to be inconclusive because the trial had such a low signal/noise ratio that chance events could possibly explain the small increased risk. In their official NEJM report [3], the WHS trialists do not specifically comment on this RR value of 1.23, and they only mention the fact that aspirin does not reduce the risk of MI overall. It is possible that the trialists did not discuss this sub-issue because they presumed that the increased RR value of 1.23 in that subgroup of patients was most likely due to chance events. Fair enough! However, what would the trialists have stated if chance events happened to produce the same absolute degree of chance effect, but in the opposite direction? The absolute difference in MI events between the placebo and aspirin 45-54 year subgroups was 0.11% over 10 years. If chance events caused the aspirin subgroup to have 0.11% fewer events than the placebo subgroup (rather than the other way around), then the RR would have been ~0.8. Would the trialists have stated that it was most likely due to chance, or would they have claimed that it was due to aspirin's therapeutic effect? I wonder!
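The mirror-image scenario can be worked through with the MI counts quoted earlier for the 45-54 year subgroup (56/12,005 on placebo and 69/12,000 on aspirin). The sketch computes the crude observed relative risk, then the relative risk that would result if the same absolute difference had fallen in aspirin's favour. These are crude ratios of proportions, so they may differ slightly from the model-based RR in the official report.

```python
placebo_risk = 56 / 12005   # MI risk, placebo subgroup aged 45-54 years
aspirin_risk = 69 / 12000   # MI risk, aspirin subgroup aged 45-54 years

observed_rr = aspirin_risk / placebo_risk
abs_diff = aspirin_risk - placebo_risk

# Hypothetical: the same absolute difference, but favouring aspirin
mirrored_aspirin_risk = placebo_risk - abs_diff
mirrored_rr = mirrored_aspirin_risk / placebo_risk

print(f"Observed RR: {observed_rr:.2f}")
print(f"Absolute difference: {100 * abs_diff:.2f} percentage points")
print(f"Mirror-image RR: {mirrored_rr:.2f}")
```

The mirror-image value comes out a little under 0.8, which is the kind of result that could easily be read as a modest treatment benefit.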

Example 2:

In the discussion section of their paper [3], the trialists interpret the trial's controversial result - that aspirin did not reduce the risk of MI, as anticipated - by performing a private random-effects meta-analysis. The trialists state-: "To address the effects of aspirin in primary prevention, we performed a random-effects meta-analysis that included current data from the Women’s Health Study, as well as data from five prior trials involving 55,580 participants with no history of heart disease. ---- In analyses stratified according to sex (Fig. 3), combined data on women from the Women’s Health Study, the Hypertension Optimal Treatment (HOT) study, and the Primary Prevention Project (and Roncaglioni MC: personal communication) indicate that aspirin therapy was associated with a significant, 19 percent reduction in the risk of stroke (relative risk, 0.81; 95 percent confidence interval, 0.69 to 0.96; P=0.01), with no reduction in the risk of myocardial infarction (relative risk, 0.99; 95 percent confidence interval, 0.83 to 1.19; P=0.95)." In other words, the WHS trialists apparently conclude that the data on women from the other RCTs (HOT study and PPP study) support their position that aspirin therapy does not reduce the risk of MI in women. Unfortunately, the trialists do not make the details of their meta-analysis publicly available (which is easily possible in the modern internet era when trialists frequently make additional trial data available online). Therefore, independent trial interpreters cannot personally assess the signal/noise ratio of their meta-analysis. The only meta-analysis evidence that the trialists provide is the following graph.

Graph from a section of figure 3 [3].



Do the results from the other two trials support the WHS trialists' position that aspirin does not reduce the risk of MI? I guess if one simply averages the overall results, then it does support their position. However, is that "averaging" approach scientifically valid? Note that the HOT trial demonstrated that aspirin was better, while the PPP trial showed that placebo was better. Is it scientifically acceptable to simply average the results, and then claim that the "average" value is scientifically valid? I think not! I think that one first has to judge the scientific validity of those two trials' results objectively, by first determining their signal/noise ratios, before one can decide whether it is scientifically legitimate to simply average the results. A simple "averaging" of results from multiple low signal/noise ratio RCTs does not objectively establish the scientific truth. Let me give you an example of that basic principle.

Let's presume that a trialist (trialist A) decides to test the sweetness-saltiness of a white powder by placing a sample of that white powder on the tongues of a group of enrolled subjects, and then records whether the study's subjects decide that the white powder tastes sweet, or salty, or neutral (neither significantly sweet nor significantly salty). If a single large study demonstrates that, on average, the white powder has a neutral taste, then the study's results could be interpreted as scientifically proving that the white substance has a neutral taste. However, it is obviously possible to imagine that noise could be distorting the study's results; noise due to the fact that many of the trial's subjects cannot readily distinguish between slight degrees of sweetness and slight degrees of saltiness. Noise of that type could reduce the scientific validity of the trial's results if many of the subjects had an impaired ability to differentiate between slight sweetness and slight saltiness. Then the trial would have a large noise level and a low signal/noise ratio. If trial A had a low signal/noise ratio, then it could not possibly produce scientifically conclusive results.

Would the "scientific objectivity" of trialist A's results (that the white powder has a neutral taste) be enhanced if he discovers that trialist B noted that his study's subjects decided, on average, that the white powder has a slightly sweet taste, and trialist C noted that his subjects decided, on average, that the white powder had a slightly salty taste? I don't think that it is rational to simply average the results from the three studies using some statistical technique -- especially if trial B had a low signal/noise ratio because many of the subjects had more sweet-taste receptors than average and trial C had a low signal/noise ratio because many of the subjects had more salt-taste receptors than average. I think that the scientific legitimacy of each trial, and the scientific legitimacy of the average result of the combined meta-analysis, is critically dependent on demonstrating that each white powder taste trial has a high signal/noise ratio (that significant noise due to sweet-taste and/or salt-taste receptor bias had been excluded).


Conclusion:
   

I think that the WHS trialists made a major mistake when they enrolled so many low risk patients (who only had a control event rate of ~1.3% over 10 years), and used such a low dose of aspirin -- because those two design choices significantly decreased the WHS trial's potential signal/noise ratio. A low signal/noise ratio RCT cannot produce scientifically conclusive results (results associated with a high confidence level), and the WHS trial's results therefore do not conclusively answer the question as to whether aspirin could be efficacious as a primary prevention strategy in women who have a significantly high risk of a future major cardiovascular event.

If the WHS trial, by chance, had a positive result in women 45-54 years of age, and demonstrated that aspirin was moderately efficacious in middle-aged women (resulting in a RR <0.8), then many middle-aged women may have decided to take aspirin for the primary prevention of major cardiovascular events (MI and stroke) -- even though the trial's signal/noise ratio would have been too low for the WHS trial's results to be confidently regarded as being conclusive. Would it have then been rational for a clinician to recommend regular low-dose aspirin therapy for middle-aged women, who have a 10-year risk of a MI/stroke event of 1.3%, if regular aspirin therapy would only change their 10-year risk of not having a future MI/stroke from ~98.7% to ~99% -- if the clinician also knew that the EBM evidence supporting the decision was an inconclusive clinical research study (low signal/noise ratio RCT)?

In the heartwire news report [6], Dr Cindy Grines (William Beaumont Hospital, Royal Oak, MI) is reported as saying "I am not at all surprised by these findings. This was an extremely low-risk population (no prior cardiovascular history whatsoever, 90% of women were below aged 65, and the estimated 10-year cardiovascular risk was only 4%). In fact, these patients had a less than 1% risk of having an MI over the 10-year period - what therapy can improve on that? -------- Furthermore, no one expects cardiovascular disease in women until they are 10 years postmenopausal, so what's the point of this study? It definitely enrolled patients who were too young and very unlikely to have a cardiovascular event. This is the biggest reason that aspirin was not effective in young women."

Heartwire also reported lead author of the WHS, Dr Paul Ridker, countering that "there were, in fact, a substantial number of events in women in the 45-to-65 age group, so they were not "too young" to show an effect. --- There were a total of 693 cardiovascular events among women under age 65 and thus the study was well powered to show a difference between aspirin and placebo if in fact one was present. However, in this age group, there were 346 events among those on aspirin and 347 among those on placebo, no difference whatsoever."

Do you think that Dr Ridker's answer is meaningful? Does it give a clinician any idea as to whether the WHS trial had a sufficiently high signal/noise ratio to produce scientifically conclusive results?

My personal opinion is that the WHS trial has too low a signal/noise ratio to confidently assess whether aspirin has a beneficial/harmful effect in the primary prevention of cardiovascular events in women. 

In conclusion, I think that all trialists should be ethically obliged to estimate the signal/noise ratio of their RCT before, and after, they perform their trial, and they should not seek publication of low signal/noise ratio RCTs. I also think that medical journal editors should not publish the results of any RCT that has a low signal/noise ratio, because a low signal/noise RCT cannot confidently advance the state of scientific knowledge.


Jeff Mann, MD.

Retired physician.

First draft: April 2005.

jmannemg@earthlink.net


References:


1. Mann J. Questioning the scientific validity of the randomised trials of COX-2 inhibitors showing an increased risk of adverse cardiovascular events.

Available at http://jeffmann.net/soapbox/vioxx-cox2critique.htm

2. Mann J. A personal analysis of the NINDS study using patient-level data.

Available at http://jeffmann.net/soapbox/NINDSpersonalanalysis.html

3. Ridker PM, Cook NR, Lee IM, et al. A randomized trial of low-dose aspirin in the primary prevention of cardiovascular disease in women. N Engl J Med 2005; 352: 1293-1304.

4. Bresalier RS, Sandler RS, Quan H, Bolognese JA, Oxenius B, Horgan K, Lines C, Riddell R, Morton D, Lanas A, Konstam MA, Baron JA. Cardiovascular events associated with rofecoxib in a colorectal adenoma chemoprevention trial. N Engl J Med 2005; 352: 1092-1102.

5. Sackett DL. Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). CMAJ 2001; 165(9): 1226-1237.

Available online at http://www.cmaj.ca/cgi/content/full/165/9/1226

6. Lisa Nainggolan. Women's Health Study formally published: Discussion continues over results. Heartwire report at theheart.org. March 30th 2005.