Randomized trial versus observational study challenge

I've noticed that whenever I have the temerity to suggest (e.g., here and here) that maybe the word of the Cochrane Collaboration isn't quite the "last word" on a subject, and indeed might be seriously flawed, I hear from commenters and see on other sites quelle horreur reactions and implications that this blogger doesn't believe in the scientific method. Why? Because "everyone knows" that a randomized controlled trial (RCT) automatically beats any other kind of medical evidence, and any Cochrane review that systematically summarizes extant RCTs on a subject like flu vaccines is therefore a highly reliable source of evidence. In the spirit of full disclosure, I am an academic epidemiologist who has made a living doing observational studies, i.e., studies where I don't get to assign the independent variable, much less randomize it. I can only "observe" the outcomes (the dependent variable) and infer causal effects from my observations. That might explain why I am oversensitive to the prevalent worship of meta-analyses of RCTs as practiced by the Cochrane collaborators. Or maybe I am just blinded (or even double-blinded) by what I do and my ignorance of scientific method.

So I'd like to try something with my readership. It's not an "experiment" in the formal sense, just tossing something out there as "shark bait" to see what happens. I'm going to describe a fictitious observational study and I'd like you, the readers, to tell me why it isn't something that should form a reliable basis for taking some action (e.g., treating someone). In a later post I'll respond or at least tell you the point of the example (which, while fictitious, represents actual practice in some quarters of clinical medicine).

Here's the set-up:

On the basis of some good data from animal and pharmacologic studies, I believe an existing anti-epilepsy drug will be effective for patients with refractory hypertension, i.e., people with high blood pressure who are not successfully treated with existing therapies. This is an FDA-approved drug that is considered acceptably safe for epilepsy, but blood pressure treatment is an "off label" use (I believe this would be both legal and ethical, but since we are discussing scientific method it is not germane at this point). There is no interest in this from the drug companies because the drug is already generic, so there isn't enough money to be made on it and there are no funds for an expensive RCT involving thousands of subjects. So I decide I will try it on patients in my cardiology practice with refractory hypertension.

I have carefully researched the alleged biochemical mechanism ahead of time, cross-checked it against lots of clinical data, and decided, on the basis of the science, exactly which patients with refractory hypertension should be the ones for whom this therapy will work. I write the criteria for treatment down and give them to one of my medical residents so she can go through my practice's current patient histories and monitor new patients as they enroll in the practice, selecting ones with refractory hypertension and of those, deciding which ones will get the drug on the basis of my written criteria. Over the space of two years she finds 52 candidates and decides 29 fit the criteria. I don't pay her but promise she can have her name on any paper we publish. While not every patient is a perfect fit, all meet a rough minimum standard.

Each of the 29 selected patients is given the drug at their next office visit and told to take one pill twice a day until the next visit, usually about a month. No one gets a placebo and of course that means we know which ones got the drug. They all did. The patients also know they got the drug. So there is no blinding. On the next office visit we measure their blood pressure again. Both the before and after blood pressure measurements are made with a digital reading cuff that requires no intervention from the clinical assistant besides putting the cuff on the arm and recording the results in the chart.

The difference in blood pressures is our quantitative measure of the effect of the drug. If the drug had no effect there shouldn't be any difference. We decided ahead of time to test the difference for "statistical significance" with a linear model and the F-statistic (null hypothesis: the difference in blood pressures is zero). Using these methods there is an apparent positive effect of the drug of about 10 mm Hg in systolic pressure and 15 mm Hg in diastolic pressure which, because the F-statistic is sufficiently large, we do not believe is likely to be a chance effect. Our study concludes this anti-epilepsy drug can be used for cases of refractory hypertension. We submit the results for publication to a peer-reviewed journal; the paper is accepted and becomes part of the medical literature.
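For readers who want the arithmetic spelled out, here is a minimal Python sketch of the kind of before/after comparison just described. The numbers are invented for illustration, not data from the fictitious study. With a single before/after difference per patient, the F-test from the linear model is equivalent to a paired t-test: the F-statistic is just the square of the paired t.

    # Paired before/after comparison on invented data (illustration only).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n = 29
    before = rng.normal(170, 12, n)                   # systolic BP at the pre-treatment visit
    after = before - 10 + rng.normal(0, 8, n)         # pretend the drug drops systolic BP ~10 mm Hg

    diff = before - after                             # one paired difference per patient
    t_stat, p_value = stats.ttest_rel(before, after)  # H0: mean difference is zero

    print(f"mean drop = {diff.mean():.1f} mm Hg")
    print(f"paired t = {t_stat:.2f}, equivalent F = {t_stat**2:.2f}, p = {p_value:.4f}")

Note that nothing in this calculation knows whether the drop came from the drug, a placebo response, regression toward the mean, or anything else that changed between visits; that is the crux of the challenge.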

This is obviously not a randomized controlled trial because it is neither randomized (everyone got the drug) nor controlled (everyone got the drug). But it does have some strengths and it is these strengths that got it past peer review. First, treatment is well defined and started at a precisely determined time. This is also true of properly designed RCTs. There is a well-defined comparison group: the same person before the drug use. Thus the comparison group is well matched to the treated group (same sex, race, essentially the same age, educational level, income, etc.). This makes the balance of important covariates better than in most RCTs. What about measurement of outcomes? Well, that's a bit more of a problem, perhaps, because we don't know if the effects on blood pressure we see have any relationship to clinical outcome. We have chosen a clinical measure we think we can make reliably, blood pressure measurement with a digital cuff. It is a commonly used one, so we'll just go with it. We don't have the ability to follow up all our patients for years to see if their mortality is improved. We do have exclusion criteria (or if you prefer, inclusion criteria) decided upon in advance on the basis of the scientific literature. We have trusted our medical resident to apply them properly. Since the trial lasts only a few weeks and the subjects are all long-term patients in our practice, we don't have the problem that many RCTs have of loss to follow-up or switching treatment groups, but we might have a non-compliance problem. But so do RCTs. Our patients are probably not a representative sample of the population of refractory hypertensives. They are a convenience sample. But they are humans and have refractory hypertension, and we don't know of any factor that specifically makes our patients biologically unlike other people who might have refractory hypertension with respect to response to this anti-epilepsy drug. We analyzed the data with an accepted and appropriate measure, decided upon ahead of time.

So here's your challenge. This is not an RCT and likely wouldn't be included in a Cochrane review of the use of this anti-epilepsy drug for refractory hypertension (let's assume this is true; it is highly plausible). That suggests either that it is not a reliable source of information or that Cochrane reviews are excluding reliable sources of information.

Question: In your judgment, is this a sufficiently reliable study that a reasonable practitioner, committed to the use of scientific evidence in her practice, would consider?

I'll give it a few days or a week to get a response from the hivemind. Of course any EBM site is free to repost and/or circulate. I'm curious what the reaction will be.

I'm a statistician, not an epidemiologist, so I've only learned about the placebo effect. But how come that isn't considered? I'm sure textbooks describing BACI designs start out by explaining why this design is bad.

I can imagine using a multi-level model, where the treatment effect is compared to the distribution of placebo effects, over other similar studies. It's not ideal, and buggers up the power of the study (and almost forces one to be Bayesian about this - oh the horrors!), but I've done worse in my own work.
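A crude way to picture what this commenter is suggesting, short of a full multi-level model: treat placebo-arm blood pressure changes pooled from other (here invented) hypertension trials as a reference distribution and ask where the observed effect sits in it. A rough Python sketch with made-up numbers:

    # Compare the observed drop to a hypothetical distribution of placebo-arm drops.
    import numpy as np

    placebo_drops = np.array([2.0, 4.5, 1.0, 3.5, 5.0, 2.5, 0.5])  # invented systolic drops (mm Hg), other trials
    observed_drop = 10.0                                           # the study's apparent systolic effect

    mu, sd = placebo_drops.mean(), placebo_drops.std(ddof=1)
    z = (observed_drop - mu) / sd
    print(f"placebo-arm drops: mean {mu:.1f} mm Hg, sd {sd:.1f}; observed effect sits {z:.1f} sd above")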

My science research background is from a very different field and a somewhat different time, but I thoroughly enjoy the challenge of this post.

I would think that there are concerns with this study, including the potential placebo effect (although I understand that recent research has diminished the concern over it) mentioned previously. Also, while the statistics are useful here in establishing the significance of the results, it does seem difficult to rule out other sources of the change in symptoms since none were measured here.

Just the $0.02 of a marine scientist-turned-teacher.

I agree it needs a placebo control group, or short of that, it needs to be repeated in multiple locations with other sample groups (using the same criteria), especially to call it "reliable." The single study with a relatively small sample size is problematic, even though intriguing. (As with any clinical study, a lot of assumptions and untested variables are unaddressed here.)

Shark bait indeed! Second year epi students will have a blast tearing this apart. But really, if this is a serious inquiry, the study can probably be considered a case series, which is a low form of scientific evidence (case report being the lowest). It does not rise to the level of scientific evidence sufficient to change the standard of care.

So far, what you have described represents a phase II study on a new anti-hypertensive drug and is clearly well conducted. The point seems to be the ability to show that the drug had a positive effect on blood pressure, which it clearly did. Since it is a short-term study, we cannot address clinically relevant issues like morbidity or mortality, but this was not the aim of the study and you made that clear in your explanation. I also notice that people are concerned about the placebo effect. My answer to this is that you obtained a drop of 15 mm Hg in diastolic pressure and 10 mm Hg in systolic, treating a condition which by definition is refractory to treatment. Well, with these results I might ask you the name of this "placebo" and might as well start studying it ;)

By Cássio (not verified) on 02 Jan 2010 #permalink

I am a physician but not a cardiologist. A minor point, but I was taught that hypertension was best assessed by measuring BP at 3 different time points. I think adding 2 additional measures of BP at 2 future time points would add to the reliability of the data. How can one ethically have a placebo control when the potential negative outcomes (stroke and heart attack) are so high if the patient's BP is not controlled? A more ethical clinical study might involve comparing it to a standard alternative that is FDA approved?

It has been a very long time since I have done any research so forgive me if I am sounding ignorant on study design. I am thinking like a clinician...I hope :)

By Dr. Denise (not verified) on 02 Jan 2010 #permalink

Would also agree w/ Cassio:

Per Wikipedia (Search: Clinical Trial):
"Trial design
Some Phase II trials are designed as case series, demonstrating a drug's safety and activity in a selected group of patients. Other Phase II trials are designed as randomized clinical trials, where some patients receive the drug/device and others receive placebo/standard treatment. Randomized Phase II trials have far fewer patients than randomized Phase III trials."

I would consider it, as fairly low quality evidence. And sometimes low quality evidence is going to be as good as we can get.

Whether or not it would influence me (if I treated adults with refractory hypertension) to use the medication in those meeting the criteria that the study used would depend on more than just the quality of the evidence. What are the possible risks (costs) of the treatment? What is the possible long-term benefit? If I were convinced that the possible benefit of treatment with the medication was very important and that the costs/risks of adverse effects from the medication were small, then poor quality evidence might be enough to convince me to consider the medication's use.

I think the US Preventive Services Task Force would call this level II-3 and that the National Health Service system would score it as a level C. I'd like better evidence than this before deciding to treat or not to treat, but we have to decide using the evidence we got, not the evidence we wish we had. (Yes, I know who that sounds like.)

As a clinician (psychiatrist), I'd suggest one little modification to the study you suggest. Since it is very likely that not all of your patients will actually take the new medication, I'd have your resident ask each one, "We know how hard it is to take new medication regularly. Since this is a scientific study, could you tell me whether you were able to actually take the new medication for every dose? If not, how many doses did you miss?" Before the study begins, you will have decided (arbitrarily, to be sure) how many doses could be missed for that patient's results to be excluded from the study. This might improve the chances of detecting an effect.

I'm also a physician and epidemiologist. You're evil! This is going to create lots of discussion! I think I see what you're getting at though. Yes, it's a case series, but does that mean it's automatically not reliable evidence? Of course not. But I agree with your readers that there are potential problems with placebo effects - on the other hand, if you have a consistent effect measured objectively in almost every patient, could that be a placebo effect? Getting back to your point about meta-analysis, you're right of course. Yes they can ignore good quality observational studies. And the interesting thing is that even though they summarize RCTs, the meta-analysis itself is observational - what does that mean? That brings up the question of whether there is any role for meta-analyses of observational studies? They exist. Are they reliable? Are they valuable? Anyway, fun post. Keep stirring the pot.

Revere - I believe you are getting at a crucial point that is often overlooked or ignored in medicine. Our ability, as a discipline, to aggregate information at every level (from meta- to micro-) and push that information out to the "foot soldiers in the trenches" is embarrassingly poor.

Cochrane reviews are our "best" aggregator of information. They are based on the principle of avoiding type-I errors in selecting studies to include (i.e., never let a "bad" study into a Cochrane review). Is this a good system? Maybe. Is it the best idea? Maybe. Depends on what you want.

Perhaps instead of "stirring the pot" or providing "shark bait", you could spend time testing an alternative? Or maybe one already exists that you would like to advocate?

There are other, more egregious, examples of our failure to actually provide useful information from journal articles. Has anyone tried to use PubMed recently? Has anyone tried to use the google recently?

Medicine badly needs a better way to deliver information to "front line" providers at no cost to them. Perhaps there is one buried somewhere in the Senate's health care bill amidst pork for Montana and Nebraska, Cosmetic Surgery practices and Longshoremen? That would be too much to hope for ...

Of course in this hypothetical I'd wonder why the study was so needlessly crappy. There is no reason that a few more patients meeting criteria could not have been recruited and randomized into treatment and control arms and that treatment and control could not have been double-blinded. Or even with the same number recruited using a cross-over control group - again double blinded and randomized. Either would have made for much better quality of evidence, moving this from a level a hair's breadth above a group of anecdotes into decent evidence. The fact that it wasn't done when it could have been done fairly easily, and that a single resident was responsible for recruitment and decided who met the criteria well enough, and a single attending was involved, would raise all kinds of alarms.

The problem is that you have to run this by an IRB. If it is "research" it has to go past an IRB, and you can't bill the patients' insurance companies for either the face time or the meds. You need informed consent from all of them.

If you just decided that the meds were appropriate and gave them to patients without systematically recording the results, then it is not "research" and you don't need any approvals, and you can bill their insurance companies for both your time and the meds. Maybe you could write it up later, but if you had all this data and stuff, then you were doing research, and without IRB approval that is defined to be misconduct of a pretty serious order.

I think it would work. There is a gigantic amount of cross-talk between neuronal activity and vascular activity, mediated by nitric oxide.

daedalus2u: If I am a cardiologist in practice, not receiving federal funds for this work, I don't have to run it by an IRB (at least that's my understanding). This is "off label" use for an approved drug and I am just being a careful practitioner in how I do it. However, I want to keep the IRB and ethical issues out of this discussion, for the moment at least, to concentrate on the research methods issues. So for that purpose, assume it was approved by an IRB. I suspect it would pass many of them (not mine, alas).

Don S: I think you are evading the question. You just are assuming the study is crappy. Tell me what is crappy about it. Do you think the information it provides is misleading? If so how? The numbers here are irrelevant because I stated up front it was sufficiently powered (the F test results). Let's just say the study was ended when the medical resident went off to do her dermatology fellowship! Nothing in the example hangs on the numbers, whether you think they are big or small. Concentrate on the scientific method and tell me why this is worse than an RCT and why you think it needs to be blinded. Note that in some ways it is better than RCT because covariate comparability is much better. Where do we lose here? I'm not saying this is a good or bad study. I'm asking you to say why it is or isn't and whether you would rely on it.

Sounds like a good start to provide data to design a good definitive study (but who would have incentive to support such a study?).

The crossover case of guanfacine (an old generic BP med) is instructive: it had some small studies done in ADHD and was adopted by some psychiatrists as an off-label third-tier med for this use, either to supplement a stimulant or alone. Although the half-life of this med is pretty long, the effect on ADHD or oppositional behaviours lasted only 4-6 hours, requiring multiple doses for full-day coverage. Shire formulated it into a long-lasting form (IP protection) and performed the phase III clinical trials to market the med for the indication of ADHD in kids. The generic is still there for off-label use, but the once-a-day dosing and FDA market approval for the indication of ADHD help with insurance coverage and confer some market advantage.

Nice. As a clinician, I might consider this drug in a patient with refractory hypertension who is without insurance or without other treatment options, or else in an epileptic with hypertension if I am adjusting meds. As an epidemiologist, my biggest problem with this study would be the lack of corroborating evidence despite the drug having been around long enough to have become generic for the treatment of epilepsy. I would suggest companion studies to give this study some clinical context and to estimate the durability and magnitude of the effect. For example, a retrospective case-control study comparing rates of use of this drug among epilepsy patients with refractory (cases) and controlled (controls) hypertension. Or a retrospective cohort study comparing rates of refractory hypertension among epileptics given this drug vs. not. In addition, if multiple similar observational trials such as you describe are done, then a meta-analysis can be performed. This study alone will have little impact on the treatment of most patients.

I'm not a scientist, just a blackjack dealer, so this might just be comic relief, but...what if everybody who walked into your office had a drop in blood pressure between one month and the next? What if the first month the stock market was wonderful and the next month it crashed? What if the first month your parking lot was under construction or there was a snow storm or heat wave or ozone alert? In the second sampling did you cut down the amount of time people sat in your waiting room or send the grouchy receptionist on vacation or even set the thermostat to a more comfortable level?

It just seems like you'd have to somehow measure the factors that are not in your study instead of automatically taking credit for the improvement.

I agree with ARJ-- the fact that all of these were conducted at your office at the same time is particularly problematic. This could have been a change in your practice, or in something else on a local scale (I've heard of studies whose results have been altered dramatically because a doughnut shop opened a block away from the treatment facility). You need a control group not only because of the placebo effect, but because there are time-varying confounders that have not been considered. The small sample size doesn't help either. The results are suggestive, but cannot be considered conclusive evidence.

My primary concern with the design of the experiment is the application of the drug exclusively to new patients. I'm always more nervous around practitioners who are new to me, and grow more relaxed with them over time. This could have an effect on my blood pressure as measured in the office all by itself.

I answered your question very directly: I would consider it, but as fairly low quality evidence, and it could have been designed to give much better quality evidence by using double-blinded controls. A crossover controlled study with a washout period would have been simple to do with the same number of patients. So your question is now why a study with controls and double blinding is better than one without? Specifically, what problems were introduced into this dataset that could have been avoided by doing so?

Oh let me count the ways!

Other things may have changed in the environment over time. Perhaps between nontreatment and treatment times an environmental pollutant or toxin was eliminated by the town, or a war ended, or a new factory opened up and they suddenly had jobs and had stress from unemployment reduced, or construction at the hospital causing parking problems and loud noises during exam visits stopped, or who knows what else.

The resident could have selected for patients she had more of a personal relationship with and, since she wanted positive results to earn an authorship, spent more than the usual amount of time and care with them. This is not even a placebo effect; it is a possible effect of personal care, attention, and relationship on blood pressure. It is another variable that was avoidable with blinded controls. The attending also could have spent more energy reviewing standard care. These may be patients who were previously "refractory" because they were noncompliant with diet, exercise, or even taking their meds, and who now, because of the obvious interest being shown in them, became more compliant with the standard aspects of their care.

Perhaps the natural history of "treatment-resistant hypertension" as identified by this researcher is to peak at about the time of identification and then improve somewhat over time.

More attention could have been paid to proper cuff sizing and placement once they were study participants.

Then there are the standard placebo effect problems. And on and on.

There is reason to consider lower quality evidence in certain cases, and sometimes even to make a decision based on it (as I have already expressed). To categorically ignore evidence because it is not from a RCT is a mistake. But to ignore the fact that some sorts of evidence are higher quality than others is also a mistake.

susan, PJG: Just to clarify. I didn't mean to imply (nor do I think I said) that all the measurements were done at the same time. In fact I imply they were done over the space of two years between regular office visits of refractory hypertensives already in the practice and new ones that came in during that period. Enrollment was by the medical resident, using criteria supplied to her in advance. This means that on the same day one person might be having the drug follow up measurement while someone else was having a baseline measurement. And wouldn't you have the same problems with an RCT if they were all enrolled at once, randomized and then treated (e.g., maybe there is a powerful external factor that is masking any effect of the drug)?

But I don't really want to analyze these responses as we go along. I'd prefer to do it all at once in a week or so when this has had a chance to percolate through the blogosphere. But I did want to keep the example constructed in a way that keeps our eye on the ball: the difference between this one-arm, non-blinded, non-randomized, non-controlled study and what some people believe is not only the gold standard but the only standard, the RCT. This was instigated by the reaction to our critique of Jefferson's flu vaccine meta-analysis in the BMJ (and the subsequent Atlantic article), which caused a great deal of confusion. It is an effort to have a collective "thinking through" of the issues involved rather than just reflex reactions to labels for study designs.

What you have here, basically, is a multiple-subject single-case design. You could control for placebo effects simply by adding (randomly, by random matching, or for all subjects) a similar trial period with a real placebo. A true A-B-A study (with the drug being A and placebo B) would pretty much do it.

By Nancy Horn (not verified) on 02 Jan 2010 #permalink

Don S.: Thank you. That is exactly the kind of response I was hoping for (and that I anticipated). Keep it coming.

revere,

Are you familiar with the systems that exist to score the relative value of evidence (that I have alluded to), such as those of the US Preventive Services Task Force, the National Health Service, or the GRADE system?

The concept is not, as Cochrane seems to rigidly apply it, that only RCTs need apply, but that there is a hierarchy of evidence quality. Your hypothetical researcher went for the "easily obtained" and suggests that it is true not only for his particular patient population but for all other groups as well. This is a huge temptation as we go to EMRs and can retrospectively data mine with greater ease. Believing that something is true everywhere because of what you've seen under the streetlight (where the light is better) is a real issue.

I assume "consider" means considering a change in a patient's care. Changing a patient's care based on one non-RCT study seems like a ridiculous proposition unless there is reason to believe a dramatic positive outcome could occur.

Changing a patient's care based on an understanding of a body of evidence on the subject is more reasonable, so if this adds to other evidence to create a stronger hypothesis for a treatment, then it might be a reasonable proposition. That understanding should be able to explain how the non-RCT evidence complements any existing related RCT evidence. If the hypothesis chooses to ignore RCT evidence, then it isn't one that should be used on a patient!

The *hypothesized* benefits must be weighed against the possible negative effects. The potential negative outcomes are just as important as the positive outcomes, and I don't see a discussion of them for this study other than stating that the study doesn't have the resources to follow up on mortality rates.

By Greg Weber (not verified) on 02 Jan 2010 #permalink

Not a scientist, just an advocate of science and one that enjoys dabbling in it...

I had a stats class in grad school about 3-4 years ago at a state university. What we learned, which seems to make sense to me, is that with all the different types of studies that exist, it's important to pick the one that will work best for the study. We learned there was really not a "better" type, but a type that might be better suited for the information you are trying to ascertain. Not to mention there may be resource constraints to doing one study versus another.

So for instance, if you wanted to find out how stressful construction work is for project managers (PMs), you might send out surveys to PMs at different construction companies and pick the ones that are Fortune 250 or greater (lower) companies. Is the information you collected causal and accurate? Maybe not, but you can still get valid data that you can use to make statements about the stressfulness of PM work in construction and the companies they work for. You might say, "Yes, but some companies may be more stressful than others." True, and there may be even more variables you could point out too. But these are things you can attempt to account for and include questions about in your survey. Also, in your paper on this study you could look up research already conducted on the stressfulness of these jobs and use that to further support or refute points.

I think what I'm saying is you need to design studies that fit your situation and accomplish what you need while working within your resources and constraints. While RCTs may seem better, it would likely be impractical to design one for my PM example above. But does that make the data from the PM study invalid or worthless? Should I not conduct my PM study and wait till it's feasible to run an RCT?

I think in the hypothetical above the study seems to fit the situation appropriately. If it's feasible for someone else to conduct an RCT to look at the same question as the hypothetical above, that would help add validity. But I don't think we should throw away the hypothetical because it isn't an RCT. Studies should be evaluated per situation, not by where they land on a "study hierarchy" chart.

I'm a mere medical student, so I apologize now if I'm mischaracterizing anything here.

This is a study that a reasonable practitioner committed to the use of SBM/EBM could consider. That being said, a reasonable practitioner committed to the use of SBM/EBM would be in the wrong to dictate treatment plans for a significant number of patients based on one article of any kind.

My understanding of SBM/EBM is that we don't use one article, we use the balance of evidence. Ioannidis wrote a nice essay a while back on how a lot of published research findings are false. There are a lot of different factors that come into play, and while the study you describe is a valid study that I would consider, to me it doesn't constitute a preponderance of evidence by itself.

My understanding of the Cochrane reviews is that by limiting themselves to RCTs, they may miss out on some valid therapies that we lack the resources to test with larger studies; but by focusing on larger studies, and studies that reduce a lot of biases, they reduce the possibility of suggesting therapies that "lucked into" a significant F-statistic or P value here and there.

I am more than willing to admit that this means they are not the be-all end-all in an academic discussion, and that they miss some valid studies, but they are a relatively reliable, relatively quickly accessible resource for the physician who lacks the time, or comfort level to survey the literature thoroughly themselves.

Let's put another spin on this. Let's say one of those patients with refractory hypertension asked you, their hypothetical cardiologist, to use homeopathy to treat their hypertension. Let's say the cardiologist uses this exact same method to test the homeopathic treatment. There is a small, but real chance that this kind of a study, a relatively small case series, would give a false positive. That would appear to be the kind of error that Cochrane avoids by including only RCTs, and/or larger studies.

So: Yes, I'd consider it. But without other studies to give me the impression that this is relatively reproducible, real effect, it wouldn't change my practice.

I'm not a scientist or a doctor though I follow these issues with a great deal of interest. I also write with a personal experience of having been personally pretty severely hurt by two drugs which had passed the most rigorous controlled studies while under the care of a physician with the very highest reputation in our community, and having been greatly helped by treatments that are supported only by quite weak observational evidence, but do make a great deal of sense from the theoretical perspective. So, I am glad to hear a group of very serious doctors and researchers most of whom appear to accept the notion that not all treatments or drugs can actually be rigorously tested as a practical matter. What I don't hear is much skepticism about the rigorous testing that is being compared to, which itself is far from perfect -- it tests well for whatever the observer is interested in, but as demonstrated by repeated disasters involving tested drugs post FDA approval, the validity of the test is limited by the focus of the experimenter and the design of the experiment. Maybe this is outside the scope of the question, but you do seem to be asking whether evidence such as the case studies you describe is adequate to rely on for care, and my point is the alternatives are also not all that reliable in many circumstances, even if advocates of "scientific" medicine always assume a rigorous clinical study is the touchstone of truth.

To add something concrete to this: one of the harmful therapies given to me was a long-approved medication for blood pressure, which I'm sure has helped many people, but has a rather high frequency of side effects. My doctor was so committed to this med I had to leave his care to get a different med. But, in addition, I wound up in psychotherapy due to the side effects of the first medicine (which did not reduce my BP). I was still taking the first medication, but due to what I would call a breakthrough in the psychotherapy, my BP (which I was taking twice a day at that point) suddenly dropped by approximately 20 mm Hg and stayed down. A different med later helped even more. I was extremely surprised, but then came across a book called Healing Hypertension by Dr. Samuel Mann, which proposes that suppressed emotions play a large role in what is known as refractory hypertension. Dr. Mann's book resembles a longer version of your study in some ways, but he also makes the very interesting point that his hypothesis would be extremely difficult to test by conventional rigorous approaches, due to the difficulty of telling who has suppressed emotions until after therapy and the inherent vagueness of the whole idea of suppressed emotions. But just because something is vague and difficult to test doesn't mean it isn't significant, as many of your commenters above clearly seem to recognize. Moreover, unfortunately most of the large clinical studies on medications are done by drug companies whose primary motivation (sometimes, one might even say, sole motivation) is profit, leading to all kinds of distortions in experimental design and interpretation of results. In academia, motives tend to be a little purer but there are issues of career advancement, fame, and the like. In the real world, one has to consider the issue of milking the statistics for significance, for example, when comparing rigorous controlled studies as actually performed with the hypothetical, and this might make the hierarchy of evidence mentioned above somewhat less reliable.

I hope this isn't too far off topic. This is a great post and a great set of comments if you ask me.

By Albion Tourgee (not verified) on 02 Jan 2010 #permalink

Let the Bayesian in me take a swing at this pitch. We Bayesians believe that beliefs are personal, and that different people might reach different conclusions with the same information. The only requirement is that new information be combined with old in a coherent way, that is, via Bayes' theorem.

So, how do I feel about this treatment after I have reviewed the "study?" Well, if I am the one who did the study, I suspect my posterior probability that the treatment would be useful in future patients in my practice would be quite high. If my experience is different, I would probably reach a different level of belief. I frankly can't imagine myself dismissing the study out of hand (even if you told me that you, the investigator, were a paid lecturer for the company that makes it!!).

Another key question, I believe, is how should future patients feel about this treatment? And who is going to help them with their judgments?
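For concreteness, here is a minimal normal-normal updating sketch in Python of the kind of belief revision the commenter above describes (all numbers invented; the prior stands in for whatever a reader takes from the animal and pharmacologic data). The posterior mean is a precision-weighted average of the prior mean and the study's estimate.

    # Normal-normal conjugate update of belief about the systolic effect (invented numbers).
    prior_mean, prior_sd = 5.0, 6.0     # prior belief about the drop (mm Hg): modest and uncertain
    study_mean, study_se = 10.0, 2.5    # the study's estimated drop and its standard error

    prior_prec = 1 / prior_sd**2        # precision = 1 / variance
    study_prec = 1 / study_se**2
    post_prec = prior_prec + study_prec
    post_mean = (prior_prec * prior_mean + study_prec * study_mean) / post_prec
    post_sd = post_prec ** -0.5

    print(f"posterior effect: {post_mean:.1f} mm Hg (sd {post_sd:.1f})")

Different readers bringing different priors will land in different places, which is exactly the commenter's point.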

I echo Don S' comment that a crossover design would have been superior. It is possible that through some kind of Hawthorne effect, the patients engaged in other healthy behaviours during the treatment period because they knew they were being assessed as part of a study. A blinded, crossover design would help to distinguish any placebo effects from the effects of the drug itself.

This is not to say that the study is worthless. Ideally, this study would prompt other researchers to try to replicate the findings either through their own case series or through an RCT. I think the value of such "preliminary" studies in clinical practice also depends on the context, i.e., the existence of alternative treatments, the severity of the disease, the risks involved, etc. It really depends on the clinical question that is most relevant to the patient and the practitioner.

In response to WcT, revere has already suggested that there is adequate evidence to support biological plausibility - therefore, the comparison to homeopathy is irrelevant.

I should add that because of the concerns over a possible placebo effect, it might be worth comparing the results of this trial to those in the placebo arms of any RCTs in similar populations with refractory hypertension. I realize that such cross-study comparisons can be problematic, but it would at least help to give some context to the outcome.

@NP

In response to WcT, revere has already suggested that there is adequate evidence to support biological plausibility - therefore, the comparison to homeopathy is irrelevant.

Fine, make it St John's Wort for depression, or some other biologically plausible treatment that, as yet, lacks an actual body of other studies indicating its usefulness for this application.

The point was a previously unproven therapy without a body of other evidence to suggest its viability. We're talking about an intervention with an unclear, or relatively low, prior probability.

So we're talking about a study of a small number of patients, without good controls, of a drug without a larger body of literature supporting its use as an antihypertensive. That makes the study interesting, but not a practice changer.

I agree with you that a placebo arm, or a crossover study, would be useful.

When that's the case, it takes more than a single, relatively small observational study to change practice.

I think the basic issue is that you can't be certain that the drop in blood pressure is due to the drug alone, since it isn't the only thing that changes between the baseline and endpoint. The most important change, I would think, is that many of the patients will be with a new physician (the investigator). Why is this a problem? Well, it's possible that you're just an exceptionally gifted physician who is able to reduce new patients' blood pressure even without the drug. It's possible that there was a change in your practice that caused your treatments to become effective for existing (and new) patients, or maybe you just tried harder because you wanted to see a positive outcome for your trial. It's also possible that something caused your new patients to seek a new doctor--maybe they never got along with the old one (maybe that's why treatment never worked and that's why they acquired their diagnosis), or maybe they recently moved from Texas to Japan. That's a lot of maybes, and that's the point--we don't know what factors besides the presence or absence of drug also changed during the treatment period. Based on the research presented, I cannot reasonably conclude that the drug is the cause of their decreased blood pressure.

The reason that RCTs are so popular for pharmaceutical research is because the RCT design allows you to compare two groups whose only difference (in theory) is the presence or absence of a drug. If there is a significant difference in blood pressure between the groups, you can be fairly certain that it is due to the drug alone. We can make that conclusion because, even though we don't know what other factors change, we can assume that the factors are more-or-less equally distributed between the two groups. There are, for example, an equal number of patients in both groups who hate their doctor, or recently moved, or finally threw out the salt shaker (for real this time, I swear). Because these other factors are randomly distributed between the groups, they will affect both groups equally, and we can conclude that any change in the aggregate outcome measurement is not caused by those factors. That leaves the drug as the only reasonable explanation for the difference.

Unfortunately, as you point out, pharmaceutical companies aren't willing to invest the time and money to run an RCT if they can't make bucketloads of money from it. I suppose that the hope is that the observational study described above creates interest among other practitioners and researchers who go on to run their own little trials, and that enough evidence accumulates that either we can accept that the drug lowers blood pressure in the patients meeting the criteria or a large funding body decides it's worth the investment to run an RCT. However, I don't believe that this study alone should be enough to change the standard of care.
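The logic of the comment above can be put into a few lines of simulation. This is a sketch under invented assumptions (a drug that truly does nothing, plus a practice-wide "secular" drop in blood pressure of about 5 mm Hg between visits), not a claim about the fictitious study: the single-arm before/after comparison attributes the drop to the drug, while a randomized comparison does not.

    # Simulation: an inert drug plus a secular trend that lowers everyone's BP by ~5 mm Hg.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 10000                                  # large n so bias, not noise, dominates
    baseline = rng.normal(170, 12, n)
    secular_drop = 5.0                         # affects every patient, drug or no drug
    true_drug_effect = 0.0                     # the drug does nothing in this scenario

    # Single-arm before/after: everyone is treated, so the secular drop looks like a drug effect.
    after_all_treated = baseline - secular_drop - true_drug_effect + rng.normal(0, 8, n)
    print("single-arm estimate:", round((baseline - after_all_treated).mean(), 1), "mm Hg")

    # RCT: randomize half to drug, half to control; both arms share the secular drop.
    treated = rng.random(n) < 0.5
    after_rct = baseline - secular_drop - true_drug_effect * treated + rng.normal(0, 8, n)
    drop = baseline - after_rct
    print("RCT estimate:", round(drop[treated].mean() - drop[~treated].mean(), 1), "mm Hg")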

Blood pressure is a slut of a variable.

By Phillip Huggan (not verified) on 02 Jan 2010 #permalink

I love this post. See, the thing is, I have asked this straight out to medical people, as a layperson, and been met with ridicule. And the eyebrow and almost imperceptible head shake. How can a Cochrane review REALLY measure/quantify/make certain of an outcome if the studies are different in bunches of ways? And how is it NOT possible that certain studies with certain outcomes are omitted for certain reasons, the biggest reason being that they do not fit the reasoning or expected outcome someone with a bias might have? I am then patted on the head as people hope that I will just go away and do as I am told. I don't. I read instead. And that ticks them off too. I would have to read a tonne to respond to your scenario, so I won't embarrass myself, but I am glad that there are some medical people out there who ask questions about this stuff. It would be nice to meet some in real life.

BTW- those damned epilepsy drugs used for other reasons REALLY piss me off. I will not bore you with the details of why, but every time I see a company in trouble because of them, it makes me smile just a little bit. Maybe your scenario is made up, but honest to god, it would not surprise me if that is the next big thing...meh.

Oh yes, gmm, I hear you. Not only in the medical world, of course, but among any set of "experts" with power (and, in some but only some subsets, expertise). In this regard, in fact, Reveres, I have copied this post and comments for showing to the county commissioners planning a nearby wind-turbine site, as they continue to shrug off as "unscientific" several surveys (in various countries) of residents oppressed by wind-turbine noise--as if such surveys, which were not even recording percentages but simply the existence of such reactions, had to be (in some weird sense) "scientific experiments," let alone RCTs. Let them see the real complexities of what they are gabbling about.

All great thoughts and comments. The scenario here, as well as the vitamin/HRT/name-your-X, name-your-Y association studies, should serve to further remind us of the potential pitfalls of using observational studies to infer causality and inform decision-making. Medical scientists continually walk a political tightrope made of strands of skepticism, hope, methodological limitations, rigorous analysis, consensus, ethics, interpretation, and the limits of the current body of knowledge. Hindsight is usually 20/20.

I think the main issue here is not so much the trial as the characterisation of Cochrane reviews. They do not need to cover only RCTs; and although the majority do select RCTs, there are still a number of Cochrane reviews published that include other types of trials. An example is DOI: 10.1002/14651858.CD007424.pub2, which covers a setup that resembles the one in your description (although the analysis was very different).

Just peripherally-- several years ago I came across a Cochrane report identifying a particular study as using a randomized design. I happened to know the study well, and it was nonrandomized. I notified the relevant Cochrane person and they did change the designation. Nevertheless, let the reader beware!

By Jean Mercer (not verified) on 03 Jan 2010 #permalink

I think this trial would be enough for me to prescribe the antiepileptic drug as an antihypertensive for this population (BIANAD). There are a great many antihypertensive drugs. If a particular patient has refractory hypertension, then presumably multiple antihypertensive drugs have already failed and the cardiologist is not satisfied with all other approaches.

If multiple antihypertensive drugs have failed, then this patient has a physiology that is behaving idiosyncratically, and similar drugs in similar classes may well fail also. An effective drug likely needs to be in a new class, but there aren't any new classes of antihypertensives because there isn't enough profit in them for Big Pharma to develop them. This antiepileptic drug is in a different class, so it may work when others fail.

Since this drug has a known and good safety profile for chronic use for epilepsy, the side-effect profile is known and is acceptable for treating epilepsy. Epilepsy is more serious than hypertension (depending on the severity), but because the side effects are known, they can be monitored pretty easily. How well this particular drug works as an antihypertensive isn't known very well, but that simply means that the patient will require more monitoring to titrate the dose to the correct level.

If I were reading such a study and considering prescribing based on it, I would have to read the references too, and enough about the physiology of the drug and the side effects to understand them pretty well so as to be able to monitor them. But after that, sure I would use it.

In terms of cost, it is a trade-off of more monitoring for therapeutic effects and side effects vs. having a generic drug instead of the latest and most expensive antihypertensive. The cheaper drug could easily save enough to cover the increased monitoring.

Because this is something "non-standard", the patient notes have to be pretty clear about that, so that if something happens to the prescribing cardiologist, whoever takes over can provide continuity of care. In that context, presenting the case series in the literature so as to get feedback on it is the proper course of action.

I think the point I want to make is that hypertension is a chronic problem that requires chronic treatment and chronic monitoring. With chronic monitoring, whether the drug works or doesn't work for this particular patient is measured, not assumed. If it doesn't work, then after a few months the patient comes off it and something else is tried. If it does work but there are unacceptable side effects, the patient comes off of it. If it works and the side effects are acceptable, then the patient stays on it as long as it works. If it works for a few years, by then maybe there will be some new drugs that might work for this particular patient.

David: This is a really great response. I will use it abundantly in my discussion of this challenge and there is nothing in it I disagree with. As you will see (I hope, if I can find the time), the study was designed to illustrate some slightly different points, but like a lot of things, once you start thinking hard about it all sorts of other things pop up, and you have raised some of them. I will try to work them into my response(s). It may take many more response posts than I envisaged as a result.

To everyone else who responded, our great thanks. David Rind, in his post (which is a must-read and is linked in his comment above, #40), has also read your comments and made a critical observation about them which I will pick up on (I saw the same thing but he expressed it very well). Now if I just didn't have to do this grant proposal . . .

Revere: Is it possible for you to post the full text links to the article & subsequent critiques that motivated this exercise? I think it would be helpful to gain additional insight. Thank you!!

Sorry Revere: I meant, could you post a full text link to the meta-analysis in BMJ (perhaps the pdf?), your critique of that analysis and subsequent Atlantic article. Am I asking too much? I hope not!! At the very least, perhaps a full citation?

Re: "This was instigated by the reaction to our critique of Jefferson's flu vaccine meta-analysis in the BMJ (and subsequent Atlantic article) which caused a great deal of confusion."

btw, Thank you for an educational, thought provoking blog.

First, one never starts with no information. You would have animal data on toxicity and data on BP effects from the AED trials that led to FDA approval. You might also have structure-activity information. The point here is that from a Bayesian standpoint this data adds to (or detracts from) your a priori belief that the drug has an effect.

Secondly, you have bias issues. The researcher believes the drug will work and this will lead to biased measurements: pre-drug BPs will be accepted if high and on-treatment BPs will be manipulated down.

Then you have placebo effects and regression-to-the-mean effects.

If my a priori belief was high, it might make it a little higher.

By david egilman (not verified) on 19 Jan 2010 #permalink

david: All fair points, although we are analyzing this from a more frequentist perspective because that is what EBM adherents emphasize in their interpretations, and our objective is twofold: to look under the hood of randomized vs. non-randomized trials (call it a deconstruction if you want); the second part will come at the end and we aren't divulging it yet. Our target is an unreflective belief in the RCT as a magic solution that trumps all others, a view we see quite a lot of these days.
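One of david's points, regression to the mean, deserves a quick illustration. Here is a sketch with invented numbers (not part of the fictitious study): select patients because a measured blood pressure is high, measure them again with no treatment at all, and the average falls simply because part of the original elevation was measurement noise or day-to-day variation.

    # Regression to the mean: no treatment at all, just selection on a noisy measurement.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 100000
    true_bp = rng.normal(150, 15, n)            # each patient's underlying "true" systolic BP
    visit1 = true_bp + rng.normal(0, 10, n)     # first measurement = truth + visit-to-visit noise
    visit2 = true_bp + rng.normal(0, 10, n)     # second measurement, independent noise, nothing done

    selected = visit1 > 170                     # enroll only those who looked "refractory" at visit 1
    drop = visit1[selected] - visit2[selected]
    print("patients selected:", int(selected.sum()))
    print("average 'improvement' with no treatment:", round(drop.mean(), 1), "mm Hg")

In a randomized trial the control arm regresses by the same amount, which is part of what a control group buys you.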

David @ #41: Would you kindly provide another link to your response? This URL (http://tinyurl.com/yevx7qb) is blocked in our country. Thanks.

I enjoyed the discussion... a lot to think about. Observational studies or chart reviews are a good starting point, or can serve as a "pilot" for future RCTs, and pharma companies will soon be chasing you if the data are replicated in a superior study design.

Good luck with your work!

By Arlene Amor (not verified) on 09 Feb 2010 #permalink

Thanks for the interesting discussions. I am currently working on a review paper that includes both observational and RCT studies. I am curious: does anyone know of any paper that describes the natural progression of studies for research questions? If you look at many interventions, the first studies on the topic are observational (usually just descriptive), then there might be a few case-control studies, and finally someone does an RCT. I've found tons on effect sizes and why to choose one type of study but nothing yet on how there is typically a natural progression, which in many cases seems appropriate.
Thanks for any guidance.

By jeanelle sheeder (not verified) on 10 Mar 2010 #permalink

jeanelle: My own view is that there is quite a lot of variation. Just about everything we have discovered about what causes cancer in humans (I'm talking chemicals and work exposures primarily) was first discovered by an astute clinician, patient, or parent, so it is in the nature of a case report or series. Some things are not amenable to an RCT (nor is one necessary), but for some things, like new drugs, RCTs are the initial step. So I don't think you can give a natural progression except in certain instances, and then they will be different depending on the context or subject.