Significance of viral trials

Zelsuvmi (berdazimer) was approved in January for the treatment of molluscum contagiosum. My 1-year-old daughter acquired this “pox” in daycare. Intrigued, I took a look at the clinical trial results.


The molluscum contagiosum virus (MCV) causes very small bumps, is benign, and usually self-resolves without any scarring in 6 to 9 months with no treatment at all. But that’s still a long time for a parent, especially if the bumps are on their child’s face. So there’s real demand for treatment (clinically necessary or not). And the extent of these pox bumps can be considerable in the immunosuppressed.

Novan ran the clinical trials. The ones I boxed in green are significant. The one I boxed in red was simply not significant, and would not be significant unless there were 1,000 test and 1,000 control subjects.


The one I boxed in orange is not significant, but would have been had they used as many control subjects as test subjects, assuming the proportions stayed the same. For some reason they chose to use half the number of controls and may have screwed themselves out of a significant finding.

When my daughter had a small molluscum bump on her face back in 2004, we agonized over whether to treat or not. The treatment was cantharidin, a vesicating agent from blister beetles and, oddly enough, the active ingredient in the aphrodisiac Spanish Fly! I chronicled the whole ordeal in my book.

The efficacy of cantharidin is highly variable from trial to trial.

This variability seems to be mirrored to some extent in the three trials of Zelsuvmi.

But in double-blind randomized clinical trials, 10% potassium hydroxide (KOH) cleared lesions in 60% of MC patients over a shorter treatment period (60 days) than Zelsuvmi managed (32% clearance at 12 weeks).


And, you can get 30 ml of KOH from Amazon for $18 without a prescription.

Only Priority Review requires that a drug be an improvement over existing therapies. In Standard Review, it’s not the FDA’s job to compare whether a new drug is any better than what’s already on the market. They only assess safety and efficacy.

Phenylephrine is a good example.

The “decongestant” phenylephrine is safe, but last year the FDA finally agreed with 20 years of research that proved that this, the most common decongestant in consumer products, is useless.

But they still haven’t ordered its removal from the market. If you’d like to know why, this article in Fast Company is a great read.

Power

The loss of power in clinical trials that use unequal treatment and control groups is underappreciated. Before you gloss over the equation below, just focus on the red P and the green N.
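
One common way of writing the MDE equation (this is the textbook power-calculation formula; the rendering in the original figure may differ slightly) is:

$$
\mathrm{MDE} = \left(t_{1-\kappa} + t_{\alpha/2}\right)\,\sqrt{\frac{1}{P\,(1-P)}}\,\sqrt{\frac{\sigma^{2}}{N}}
$$

Here P is the fraction of participants assigned to the treatment group, N is the total number of participants, σ² is the variance of the outcome, κ is the desired power, and α is the significance level.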


Novan’s Trial 2 just missed significance. I noted that “For some reason they chose to use half the number of controls and may have screwed themselves out of a significant finding.”

The rationale for a 2:1 split varies. Sometimes it’s a motivator to get people to join the clinical trials: “We have a promising therapy for your child’s molluscum contagiosum. It’s already been shown to be safe. Would you care to join a clinical trial where your blemished darling daughter has an even chance of getting the treatment? No? What if she had a 2:1 chance of getting the treatment?”

These uneven splits have become more common in oncology trials.

But this is known to reduce the power of the trial (and see this).

The minimal detectable effect (MDE) is the smallest effect in the treatment group that must exist if we are to have any hope of detecting a difference from the control group. This value changes not only with the total number of participants (as N grows, the variance term it divides shrinks) but also with how those participants are split between treatment and control groups.

Look at the red P in the equation. This is the proportion in one group and 1-P is the proportion in the other group.

If the split is 50% to 50%, then 1/[P x (1 – P)] is 4. But if the split is 2:1 (67% to 33%), then this value is 4.5. So the MDE goes up by about 6% (i.e., sqrt(4.5) / sqrt(4) = 1.06), demanding a larger effect.
A fuller treatment is here.
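
As a quick check of that arithmetic, here is a small Python sketch (illustrative only) that computes the MDE inflation factor for any treatment fraction; the ~6% figure falls out of the 2:1 case:

```python
import math

def mde_inflation(p_treat: float) -> float:
    """Factor by which the MDE grows, relative to an even 50:50 split,
    when a fraction p_treat of participants is assigned to treatment.
    This is the sqrt(1 / (P * (1 - P))) term of the MDE equation,
    normalized by its value at P = 0.5."""
    baseline = math.sqrt(1 / (0.5 * 0.5))                     # = 2.0
    return math.sqrt(1 / (p_treat * (1 - p_treat))) / baseline

print(round(mde_inflation(2 / 3), 3))   # 2:1 split -> 1.061, ~6% larger MDE
print(round(mde_inflation(0.75), 3))    # 3:1 split -> 1.155, ~15% larger MDE
```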

If we look at Zelsuvmi Trial 2, with 355 participants, the MDE for their 2:1 split would have been 31.3%, and they didn’t get there with 30%. Had the trial been evenly split with exactly the same number of participants, the MDE would have been 29.7%, which their 30% does exceed.
 
Not only would they have exceeded the MDE using an even split, but the 10% difference between treatment and control groups also would have been significant (p-value = 0.04).
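
For anyone who wants to run that sort of recalculation themselves, here is a sketch of the standard pooled two-proportion z-test. The counts in the example are placeholders reconstructed from the round numbers above (an even split of 355 participants with roughly 30% vs. 20% clearance), not Novan’s actual figures:

```python
from math import erfc, sqrt

def two_proportion_z(cured_a: int, n_a: int, cured_b: int, n_b: int):
    """Two-sided z-test for a difference between two proportions,
    using the pooled proportion to estimate the standard error."""
    p_a, p_b = cured_a / n_a, cured_b / n_b
    p_pool = (cured_a + cured_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))   # two-sided normal tail probability
    return z, p_value

# Placeholder counts: a hypothetical 178/177 split with ~30% vs ~20% clearance
z, p = two_proportion_z(cured_a=53, n_a=178, cured_b=35, n_b=177)
print(f"z = {z:.2f}, p = {p:.3f}")
```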

Mind you, if differences in sample size and group proportions can tip the outcome marginally above or below 0.05, it raises the question of whether the dogmatic reliance on alpha = 0.05 has much more than arbitrary meaning.

A better way

Can you combine the results of clinical trials? Why, sure you can. And you can do it without a meta-analysis if the trial populations are homogeneous.

Trial 2 by Novan wasn’t significant.
Or was it?

The results reported to the FDA covered three trials with different numbers of participants and different proportions of test and control subjects.

But there’s nothing to suggest that the participants in these multi-center trials weren’t cross-sectionally the same. Indeed, they are declared to be the same.

This is a balls-in-urns problem.


The parametric z-test is what is typically applied to proportions. Since the z-distribution is Gaussian-normal, it fits best when the proportions are closest to 50%. And when measuring a difference in proportions, the closer the two proportions are to each other, the more the approximation can slip away from normality.

But this can be circumvented by just randomly subdividing the total population of “cured” and “uncured” into the three trials, and then randomly subdividing those into the sizes of the test and control groups.

Picture six urns, sized to match the three clinical trials and their respective test/control splits. Then ask: how often in 10,000 such permutations is the difference in “cured” between test and control as great as or greater than that seen in the observed data?

Easy enough to code in Python. Which I did.
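
Here is a sketch of that balls-in-urns randomization in Python. This is a reconstruction of the approach described above, not the original script, and the counts at the bottom are placeholders; the per-arm cured/total numbers would need to be filled in from the FDA filing:

```python
import random

def permutation_p(trials, n_perm=10_000, seed=1):
    """Randomization ("balls in urns") test for the difference in cure
    proportion between test and control arms, pooled across trials.

    `trials` is a list of (n_test, n_control, cured_test, cured_control)
    tuples, one per trial. All participants are pooled, shuffled, and dealt
    back into urns of the original arm sizes; the p-value is the share of
    shuffles whose pooled test-minus-control difference in cure proportion
    is as great or greater than the observed one.
    """
    rng = random.Random(seed)

    n_test_total = sum(t[0] for t in trials)
    n_ctrl_total = sum(t[1] for t in trials)

    def pooled_diff(assignment):
        # assignment lists each trial's test arm first, then its control arm
        i, test_cured, ctrl_cured = 0, 0, 0
        for n_t, n_c, _, _ in trials:
            test_cured += sum(assignment[i:i + n_t])
            ctrl_cured += sum(assignment[i + n_t:i + n_t + n_c])
            i += n_t + n_c
        return test_cured / n_test_total - ctrl_cured / n_ctrl_total

    # Observed layout: 1 = cured, 0 = not cured, arm by arm
    observed = []
    for n_t, n_c, cured_t, cured_c in trials:
        observed += [1] * cured_t + [0] * (n_t - cured_t)
        observed += [1] * cured_c + [0] * (n_c - cured_c)
    obs_diff = pooled_diff(observed)

    pool = observed[:]           # same balls, re-dealt into the urns each time
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        if pooled_diff(pool) >= obs_diff:
            hits += 1
    return hits / n_perm

# Placeholder counts for a single trial, for illustration only: roughly a 2:1
# split of 355 participants with ~30% vs ~20% clearance, as quoted above.
print(permutation_p([(237, 118, 71, 24)]))
```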

For Trial 1, evenly split 444:447, no randomization achieved a difference as large, so p < 0.0001.

For Trial 3, unevenly split ~235:117, p = 0.313, not significant.

But here’s the kicker: Trial 2, not found to be significant using the parametric z-test, is significant under this randomization approach, at p = 0.0272. That is, only 272 of 10,000 randomizations achieved a difference as great or greater than that seen in the observed data. Even if you apply the approach to Trial 2 on its own, the p-value is 0.0337.

Now… whether FDA would accept this approach is another matter.
