(Un)Healthy Research: A Dilemma of Ethics and the Publication Bias - Woroni

When was the last time you read research reporting failed results?

As worthless as they may seem, negative results are a fundamental component of science. Sometimes a hypothesis simply turns out to be false after months of hard work.

But what are your options after a negative result? Publishing is one, and the result could well help future researchers who think of similar or alternative hypotheses. Unfortunately, it is nearly impossible to get a negative result published in a major journal or conference. Some journals exist solely to publish negative results but, as you might expect, their papers get minimal exposure and few citations, making them difficult to discover without a high level of Google-fu.

Let’s step back a bit and understand what a negative result is, and why reporting negative results is important for upholding the evidential rigour of research.

Let’s say you had the hypothesis that left-handed people are less likely to develop Parkinson’s Disease (PD). From surveying 10,000 people, you find that 0.5 per cent of left-handed people have PD, compared to 0.6 per cent of right-handers. That alone isn’t enough to say the hypothesis is true, as the difference could be purely due to chance. Assuming that of the 10,000, 1,000 were left-handers and 9,000 were right-handers, that leaves just five left-handers with PD and 54 right-handers with PD. If just one more of those left-handers had PD, both rates would be 0.6 per cent, and your result would swing from positive to negative!
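A quick way to check those counts is to redo the arithmetic. The Python sketch below uses the hypothetical survey figures from the example and exact fractions (to avoid floating-point rounding). The variable names are illustrative only.

```python
from fractions import Fraction

# Hypothetical survey figures from the handedness example
LEFT_HANDERS, RIGHT_HANDERS = 1_000, 9_000
left_rate = Fraction(5, 1_000)   # 0.5 per cent of left-handers have PD
right_rate = Fraction(6, 1_000)  # 0.6 per cent of right-handers have PD

left_cases = left_rate * LEFT_HANDERS     # number of left-handers with PD
right_cases = right_rate * RIGHT_HANDERS  # number of right-handers with PD

# If just one more of the 1,000 left-handers had PD, the rates would match exactly
left_rate_plus_one = (left_cases + 1) / LEFT_HANDERS

print(f"left-handers with PD: {left_cases}, right-handers with PD: {right_cases}")
print(f"rates identical after one extra case: {left_rate_plus_one == right_rate}")
```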

A hypothesis test is used to decide whether your data actually support your hypothesis. There are many different types of hypothesis tests, and the goal of each is to check whether the results you’ve obtained are statistically significant rather than just random coincidence. A statistically significant result which affirms a hypothesis is ‘positive’, whereas a result that is not statistically significant is ‘negative’. A negative result means the outcome cannot be distinguished from random chance; it does not, by itself, prove the hypothesis false.

You may have heard of the p-value, the most common tool for hypothesis testing throughout the sciences. It is easy to compute and, when used correctly, it’s a useful guide to whether your results are indeed statistically significant.
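As a sketch of how a p-value might be computed for the handedness survey, the snippet below applies a two-proportion z-test, one common choice for comparing two rates. The article does not name a specific test, so this choice and the conventional 0.05 threshold are assumptions. The resulting p-value is roughly 0.7, far above 0.05, so the survey counts as a negative result.

```python
import math

def two_proportion_z_test(cases_a, n_a, cases_b, n_b):
    """Two-sided two-proportion z-test using the pooled standard error."""
    p_a, p_b = cases_a / n_a, cases_b / n_b
    pooled = (cases_a + cases_b) / (n_a + n_b)  # overall rate if there is no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 5 of 1,000 left-handers vs 54 of 9,000 right-handers (hypothetical survey)
z, p = two_proportion_z_test(5, 1_000, 54, 9_000)
ALPHA = 0.05  # assumed significance threshold
verdict = "positive" if p < ALPHA else "negative"
print(f"z = {z:.2f}, p = {p:.2f} -> {verdict} result")
```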

However, the p-value is not without flaws. A major criticism is that researchers often accept p-values as the sole measure of a hypothesis’s validity. This approach neglects arguably more important factors, such as the design of the study and careful analysis of the results, in favour of a single raw number. Another core issue is that if you repeat an experiment enough times, you will eventually obtain a statistically significant p-value by chance alone, even when there is no real effect.
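The ‘repeat until significant’ problem can be quantified. If each experiment has a 5 per cent chance of a false positive when there is no real effect, the chance that at least one of k independent attempts comes up significant is 1 - 0.95^k, about 64 per cent for 20 attempts. The sketch below checks that formula by simulation, assuming a simple z-test with known variance; the group size, number of studies and seed are arbitrary choices.

```python
import math
import random

ALPHA = 0.05  # assumed significance threshold

def chance_of_false_positive(repeats, alpha=ALPHA):
    """Probability that at least one of `repeats` independent null experiments is significant."""
    return 1 - (1 - alpha) ** repeats

def null_experiment(rng, n=20):
    """Compare two groups drawn from the SAME distribution (no real effect), sd known to be 1."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

def simulate(repeats=20, studies=1_000, seed=1):
    """Fraction of studies where repeating the experiment eventually gives p < ALPHA."""
    rng = random.Random(seed)
    hits = sum(
        any(null_experiment(rng) < ALPHA for _ in range(repeats))  # stops at first 'success'
        for _ in range(studies)
    )
    return hits / studies

print(f"theory: {chance_of_false_positive(20):.2f}, simulated: {simulate():.2f}")
```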

Repeating experiments until one succeeds feeds a pervasive problem known as publication bias: research may be repeated multiple times, often by different researchers who share the same idea, until a positive result appears. Only that positive result is published, while the negative results that might undermine the hypothesis go unreported. The published p-value seemingly affirms the hypothesis, yet the research as a whole may remain highly questionable.
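To illustrate the ‘file drawer’ behind publication bias, the sketch below simulates 200 hypothetical teams testing an idea that has no real effect, and ‘publishes’ only the significant results. In this setup every published effect is sizeable (at least about 0.62 standard deviations) even though the true effect is zero, while the much larger pile of negative results never appears. The team count, sample size and seed are arbitrary assumptions.

```python
import math
import random

ALPHA = 0.05  # assumed significance threshold

def run_study(rng, true_effect=0.0, n=20):
    """One hypothetical team's experiment: treatment vs control, outcome sd known to be 1."""
    treated = [rng.gauss(true_effect, 1) for _ in range(n)]
    control = [rng.gauss(0, 1) for _ in range(n)]
    effect = sum(treated) / n - sum(control) / n
    z = effect / math.sqrt(2 / n)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return effect, p_value

rng = random.Random(42)
studies = [run_study(rng) for _ in range(200)]       # 200 teams; the true effect is zero
published = [e for e, p in studies if p < ALPHA]     # only 'positive' results reach print
file_drawer = [e for e, p in studies if p >= ALPHA]  # negative results stay unpublished

print(f"published: {len(published)}, left in the file drawer: {len(file_drawer)}")
if published:
    mean_size = sum(abs(e) for e in published) / len(published)
    print(f"average size of published effects: {mean_size:.2f} (true effect: 0)")
```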

Another problem that feeds publication bias is p-value hacking, an unethical way of squeezing a couple of publications out of a dataset.

Rather than starting from a hypothesis, the researcher starts with the raw data and searches for interesting correlations that happen to be statistically significant. This is like writing the aim of an experiment after conducting it, and it can lead to ridiculous correlations. Examples include the strong correlation between the divorce rate in the state of Maine and per capita margarine consumption in the United States, and between the number of mathematics doctorates awarded in the United States and the amount of uranium stored at United States nuclear power plants.
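Data dredging of this kind is easy to reproduce. The sketch below invents 50 unrelated yearly statistics as random walks over ten ‘years’ (the series, their length and the seed are all assumptions for illustration), tests every pair, and reports the strongest correlation. With 1,225 pairs to choose from, a near-perfect correlation between two series that have nothing to do with each other is almost guaranteed.

```python
import itertools
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def random_walk(rng, years=10):
    """A made-up yearly statistic that drifts randomly, unrelated to every other series."""
    value, series = 0.0, []
    for _ in range(years):
        value += rng.gauss(0, 1)
        series.append(value)
    return series

rng = random.Random(7)
series = {f"series_{i}": random_walk(rng) for i in range(50)}

# Dredge: test every pair with no hypothesis beforehand and keep the strongest correlation
pairs = itertools.combinations(series, 2)
best_pair, best_r = max(
    ((pair, pearson(series[pair[0]], series[pair[1]])) for pair in pairs),
    key=lambda item: abs(item[1]),
)
n_pairs = len(series) * (len(series) - 1) // 2
print(f"{n_pairs} pairs tested; strongest: {best_pair}, r = {best_r:.2f}")
```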

All of this creates a bias in researchers, who manipulate statistics and research to affirm a hypothesis. Once a journal publishes the resulting positive result, research is left plagued with fundamental statistical and evidential flaws.

How big of a problem is this?

Manipulation of statistics and evidence is a much bigger problem in some fields than in others. A project that attempted to replicate 100 published psychology studies found that only 36 of the replications produced statistically significant results, suggesting that statistical flaws may be widespread in psychology research. However, the study remains fairly controversial; critics point out, for example, that some replications did not follow the original methodology exactly. Nevertheless, a finding that can only be replicated under very specific circumstances can hardly be called robust.

The problem of statistical and evidential flaws in research remains, and it is propagated by publications’ bias towards publishing only positive results. As the psychology example shows, this may leave a substantial portion of the literature statistically and academically flawed.

So what are people doing?

Recently, the American Statistical Association has cautioned against relying on p-values alone, pointing to alternatives such as the Bayes Factor. Simply put, this is the ratio of how likely the observed data are under two competing hypotheses, usually the hypothesis being tested and an alternative hypothesis.
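In the simplest case, where each hypothesis names one exact value, the Bayes Factor reduces to a plain ratio of likelihoods. As a hypothetical framing of the handedness survey, the sketch below compares H1 (‘left-handers’ PD rate is 0.5 per cent’) with H0 (‘it equals the right-handers’ 0.6 per cent’) given five cases among 1,000 left-handers. Real analyses usually place a prior distribution over the alternative rather than a single value; treating both rates as exact is an assumption made for brevity.

```python
import math

def binomial_log_likelihood(cases, n, rate):
    """Log-likelihood of `cases` out of `n` at a fixed rate (the binomial coefficient
    is omitted because it cancels in the ratio below)."""
    return cases * math.log(rate) + (n - cases) * math.log(1 - rate)

def bayes_factor_point(cases, n, rate_h1, rate_h0):
    """Bayes Factor BF10 for two point hypotheses: P(data | H1) / P(data | H0)."""
    log_bf = binomial_log_likelihood(cases, n, rate_h1) - binomial_log_likelihood(cases, n, rate_h0)
    return math.exp(log_bf)

# Hypothetical framing: H1 rate = 0.5 per cent, H0 rate = 0.6 per cent, 5 cases in 1,000
bf10 = bayes_factor_point(cases=5, n=1_000, rate_h1=0.005, rate_h0=0.006)
print(f"BF10 = {bf10:.2f}")
```

The result is about 1.09, meaning the data are barely more likely under H1 than under H0, even though H1 was chosen to match the observed rate exactly.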

Hypothesis testing remains a subject of debate in statistics. However, this move by the American Statistical Association is a fruitful step towards more rigorous statistics in research. A Bayes Factor offers advantages over other statistical tests, such as greater interpretability: it states directly how many times more likely the data are under one hypothesis than under the other. It may also be more robust to ‘overfitting’, where an analysis attributes undue complexity to results that only support a simpler hypothesis.

A large-scale literature review recently used Bayes Factors to analyse the results of 35,000 papers in psychology. Based on the statistics reported in the papers, over 27 per cent did not reach the level of ‘anecdotal’ evidence, and 45 per cent did not achieve ‘strong’ evidence. The review concluded that the general threshold of statistical acceptance for psychological findings is set too low, since a substantial proportion of published results had weak statistical and evidential support.
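To make those labels concrete, the sketch below maps a Bayes Factor (BF10, the evidence for H1 over H0) onto the verbal categories of a widely used Jeffreys-style scale: 1 to 3 ‘anecdotal’, 3 to 10 ‘moderate’, 10 to 30 ‘strong’, 30 to 100 ‘very strong’ and above 100 ‘extreme’. The review’s exact cut-offs may differ, so treat these thresholds as assumptions.

```python
import bisect

# Jeffreys-style evidence scale for BF10 (assumed cut-offs; a given review may differ)
CUTOFFS = [1, 3, 10, 30, 100]
LABELS = ["supports H0 instead", "anecdotal", "moderate", "strong", "very strong", "extreme"]

def evidence_label(bf10):
    """Map a Bayes Factor BF10 onto a verbal evidence category."""
    return LABELS[bisect.bisect_right(CUTOFFS, bf10)]

for bf in (0.5, 1.09, 4.0, 12.0, 250.0):
    print(bf, evidence_label(bf))
```

Using `bisect_right` means a value sitting exactly on a cut-off (say, 3) falls into the higher category.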

Another initiative aimed at preventing the publication of evidentially poor findings is that some journals will accept a study for publication on the basis of its proposal, before the results are known. This prevents academics from using p-value hacking to produce data which affirms a given hypothesis.

Where to from now?

Fundamentally, publication bias stems from the ‘publish or perish’ mentality of research and from publications’ bias towards publishing only positive results.

It is no longer rewarding, or even possible, to invest time in seemingly far-fetched hypotheses that are unlikely to pay off until far into the future. Instead, we see more niche, somewhat trivial papers offering only minor improvements on current research, since these are more feasible and make a research portfolio look more impressive on paper. In such cases, it is easier to find a result that supports a positive hypothesis.

Research shouldn’t be so dismissive of negative results. While rigorous statistical measures can help curb the publication of evidentially unsound conclusions, it is ultimately up to publishers to uphold the rigour of academia by recognising the value of negative results.

We acknowledge the Ngunnawal and Ngambri people, who are the Traditional Custodians of the land on which Woroni, Woroni Radio and Woroni TV are created, edited, published, printed and distributed. We pay our respects to Elders past and present. We acknowledge that the name Woroni was taken from the Wadi Wadi Nation without permission, and we are striving to do better for future reconciliation.
