Satoshi Village the blog of Daniel Himmelstein

The preprint in 2015 and what comes next

2015 was a year of the preprint. While posting manuscripts prior to peer review and journal publication has long been practiced in physics, preprints are just catching on in the biosciences.

Last year, labs started universally preprinting, and preprints were billed as the solution to accelerating an ever more laborious publishing process.

To give some context, PeerJ PrePrints and bioRxiv both launched in 2013. Prior to 2015, PeerJ published on average 1.2 preprints per day compared to 2.4 for bioRxiv. Yet in 2015, PeerJ averaged 2.3 per day while bioRxiv averaged 4.9. Together, bioRxiv and PeerJ published only 2,626 preprints in 2015 while PubMed grew by 1,082,213 articles, so preprints amounted to roughly 0.24% of PubMed's growth. With that much headroom, the ~100% annual growth in preprinting could persist for the next several years.
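To make the arithmetic behind these figures explicit, here is a minimal Python sketch. The numbers are the ones quoted above; the variable names are illustrative, and the final "doublings" estimate rests on a simplifying assumption that is not from the post: that PubMed's yearly intake stays fixed while preprint volume keeps doubling.

```python
import math

# Average preprints per day, as quoted in the post.
rate_before_2015 = {"PeerJ": 1.2, "bioRxiv": 2.4}
rate_in_2015 = {"PeerJ": 2.3, "bioRxiv": 4.9}

# 2015 totals, as quoted in the post.
preprints_2015 = 2_626
pubmed_growth_2015 = 1_082_213


def percent_growth(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return 100 * (after - before) / before


# Growth in the daily rate for each server, and for both combined.
per_server_growth = {
    server: percent_growth(rate_before_2015[server], rate_in_2015[server])
    for server in rate_in_2015
}
combined_growth = percent_growth(
    sum(rate_before_2015.values()), sum(rate_in_2015.values())
)

# Preprints as a fraction of the articles PubMed added in 2015.
preprint_share = preprints_2015 / pubmed_growth_2015

# Illustrative assumption (not from the post): PubMed's yearly intake stays
# constant and preprint volume doubles every year. This counts the doublings
# needed for yearly preprints to match that intake.
doublings_to_parity = math.log2(pubmed_growth_2015 / preprints_2015)

for server, growth in per_server_growth.items():
    print(f"{server}: {growth:.0f}% growth in daily rate")
print(f"Combined: {combined_growth:.0f}% growth in daily rate")
print(f"Preprints as a share of PubMed growth: {preprint_share:.2%}")
print(f"Doublings needed to reach parity: {doublings_to_parity:.1f}")
```

With the quoted figures, the combined daily rate doubles (3.6 to 7.2 per day), and under the naive assumption above it would take between eight and nine further doublings for preprints to match PubMed's yearly intake. That is only a rough way of picturing the headroom, not a forecast.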

And the incentives for preprinting will likely spur continued growth. In short, the benefits (establishing precedent and increased article lifespan) outweigh the disadvantage: ineligibility for a dwindling set of démodé journals. While preprint servers may not provide the same visibility as a journal, they can still be influential. For example, my first preprint was featured in a review, and my second drove interest in the topic.

One final drawback is the Google Scholar bug, where the existence of a preprint prevents the subsequent journal publication from appearing in the Google Scholar database. Despite discouraging openness and occurring frequently, the bug is considered a “feature” by Scholar’s creator, so don’t expect a resolution anytime soon.

The cruel twist is that the longer the publishing delay, the greater the chance of triggering the Scholar preprint bug. For example, my first study on hetnets waited 105 days between acceptance and publication, which finally came on July 9th. Now, 179 days later, Google Scholar still only indexes the bioRxiv preprint. Double whammy!

While preprints are a workaround for excessive publication delays, their current incarnations are limited. There has been little preprint innovation since 1997. bioRxiv and PeerJ preprints are still PDF only, with no web display, although PeerJ appears to be aware of this problem given their creation of paper-now, a GitHub Pages solution to publishing.

Preprints bring us closer to open science, which has been defined as:

scientists sharing their research with the world as soon as they record it for themselves.

Preprints help counteract publishing delays, which are by definition anti-open. However, preprints are not a panacea! Withholding publication until a study is complete is a major obstacle to greater scientific efficiency.

Feedback is most helpful early on, when mistakes are still rectifiable. And scientific outputs are often most useful when young, while still state of the art. Platforms such as Thinklab, RIO, and GitHub now allow collaborative open science from a project’s conception.

As an example, I’m performing my current project using Thinklab. In the last year, 22 community participants contributed feedback across 57 discussions. We’ve created lots of valuable content that’s been accruing views, citations, and page rank—none of which would exist had we withheld sharing until completing a preprint.

So while I wholeheartedly recommend preprinting, I encourage those interested in major scientific disruption to look further.

In closing, here are three recommendations for maximizing the potential of your science with preprints:

  1. Release your preprint under a CC BY license to encourage reuse and hence citation.
  2. Treat your preprint as a publication: all data and code needed to reproduce the study should be available and appropriately licensed.
  3. Make sure to update your preprint with revisions. Help readers stay up to date while gaining visibility on the recents list.