‘Old’ data

It used to be that I’d be surprised when I read a news article mentioning that the data behind some new breakthrough, revelation, or idea was two (or more) years old. I’d wonder what the researchers had been doing all this time. Why hadn’t they analyzed it and published it sooner? Why the huge holdup? And then I went to grad school.

I published results from my Master’s thesis about a year after I finished. I’m working on a paper for which we started collecting data a year ago, and we only finished collecting last week. Another experiment likely to be in the paper has data that is closer to two years old.

So why the huge wait? Well, it’s a combination of stuff for me. But I’m sure the reasons are quite common among researchers.

  • The conference you want to submit to is not accepting papers for another x months. Sure, you could submit somewhere else, but depending on you (and your supervisor), there may not be another venue that’s ideal, or even just okay, in the meantime. Which means you wait.
  • The process for publishing in journals is long. It can take a year from original submission to final publication. This is what happens when you need to give people time to review, and then to re-edit, and then possibly review again and so on. Things take time.
  • You come up with a new hypothesis and realize that you have old data that may be able to answer your question. It’s always cheaper to reuse than to collect more, in both time and money. So you pull up the results and see what you can find now.
  • Collecting data, especially from participants, can be a long process. It may be that your experiment doesn’t take long, but getting the number of participants you want does. Or maybe you’re collecting data over a long time period (weeks, months, years).
  • While the data is interesting, and probably worth publishing, it’s just not as high on your priority list as something else you’re working on. So it gets put to the side until you either find time to work on it, or it just gets lost forever.
  • You’ve got a pile of data, and it looks like there are interesting results, but you’re still trying to track down someone who understands the right statistical tests and how exactly to analyze your data. This is the “oops, I forgot to think about analysis when I designed my experiment” problem.

I think a lot of the delay can be ‘blamed’ on the publication process. But, while I’d also like to see results get out there earlier, I do agree that the peer review part of the process can be (and is) very important. So it’s not so easy to fix, especially with more controversial results (like in medicine), where an early result that hasn’t been fully vetted can end up causing problems if it turns out to be incorrect.

Do you find that some of your data just sort of sits there until you force yourself to act on it? What do you think is the normal length of time between data collection and publication for your work?

