Worst. Prisoner's Dilemma. Evar.

Hostage Situation:

Even if you're overoptimistic by a factor of five and it's only a 20% savings, we'd hire you tomorrow to build that for us. You can have a plane ticket to wherever you want to work and the best hardware money can buy and real engineering support to deploy something you've already mostly built and proven. [...]

And we look at that and say: what if you've got nothing? How can we know, without something we can audit and test? Of course, all the supporting research is paywalled PDFs with no concomitant code or data either, so by any metric that matters -- and the only metric that matters here is "code I can run against data I can verify" -- it doesn't exist.

Those aren't metrics that matter to you, though. What matters to you is either "getting a tenure-track position" or "getting hired to do work in your field". And by and large the academic tenure track doesn't care about open access, so you're afraid that actually showing your work will actively hurt your likelihood of getting either of those jobs.

So here we are in this bizarro academic-research standoff, where I can’t work with you without your tipping your hand, and you can’t tip your hand for fear I won’t want to work with you. And so all of this work that could accomplish amazing things for a real company shipping real software that really matters to real people – five or six years of the best work you’ve ever done, probably – just sits on the shelf rotting away.

Previously, previously.

20 Responses:

  1. relaxing says:

    Oh yeah, paywalls bad, information wants to be free, minus 10 points for implying university research should be treated like undergraduate assignments, and cooperation with this guy sounds impossible much in the same way it is with rms.

    I wouldn't blame any academic for performing the same calculus and deciding following up with him wasn't worth the risk.

  2. Vince says:

    I'd estimate at least 75% of CS / Computer Engineering academic publications are awful unreproducible junk due to horrible methodology and the fact that no one (or at least no one important) ever calls anyone on it. I'm impressed he actually believes that if he had their source code it would actually do anything worthwhile.

    I always release full source code and full results when I publish a paper, but that doesn't happen very often because I get realistic (rather than amazing fantastic) results so usually it gets rejected for my work being "incremental".

    • jwz says:

      By your estimation, 25% of it might be useful. Whereas we know for sure that 100% of unavailable code is unhelpful.

      Also: if you have the working theory that the purpose of academic research is to prove things (rather than the purpose being "get tenure") -- that is, if you believe that PhD programs have anything whatever to do with science -- then having other people try to reproduce or falsify your results is, you know, kind of critical to the whole process.

      • Vince says:

        I initially had 90% but dropped it to 75% to account for researchers having good results accidentally, and also to account for companies that bother to publish results (they can be a bit immune to the academic nonsense so tend to have solid results even if they're even more averse to publishing source code).

        I agree that 100% of unavailable code is unhelpful. I assume you've seen the quality of code drops from academia before so you know that having access to the code can often only be a marginal improvement over no code at all.

        I wish that the computer field were treated more like a proper science. You see all the articles appear in retractionwatch and notice not a single one is computer related. And I don't believe for a second that somehow CS people are that much more ethical than medical researchers.

        • mhoye says:

          I initially had 90% but dropped it to 75% to account for researchers having good results accidentally,

          Nothing says "I am making a good-faith argument about the integrity of the scientific process in an academic setting" like completely unsupported, entirely-made-up fractions of small whole numbers.

  3. nooj says:

    So here we are in this bizarro academic-research standoff, where I can’t work with you without your tipping your hand, and you can’t tip your hand for fear I won’t want to work with you.

    Sounds like every first date I've ever been on. Also my day job.

    • tobias says:

      in a real world situation of this ilk, the brave soul who wants a successful outcome tips their hand a tiny bit to make progress.

  4. Mike says:

    So you sign an NDA and send a team of guys to inspect the code in a developer-controlled cleanroom. They can take their datasets in and the results out, but nothing else. Standard practice?

    • relaxing says:

      Yep. But this guy can't pony up $20 to read the paper, so going the rest of the way so aboveboard seems out of the question.

      • mhoye says:

        I'm at the conferences. I get to read the papers there, and see the person presenting them, and talk to them about it. That's what academic conferences are for.

        And, sure, I could spend $20 to $50/paper reading all the literature you've cited - which could easily amount to more than $1000, if you've got a dozen or two references, notwithstanding the cost of my time - just to try and gauge if what you're saying is plausible.

        Or, you know, that person could publish their code, so that I could actually have some confidence I'm making a good investment.

        • NotTheBuddha says:

          And, sure, I could spend $20 to $50/paper reading all the literature you've cited - which could easily amount to more than $1000, if you've got a dozen or two references, notwithstanding the cost of my time

          The direct dollar cost could be reduced by subscriptions to article collections, and the time cost could be reduced by having a field-dedicated expert do the evaluation. You could possibly roll both into one by having some kind of internship or fellowship for doctoral candidates, who already have free article access through their university and spend their first couple of years becoming the field experts.

    • Owen W. says:

      Who has the time and money to fund a "team of guys" to waste their time looking over some code that may or may not work? I thought one of the tenets of scientific research was making it possible for other scientists to reproduce your results.

      • relaxing says:

        Businesses in the business of spending their business money so that they can make more money. In business.

        Also government agencies, for reasons which are less clear.

        • jwz says:

          "I thought you guys were supposed to be just buying things!"
          "I thought you guys were supposed to be doing actual science!"
          "Hey, you got your dysfunction in my peanut butter!"

  5. Edouard says:

    In an unprecedented turn of events, the comment at the bottom of the article is completely accurate: "it turns out that the claims [...] are based on very carefully controlled environments and using code that is [...] riddled with flaws".

    Actually, having written that, I guess it's more or less identical to the marketing of almost all commercial software. Perhaps there is more synergy between academia and the commercial world than there at first seems...

  6. robert_ says:

    As the saying goes, claims without supporting evidence are just opinions.

  7. Sheila says:

    An invitation to reproducible research

    Biostat (2010) 11 (3): 385-388. doi: 10.1093/biostatistics/kxq028

    'I was inspired more than 15 years ago by John Claerbout, an earth scientist at Stanford, to begin practicing reproducible computational science. See Claerbout and Karrenbach (1992). He pointed out to me, in a way paraphrased in Buckheit and Donoho (1995): “an article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.”'

    D. Donoho

    • NotTheBuddha says:

      The actual scholarship is the full software environment, code and data, that produced the result.

      Right on, and today we can begin to publish some of this in usefully inspectable form with Django and cloud services and such.