Science, Semantic Web and Excuses

Is the data that supports the claim in your last paper available? Are your ontologies available? Are the terms of your vocabulary dereferenceable? Where can I find your test dataset? Are you able to repeat your evaluation process?


If the answer to any of these questions is "NO", your work is not good enough, and you cannot be considered a good researcher. We are not only talking about the LOD five stars [1], the LOV metrics [2], the AMOR principles [3] and all those semantic publishing best practices; we are talking about bigger problems. These issues reach beyond the Semantic Web field into science as a whole. Reproducibility, traceability and trust in evidence are problems we regularly face in our field, but they can be found in any scientific discipline.


So what is the problem? Is there a lack of tools and/or methods for publishing your experimental data? Although several initiatives within the scientific community encourage people to publish research contributions in a more complete way, and propose ways of doing so, there is still no definitive, standardized solution. But let's be honest: if part of your work is not accessible, it is not because of that; it is because of you.


You, as a scientist, a guardian and evangelist of semantic research, and therefore of science in general, should be the one responsible for publishing a complete, coherent and understandable piece of work.


Then why do we usually find incomplete pieces of work? Why is it so common to find a paper with a good-looking OWLViz or UML diagram depicting parts of an ontology, but not a simple link to the OWL file from which the diagram was generated?


If your claims about your evaluation are true, why aren't you including a more precise description of your input datasets, your code or your execution environment configuration? For someone who has spent such a long time thinking about, designing and running an experiment, it should be trivial to describe it.


Having to send an email asking for an ontology or dataset is a good way of improving your social life as a researcher, and doing some networking is always good. But it is not how science and scientific communities are supposed to work, especially when the authors have changed institutions or simply don't answer.


Fear of exposing one's work, laziness, the limitations of tools for publishing research beyond a simple PDF, and many other factors [4] could be some of the reasons behind these problems. But for a good scientist, these reasons should be nothing but excuses.


From non-reproducible to reproducible and preservable, there is a range of grades that the Semantic Web scientific community should adopt to evaluate the quality of any research work in the field. What we propose here is simple: a list of requirements for publishing vocabularies, experiments and software.


General requirements for vocabularies, experiments and tools in publications

  • Make the described resource available on the web: URIs for vocabularies, a Research Object URI for your experiment, or a GitHub pointer for your software.
  • Specify a non-restrictive license.
  • Produce documentation and annotate it with metadata to make it both human- and machine-readable (see the sketch after this list).
  • Provide examples: an example with instances of your vocabulary, a sample execution of the experiment or a demo of the software.
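
A minimal sketch of such annotations, using Python and rdflib (the namespace, URIs and metadata terms below are hypothetical placeholders; the CC BY license is just one possible choice):

    # pip install rdflib  -- tested against recent rdflib releases
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, OWL, RDF, RDFS

    EX = Namespace("http://example.org/vocab#")   # hypothetical namespace
    vocab = URIRef("http://example.org/vocab")    # hypothetical vocabulary URI

    g = Graph()
    g.bind("dcterms", DCTERMS)
    g.bind("owl", OWL)
    g.bind("ex", EX)

    # Declare the ontology and attach human- and machine-readable metadata
    g.add((vocab, RDF.type, OWL.Ontology))
    g.add((vocab, DCTERMS.title, Literal("Example Vocabulary", lang="en")))
    g.add((vocab, DCTERMS.license,
           URIRef("http://creativecommons.org/licenses/by/4.0/")))
    g.add((vocab, RDFS.comment,
           Literal("A toy vocabulary illustrating metadata annotation.", lang="en")))

    # A sample instance, so readers can see the vocabulary in use
    g.add((EX["example-instance"], RDF.type, EX["ExampleClass"]))

    print(g.serialize(format="turtle"))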

Specific requirements for vocabularies (derived from [1], [2] and [3])

  • Represent the vocabulary using W3C standards (RDFS and OWL).
  • Make it dereferenceable using content negotiation (see the check sketched after this list).
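
A quick way to check content negotiation, assuming Python and the requests library; the FOAF URI is used only as a well-known example that, at the time of writing, content-negotiates between HTML and RDF/XML:

    # pip install requests
    import requests

    uri = "http://xmlns.com/foaf/0.1/Person"  # substitute your own term URI

    # Ask for the same URI with different Accept headers; a properly
    # published vocabulary should return a matching representation.
    for accept in ("text/html", "application/rdf+xml"):
        resp = requests.get(uri, headers={"Accept": accept})
        print(accept, "->", resp.status_code,
              resp.headers.get("Content-Type"))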

Specific requirements for scientific experiments

  • Make the inputs, outputs and intermediate results available, at least for reviewer examination.
  • Define a scientific workflow representing the method described in your experiment, and publish it.
  • Describe the tools used to run it; publish configuration files, execution dependencies, logs and the provenance of the experiment, semantically annotated (a sketch follows this list).
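
A minimal sketch of semantically annotated provenance, using Python, rdflib and the W3C PROV-O vocabulary (all experiment URIs and the timestamp below are hypothetical placeholders):

    # pip install rdflib  -- the PROV namespace ships with recent rdflib releases
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import PROV, RDF, XSD

    EX = Namespace("http://example.org/experiment/")  # hypothetical

    g = Graph()
    g.bind("prov", PROV)
    g.bind("ex", EX)

    run = EX["run-001"]
    g.add((run, RDF.type, PROV.Activity))                   # the execution
    g.add((run, PROV.used, EX["input-dataset"]))            # its inputs
    g.add((run, PROV.wasAssociatedWith, EX["researcher"]))  # who ran it
    g.add((run, PROV.startedAtTime,
           Literal("2013-06-01T10:00:00", datatype=XSD.dateTime)))
    g.add((EX["results"], PROV.wasGeneratedBy, run))        # its outputs

    print(g.serialize(format="turtle"))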

Specific requirements for software

  • Publish a link to where the system can be accessed.
  • Provide documentation on usage and deployment, and produce a demo.
  • Publish your source code and the tool dependencies (a minimal example follows).
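
A minimal, hypothetical example of declaring a tool's dependencies explicitly so that others can rebuild the same environment, here as a Python setup.py (the project name, URL and version pins are placeholders):

    # setup.py -- declares where the code lives and what it depends on
    from setuptools import setup, find_packages

    setup(
        name="my-experiment-tool",                            # hypothetical
        version="0.1.0",
        packages=find_packages(),
        url="https://github.com/example/my-experiment-tool",  # source link
        license="Apache-2.0",
        install_requires=[
            "rdflib>=4.0",    # pin the versions your results depend on
            "requests>=1.0",
        ],
    )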


Have you got any feedback or comments? Share them with us in this thread!

Sample research works checklist

This work is in progress...

About the authors

Idafen Santana Pérez is a PhD student at the Ontology Engineering Group (Universidad Politécnica de Madrid).
Daniel Garijo is a PhD student at the Ontology Engineering Group (Universidad Politécnica de Madrid).
Oscar Corcho is an Associate Professor and researcher at the Ontology Engineering Group (Universidad Politécnica de Madrid).

Acknowledgements

The authors would like to thank Raul Garcia-Castro for his comments, and María Poveda Villalón and Miguel Ángel García Delgado for their help with the HTML formatting and publishing.