In April 2015, my roommates and I participated in a Hackathon in Dublin City University. They were PhD students at the Adapt Centre and their work focused on Natural Langugage Processing for search and translation. At that time, I was a full-time Software Engineer at IBM and a part-time Master’s student in Data Analytics at the same university.
In Machine Learning, in order to train a learning algorithm you need many samples and annotations. In short, the more data the better, particulary if you work with text and you want to apply Deep Learning techniques. For text annotations, researchers typically use third party services like Amazon Mechanical Turk, CrowdFlower or Appen for tasks like machine translation, topic extraction or relevance.
What if students were to do those annotations? Our idea was to create a portal for collaboration between students and researchers that will allow students to gain some research experience while getting paid and give research centres on-campus human resources for these small research tasks currently outsourced. On top of that, you can identify students with an interest in research and retain that talent in that university.
Our pitch at the Hackathon can be found here. The president of our university really liked the project as there had been efforts to encourange students to participate in research activities. We did not win but we got ourselves a place in the Ryan Academy 2015 Summer Accelerator Program where we were given 5,000 EUR, accommodation and mentoring from Academia and Industry leaders.
“Students for Research” was a system that would extract the skills of students from the courses they have mastered at our university and the syllabous of those courses. A matching algorithm would suggest candidates for particular tasks to researchers and also tasks to students using Apache Lucene. We also included a scoring system to support this ranking based on previous tasks’ performance.
The code will be posted on my Github soon for anyone to use.