Data Engineer (f/m/d) (Internship/Work-Study)
ArcaScience

  • Internship (6 months)
  • Paris (France)
  • Published on September 2, 2021


ArcaScience is a leading French startup committed to solving some of the biggest health challenges of our time, with a strong focus on building collective intelligence from biomedical and scientific data by means of advanced machine learning technologies. Our customers are high-profile pharmaceutical companies and leading public research institutes. Everything we do is done for research and by research.

Our mission is to open up science, create new synergies, build collective intelligence, and empower R&D in our troubled world.

Our main focus areas are helping to cure COVID-19, cancer, and rare diseases, and contributing to pharmacovigilance.


Job description (What we are looking for)


We are looking for a Data Engineer to help improve and extend ArcaScience’s AI-based search engine. You should have an engaged personality, be used to working proactively, and be passionate about your work. You enjoy broadening your horizons to master new technologies and like to expand your skills continuously.


Core responsibilities (You will be)

  • Analyzing the needs of different big pharma customers and translating them into annotation schemes suited to the nature of the target data
  • Piloting the collection and management of open and internal biomedical data
  • Resolving issues related to the processing of text in heterogeneous repositories and databases
  • Data engineering and annotation for advanced natural language processing (NLP)
  • Assessing the requirements of state-of-the-art deep learning models for text classification and information extraction
  • Scaling up the performance and execution of NLP modules and machine learning models in production

Requirements & Qualifications (You have)

  • Teamwork spirit
  • Decent English. Fluent French is a great plus
  • Experience with Python and/or Java
  • Familiarity with issues related to processing unstructured, semi-structured and structured data: e.g., plain text (.txt, .docx), tabular data (.xlsx, .csv, .tsv), XML data, etc.
  • Knowledge of RESTful architectures and implementations
  • Basic usage of source control technology: Git, GitHub and GitLab

Pluses (Would be nice if you have)

  • Familiarity with natural language processing topics (e.g., POS tagging, named entity recognition, entity linking)
  • Familiarity with data annotation workflows and frameworks
  • Knowledge of machine learning technologies and platforms (e.g., spaCy, PyTorch, TensorFlow)
  • Open science culture
  • Entrepreneurial culture
  • Availability for afterwork drinks


Advantages (What we offer)

You will be working in a dynamic environment with:

  • R&D driven engineering: working on cutting-edge AI technologies to accelerate finding, improving and comparing cures and vaccines
  • Dynamic issues: pushing the limits of your knowledge with a constant stream of new challenges
  • Interdisciplinary co-workers: engineers, scientists, biologists, and more
  • Talented engineers
  • Young and multicultural team
  • Flexible working hours with the possibility to work from home
  • Rewarding remuneration: depending on your profile, your engagement and efficiency


If you want to participate in improving a next-generation, groundbreaking biomedical R&D solution, take a shot and apply here!


Being hired after the internship is highly possible.