Research Software Engineer (REDMANE / Data Commons)

Introduction

As part of our ongoing efforts to evaluate and implement innovative data management strategies, we are offering an internship opportunity to work on a pre-production project for a data commons called REDMANE. We are seeking a motivated and detail-oriented intern to join our team and contribute to the development of a pre-production data infrastructure.

The ideal candidates will leverage their backgrounds in full-stack development, Linux command line / system administration, or bioinformatics to contribute to a multi-disciplinary team.

Multiple sub-projects within this project

This project is made up of the following sub-projects that an intern can apply for:

Create synthetic multi-omics data to match synthetic clinical and other metadata
Extend the functionality of the Data Registry application in ReactJS and FastAPI
Ingest data and metadata into the Data Registry using Python, authenticating to an API
Set up standardised authentication and security for multiple Data Portals using OIDC, AAF, and Keycloak
Set up cBioPortal as a Data Portal on the Nectar Cloud secured with OIDC
Set up a generic secure Shiny/R app as a Data Portal on the Nectar Cloud secured with OIDC
Set up OMERO as a Data Portal on the Nectar Cloud secured with OIDC
Set up the Storage Calculator as a Data Portal on the Nectar Cloud secured with OIDC

Duties while on placement

As a Research Software Engineer Intern, you will play a crucial role in supporting the design and implementation of a pre-production Research Data Management ecosystem called REDMANE. This internship will provide you with valuable hands-on experience in trialling, analysing, and improving different platforms. You will assist in building a scalable data management infrastructure and gain exposure to various aspects of data integration, modeling, and governance.

The Research Software Engineer Intern role will generally:

Assist in setting up, extending, and testing the Data Registry and Data Portals using synthetic data for multiple REDMANE ecosystems.
Support the implementation of secure data ingestion pipelines for the Data Registry and Data Portals.
Contribute to the development of a base set of requirements that Data Portals must meet to be part of the REDMANE ecosystem.
Set up an environment to provide authentication across the Data Registry and Data Portals.
Set up Data Portals for existing applications such as cBioPortal, Shiny, OMERO, and the custom-made Storage Calculator.
Assist in documenting the design decisions and implementation processes.
Contribute to creating technical documentation and user guides for future reference.
Stay updated on emerging data management technologies and industry trends.
Explore and experiment with new tools, frameworks, and platforms that can enhance the implementation.

Skills and Pre-requisites

We are aiming to build a multi-disciplinary team where each person would have at least one of these skill sets:

basic knowledge of, or the ability to quickly learn, ReactJS/Python/FastAPI/JavaScript/CSS and the Linux command line
basic knowledge of, or the ability to quickly learn, high-level bioinformatics
basic knowledge of, or the ability to quickly learn, high-level Linux command-line/system administration
basic organisational, project management, and communication skills, or the ability to quickly develop them

All people in this multi-disciplinary team should:

have the ability to synthesise data, understand software usability, and critically analyse code to identify and address performance issues,
be motivated to work alongside a diverse team, sharing ideas and tackling challenges collectively,
be able to tolerate or learn to tolerate complexity, and
be able to tolerate or learn to tolerate ambiguity.
