Research Software Engineer (REDMANE / Data Commons)
Introduction
As part of our ongoing efforts to evaluate and implement innovative data management strategies, we are offering an internship opportunity to work on a pre-production project for a data commons called REDMANE. We are seeking a motivated and detail-oriented intern to join our team and contribute to the development of its data infrastructure.
The ideal candidates will leverage their backgrounds in full-stack development, Linux command line / system administration, or bioinformatics to contribute to a multi-disciplinary team.
Multiple sub-projects within this project
This project is made up of the following sub-projects that an intern can apply for:
- Create synthetic multi-omics data to match synthetic clinical and other metadata
- Extend the functionality of the Data Registry application in ReactJS and FastAPI
- Ingest data and metadata into the Data Registry using Python, authenticating to an API
- Standardise authentication and security across multiple Data Portals using OIDC, AAF, and Keycloak
- Set up cBioPortal as a Data Portal on the Nectar Cloud secured with OIDC
- Set up a generic, secure Shiny/R app as a Data Portal on the Nectar Cloud secured with OIDC
- Set up OMERO as a Data Portal on the Nectar Cloud secured with OIDC
- Set up the Storage Calculator as a Data Portal on the Nectar Cloud secured with OIDC
Duties while on placement
As a Research Software Engineer Intern, you will play a crucial role in supporting the design and implementation of a pre-production Research Data Management ecosystem called REDMANE. This internship will provide you with valuable hands-on experience in trialling, analysing, and improving different platforms. You will assist in building a scalable data management infrastructure and gain exposure to various aspects of data integration, modelling, and governance.
The Research Software Engineer Intern role will generally:
- Assist in setting up, extending, and testing the Data Registry and Data Portals using synthetic data for multiple REDMANE ecosystems.
- Support the implementation of secure data ingestion pipelines for the Data Registry and Data Portals.
- Contribute to the development of a base set of requirements that Data Portals must meet to be part of the REDMANE ecosystem.
- Set up an environment that provides authentication across the Data Registry and Data Portals.
- Set up Data Portals for existing applications such as cBioPortal, Shiny, OMERO, and the custom-made Storage Calculator.
- Assist in documenting the design decisions and implementation processes.
- Contribute to creating technical documentation and user guides for future reference.
- Stay updated on emerging data management technologies and industry trends.
- Explore and experiment with new tools, frameworks, and platforms that can enhance the implementation.
Skills and Pre-requisites
We are aiming to build a multi-disciplinary team in which each person has at least one of these skill sets:
- basic knowledge of, or the ability to quickly learn, ReactJS/Python/FastAPI/JavaScript/CSS and the Linux command line
- basic knowledge of, or the ability to quickly learn, high-level bioinformatics
- basic knowledge of, or the ability to quickly learn, high-level Linux command-line/system administration
- basic organisational, project management, and communication skills, or the ability to quickly develop them
All people in this multi-disciplinary team should:
- have the ability to synthesise data, understand software usability, and critically analyse code to identify and address performance issues,
- be motivated to work alongside a diverse team, sharing ideas and tackling challenges collectively,
- be able to tolerate or learn to tolerate complexity, and
- be able to tolerate or learn to tolerate ambiguity.
Further reading