This job has expired

Data Scientist

London (Central), London (Greater)
£41,386 to £48,414 per annum, including London Weighting Allowance.


Faculty Jobs: Health & Medical, Allied Health, Professional Fields
Employment Type: Full Time
Institution Type: Four-Year Institution

Job description

The Department of Twin Research & Genetic Epidemiology holds 30 years of data gathered from various sources on over 15,000 participants of its TwinsUK cohort. It is one of the most deeply characterised adult twin cohorts in the world, providing a rich platform for scientists to research health and ageing longitudinally.

TwinsUK has recently applied for access to study participants' health, educational and environmental records so that they can be linked to the vast collection of longitudinal omic and phenotypic data amassed over the past 30 years, creating a large, centralised resource of health research data.

In addition, in response to the COVID-19 pandemic, TwinsUK joined the Government-funded National Core Studies (NCS) programme and became an active member of the UK Longitudinal Linkage Collaboration (UK LLC). As a result, TwinsUK is engaged in a national effort to enable linkage of study data to participants' official health, educational and environmental records.

The postholder will have a pivotal role in receiving, cleaning, harmonising, documenting, storing, and curating these linked records and the data collected by TwinsUK. They will be responsible for creating automated or semi-automated tools to process these linked data and making them available to approved, bona-fide researchers within a Trusted Research Environment (TRE) and in segregated project spaces.

The postholder will be part of a highly collaborative and inclusive team, working under the supervision of the PI and Head of Data and closely with the Data Manager. Proficiency in programming tools for data and database manipulation is essential. The postholder will provide high-quality operational coordination & support to the Data team, the PIs and the wider professional services team in the Department and School.

Applicants will have a strong interest in, and knowledge of, clinical research data, a proactive attitude and excellent organisational & planning skills.

The postholder will:

Be familiar with ETL techniques in order to receive, extract, manipulate, clean and process raw research data to be harmonised with other datasets and made ready for sharing with researchers

Have high-level expertise in data manipulation tools such as Microsoft Excel, SPSS and Stata, and be able to write macros and scripts

Be proficient in the use of programming tools such as Python and R to automate data manipulation tasks

Be able to use the Azure platform and MS SQL Server DBMS to store data and automate data extraction tasks for other data team members

Facilitate the development and implementation of data linkage strategies and processes

Produce descriptions and define metadata for new and existing datasets

Program data extraction from, and data injection into, online study databases built in REDCap

Be familiar with the need for and the usage of Trusted Research Environments (TREs)

We value your professional growth, and you will have opportunities to attend conferences & training. This post is based at St Thomas' Hospital, but hybrid working can be offered if required.

This post will be offered on a fixed-term contract for 18 months.

This is a full-time post (100% full-time equivalent).

Key responsibilities

Using ETL techniques, receive and interrogate large amounts of data from different sources, ensure its accuracy and consistency, and store it in the TwinsUK database.

Use advanced skills in MS Excel, SPSS and Stata, including writing macros and developing scripting techniques.

Use programming tools such as Python and R to automate data manipulation tasks as required by the wider data team

Use the Azure platform and MS SQL Server DBMS to store data and automate data extraction tasks

Participant and clinic-based data collection: provide tools to aid data collection, injection and extraction to and from the TwinsUK online REDCap databases

Look for data trends and patterns with excellent attention to detail

Help develop procedures to acquire and integrate the official records into the TwinsUK databases as per the guidelines agreed with NHS Digital and the ONS

Help with the administration of data in TREs for researchers. Develop procedures, tools and/or scripts to check export data files and ensure that no identifiable data is exported when a researcher requests the output of their results

Handle personal information, adhering to safe and secure data governance, in line with protocols and current data protection legislation

Identify and gather the metadata for the official health, education and environmental records

Draft & prepare reports and presentations where appropriate

Maintain excellent internal & external working relationships

Promote collaborative work within the data linkage team, the wider department, and other collaborating organisations

Attend wider data linkage meetings and workshops with external collaborators, such as the UK LLC, a collaboration of multiple national cohorts.

Travel to other cohorts/research sites for meetings as necessary

Design & deliver presentations and progress reports, including recommendations and conclusions, advising the manager where necessary

Be able to adapt communication style according to the given audience, demonstrating comprehension and confidence

The above list of responsibilities may not be exhaustive, and the post holder will be required to undertake such tasks and responsibilities as may reasonably be expected within the scope and grading of the post.

Skills, knowledge, and experience

Essential criteria

1. Educated to undergraduate or postgraduate degree level, or equivalent relevant experience

2. Working knowledge of data manipulation tools, with high-level expertise in the following: MS Excel, SPSS, Stata

3. Experience in ETL techniques and methods

4. Working knowledge of Azure and of DBMSs such as MS Access, with high-level expertise in MS SQL Server, including writing queries, views and stored procedures

5. Working knowledge of programming tools such as R and Python for the automation of data processing and analysis tasks

6. Experience of using REDCap

7. Highly numerate with strong analytical, statistical and problem-solving skills

8. Experience of data analysis, with ability to produce, interpret, analyse and present qualitative and quantitative data, using a range of techniques, including visualisation methods that improve understanding of the evidence base

9. Understanding of and adherence to data governance and confidentiality legislation and practice

10. Excellent written and oral communication skills, including report/protocol writing and presenting, to convey complex information to a non-specialist audience through clear and accessible formats

11. Highly personable with experience of working with stakeholders at all levels, using excellent interpersonal skills, providing excellent customer service and building effective networks across complex organisations

12. A commitment to equality, diversity and inclusion, actively addressing areas of potential bias

Desirable criteria

1. PhD in relevant discipline

2. Prior experience in project coordination

3. Prior experience in working in Trusted Research Environments (TREs)

4. Previous experience in health-related research projects

5. Knowledge of data visualisation techniques using software packages, such as Power BI and Tableau
