Company: Oliver James
Location: San Francisco
Closing Date: 04/11/2024
Salary: £150 - £200 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description
Location: San Francisco, CA (Hybrid)
Qualifications:
- Master's degree or PhD in a related field.
- Proficient in Python.
- Strong background in Software Engineering.
- Meticulous about preventing and catching data errors.
- Enthusiastic about engaging deeply with raw data.
- Committed to following engineering best practices.
- Strong understanding of why high-quality data matters for building high-performance machine learning systems.
Responsibilities:
- Integrate new, high-quality text data sources into existing data pipelines.
- Build models that accurately classify and extract valuable text from raw HTML.
- Develop a high-quality OCR pipeline to extract pretraining text from images and scanned documents.
- Collect multimodal data at large scale, for example transcripts of video whose combined runtime spans thousands of years.
- Design novel data-generation pipelines that build on existing data, for example by translating code from one programming language to another.
- Integrate multiple annotation service providers behind a single, user-friendly interface for researchers.