Data Engineer

  • Location:

    San Francisco

  • Job type:

    Permanent

  • Salary:

    Negotiable

  • Contact:

    Rohan Yagnik

  • Contact email:

    Rohan.Yagnik@oliverjames.com

  • Job ref:

    JOB-112023-231352_1701100611

  • Published:

    over 1 year ago

  • Expiry date:

    2023-11-28

  • Start date:

    ASAP

Location: San Francisco, CA (Hybrid)

Qualifications:

  • Master's degree or PhD in related field.
  • Proficient in Python.
  • Strong background in software engineering.
  • Meticulous in preventing and catching data mistakes.
  • Enthusiastic about engaging deeply with raw data.
  • Committed to adhering to engineering best practices.

Responsibilities:

  • Demonstrate a strong understanding of the significance of high-quality data for creating high-performance machine learning systems.
  • Integrate novel, high-quality text data sources into established data pipelines.
  • Build models dedicated to precise classification and extraction of valuable text from raw HTML.
  • Develop a sophisticated OCR pipeline to extract pretraining text from images and scans, ensuring exceptional quality.
  • Amass an extensive volume of multimodal data, exemplified by the collection of video transcripts spanning thousands of years.
  • Devise innovative data generation pipelines that capitalize on existing data, such as the conversion of code from one programming language to another.
  • Unify various annotation service providers into a user-friendly interface tailored for researchers.

*No Third Party or Sponsorship at this time*

We are Oliver James

We received an average rating of 9.1 from feedback by our clients and candidates.