Senior Data Engineer

SCALA.AI

Data Science
Remote
USD 100k-190k / year + Equity
Posted on Mar 11, 2026

$100,000 - $190,000 per year - Permanent, Full-time

Job details

Salary

  • $100,000 - $190,000 per year

Job type

  • Permanent
  • Full-time

Full job description

About SCALA.AI & The Role

SCALA.AI is a leading AI-native company building the next generation of intelligent, customer-centric solutions. We are a small but highly efficient, high-performing team of engineers and innovators dedicated to delivering impactful products to our customers and members.

We operate at the bleeding edge of technology, leveraging the latest advancements in AI, machine learning, and modern cloud infrastructure to solve complex, real-world problems. Our culture thrives on ownership, continuous learning, and pushing the boundaries of what’s possible. If you’re excited by massive technical challenges, a fast-paced environment, and the opportunity to make an outsized impact, you’ll fit right in.

Responsibilities: What You'll Build & Own

As a Senior Data Engineer, you will be the core architect of our data infrastructure, responsible for building and optimizing the robust, high-throughput data pipelines that feed our advanced AI and machine learning models. You will ensure data quality, reliability, and security across the entire data lifecycle, enabling our researchers and engineers to innovate at the speed of the startup world. This role requires deep technical expertise, independence, and a passion for data excellence at scale.

  • Design and Build Data Pipelines: Architect, construct, and manage scalable ETL/ELT pipelines for data ingestion, processing, and transformation, ensuring high availability and fault tolerance.
  • Vector Infrastructure for RAG: Build and maintain the specialized data pipelines required for Retrieval-Augmented Generation (RAG). This includes automated document parsing, metadata extraction, and high-performance ingestion into Vector Databases (e.g., Pinecone, Weaviate, Milvus, or pgvector).
  • Live Data for MCP: Design and optimize "live" data access layers that support Model Context Protocol (MCP). You will ensure that AI agents have low-latency, secure access to structured enterprise data and real-time APIs for "agentic" tool-calling and decision making.
  • Real-Time Streaming: Implement and manage real-time data streaming architectures (e.g., Kafka, Kinesis, or Flink) to ensure our AI models are grounded in the most current data available, moving beyond static knowledge bases.
  • Optimize Data Architecture: Drive the technical vision for our data warehousing, data lakes, and data streaming platforms, optimizing infrastructure for performance and cost-efficiency on AWS.
  • Data Quality & Governance: Implement rigorous data validation, monitoring, and testing frameworks to ensure the accuracy, completeness, and consistency of data used by AI models and business applications.
  • Collaboration: Work closely with AI/ML Engineers to bridge the gap between raw data sources and model-ready context.

Required Qualifications

We seek a seasoned Data Engineer with a deep command of modern cloud-native data architectures.

  • Bachelor's degree in Computer Science, Engineering, or a related quantitative field.
  • 7+ years of experience as a Data Engineer, focused on building and scaling production data systems.
  • Expert proficiency in at least one backend language widely used in data engineering, such as Python or Scala.
  • Direct experience with Vector Databases and the data engineering challenges unique to RAG (e.g., managing embeddings, indexing strategies, and hybrid search).
  • Experience building/maintaining APIs or data services that interface with LLM "tools" or agentic frameworks via protocols like MCP or JSON-RPC.
  • Proven, hands-on experience building large-scale data solutions on AWS, utilizing services like S3, Redshift, Kinesis/MSK, Glue, and Lambda.
  • Extensive experience with modern data orchestration tools (e.g., Airflow, Prefect, or Dagster).
  • Deep expertise in SQL and working with large-scale relational and NoSQL databases (e.g., PostgreSQL, DynamoDB).

Desired Attributes

  • You thrive in an early-stage startup environment and can run fast with a small development team, demonstrating a strong bias for action and execution.
  • Experience in MLOps data pipelines, including feature store management and providing data infrastructure tailored for training and inference of Large Language Models (LLMs).
  • Familiarity with containerization technologies (Docker, Kubernetes) for deploying data services.
  • A track record of high independence and excellent communication skills, capable of driving projects and clearly articulating data architecture decisions.

Salary & Benefits

  • Competitive base salary, depending on experience and location
  • Annual equity awards

Why SCALA.AI

We’re redefining how businesses use AI — with a team that’s fast, fearless, and focused. You’ll play a key role in driving growth across industries and shaping how customers adopt intelligent, agentic technology.

Join us and help write the story of how AI transforms work.

Pay: $100,000.00 - $190,000.00 per year

Work Location: Remote
