Role Overview:
As Principal Data Engineer, you will drive the architecture and technical direction for MontyCloud’s next-generation data and knowledge platforms, enabling intelligent automation, advanced analytics, and AI-driven products for a wide range of users.
You will play a pivotal role in shaping the data foundation for AI-driven systems, ensuring our platform is robust, scalable, and ready to support state-of-the-art AI workflows. You will also lead efforts to maintain stringent data security standards, safeguarding sensitive information across data pipelines and platforms.
Key Responsibilities:
- Architect and optimize scalable data platforms that support advanced analytics, AI/ML capabilities, and unified knowledge access.
- Lead the design and implementation of high-throughput data pipelines and data lakes for both batch and real-time workloads.
- Set technical standards for data modeling, data quality, metadata management, and lineage tracking, with a strong focus on AI-readiness.
- Design and implement secure, extensible data connectors and frameworks for integrating customer-provided data streams.
- Build robust systems for processing and contextualizing data, including reconstructing event timelines to enable higher-order insights.
- Partner with data scientists, ML engineers, and cross-functional stakeholders to operationalize data for machine learning and AI-driven insights.
- Evaluate and adopt best-in-class tools from the modern AI data stack (e.g., feature stores, orchestration frameworks, vector databases, ML pipelines).
- Drive innovation and continuous improvement in data engineering practices, data governance, and automation.
- Provide mentorship and technical leadership to the broader engineering team.
- Champion security, compliance, and privacy best practices in multi-tenant, cloud-native environments.
Desired Skills:
Must Have:
- Deep expertise in cloud-native data engineering (AWS preferred), including large-scale data lakes, data warehouses, and event-driven and streaming data architectures.
- Hands-on experience building and maintaining data pipelines with modern frameworks (e.g., Spark, Kafka, Airflow, dbt).
- Strong track record of enabling AI/ML workflows, including data preparation, feature engineering, and ML pipeline operationalization.
- Familiarity with modern AI/ML data stack components such as feature stores (e.g., Feast, Tecton), vector databases (e.g., Pinecone, Weaviate), orchestration tools (e.g., Airflow, Prefect), and MLOps tools (e.g., MLflow).
- Experience working with modern open table formats such as Apache Iceberg, Delta Lake, or Hudi for scalable data lake and lakehouse architectures.
- Experience implementing data privacy requirements under regulations such as GDPR and supporting data anonymization for diverse use cases.
- Strong understanding of data privacy, role-based access control (RBAC), encryption, and compliance in multi-tenant platforms.
Good to Have:
- Experience with metadata management, semantic layers, or knowledge graph architectures.
- Exposure to SaaS and multi-cloud environments serving both internal and external consumers.
- Background in supporting AI Agents or AI-driven automation in production environments.
- Experience processing high-volume cloud infrastructure telemetry, including AWS CloudTrail, CloudWatch logs, and other event-driven data sources, to support real-time monitoring, anomaly detection, and operational analytics.
Experience:
- 10+ years of experience in data engineering, distributed systems, or related fields.
Education:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (preferred).