Company: Kuraray America, Inc.
Location: San Jose
Closing Date: 06/11/2024
Salary: £150 - £200 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description
Location: San Francisco, Seattle, LA or PST
You will join the Enterprise Data Platform team to enable timely, effective, and safe sharing of data with multiple engineering, operations, and business teams for building world-class data products.
Responsibilities
- Build data ingestion and processing pipelines to enable data analytics and data science use cases in areas of digital commerce, service operations, charging, reliability, finance, capex, warranty, customer service, and others.
- Build a modular set of data services using Python, SQL, AWS Glue, Lambda, API Gateway, Kafka, data build tool (dbt), Apache Spark on EMR, among others.
- Build automated unit and integration testing pipelines using frameworks like PySpark.
- Create and manage CI/CD pipelines with GitLab CI and AWS CodePipeline/CodeDeploy.
- Automate and schedule jobs using Amazon Managed Workflows for Apache Airflow (MWAA).
- Build the ODS and reporting schemas and load the data into AWS Redshift or Snowflake.
- Design and build data quality management services with Apache Deequ and data observability tools like Splunk, DataDog, CloudWatch.
- Provide a variety of query services with REST, Athena/Presto, server-sent events.
- Configure and set up enterprise data lineage, metadata management, and data catalog support using tools like Collibra/Alation.
- Assist data scientists within the data engineering team, as well as other software engineering teams, with data cleansing, wrangling, and feature engineering.
- Ensure green builds for deployment and work with program management and senior leads to burn down planned deliverables in a sprint cycle.
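To give candidates a feel for the data-quality work described above, here is a minimal, illustrative sketch (not part of the posting) of the kind of completeness and uniqueness checks that tools like Apache Deequ automate at scale. It is plain Python over lists of dicts for brevity; in production these checks would run over Spark DataFrames, and the check names and thresholds shown are hypothetical.

```python
from collections import Counter

def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows)

def uniqueness(rows, column):
    """Fraction of non-null values in `column` that occur exactly once."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    if not values:
        return 0.0
    counts = Counter(values)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(values)

def run_checks(rows):
    """Evaluate a Deequ-style suite of checks; returns check name -> pass/fail."""
    return {
        "id_complete": completeness(rows, "id") == 1.0,
        "id_unique": uniqueness(rows, "id") == 1.0,
        "amount_mostly_complete": completeness(rows, "amount") >= 0.9,
    }
```

For example, a batch with a duplicated `id` would fail the `id_unique` check while still passing `id_complete`, which is the kind of signal that feeds the observability tools (Splunk, DataDog, CloudWatch) mentioned above.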
Qualifications
- 5+ years of experience building data and analytics platforms using AWS Cloud, Python, and SQL.
- Knowledge of AWS technologies, specifically MSK, EMR, Athena, Glue, Lambda, and API Gateway, as well as Python and SQL, is a must.
- Knowledge of modern data tools like dbt (data build tool) and Airflow orchestration is highly desired.
- Ability to assist SQL analysts and Tableau developers on business teams in creating the right set of materialized views in a SQL data warehouse like Redshift/Snowflake.
- Knowledge of automation and CI/CD best practices.
- Familiarity with machine learning and data science ecosystems, especially AWS SageMaker and Databricks, is highly preferred.
- Hands-on experience building and maintaining production data applications, including current experience with both relational and distributed columnar data stores.
- Deep experience using SQL, Python, and Spark. Hands-on experience with big data technologies (e.g. Redshift, Athena, Glue, EMR, Kinesis, Step Functions, or equivalents in other web services).
- Familiarity with time-series databases, data streaming applications, Kafka, Flink, and more is a plus.
- Familiarity with modern data science and product analytics tools and techniques such as R, machine learning, and advanced statistics is a plus.