Company Profile:
The Carlyle Group (NASDAQ: CG) is a global investment firm with $425 billion of assets under management across 595 investment vehicles as of March 31, 2024, more than half of which is managed by women. Founded in 1987 in Washington, DC, Carlyle has grown into one of the world's largest and most successful investment firms, with more than 2,200 professionals operating in 28 offices across North America, Europe, the Middle East, Asia and Australia. Carlyle places an emphasis on development, retention and inclusion, supported by our internal processes and seven Employee Resource Groups (ERGs). Carlyle's purpose is to invest wisely and create value on behalf of its investors, which range from public and private pension funds to wealthy individuals and families to sovereign wealth funds, unions and corporations. Carlyle invests across three segments - Global Private Equity, Global Credit and Investment Solutions - and has expertise in various industries, including aerospace, defense & government services, consumer & retail, energy, financial services, healthcare, industrial, real estate, technology & business services, telecommunications & media and transportation.
At Carlyle, we know that diverse teams perform better, so we seek to create a community where we continually exchange insights, embrace different perspectives and leverage diversity as a competitive advantage. That is why we are committed to growing and cultivating teams that include people with a variety of perspectives, people who provide unique lenses through which to view potential deals and to support and run our business.
Position Summary:
As a Senior Data Engineer at Carlyle, you will join an innovative team that leverages data as the driving force of our cutting-edge solutions. You will design, build and maintain the data infrastructure and pipelines that power our data-driven products and insights. You will work with tools such as Snowflake, Spark, Kafka and Airflow, as well as cloud data platforms, to create robust data architectures and enable scalable and efficient data collection, storage, processing and analysis.
You will also ensure data quality, security and governance by implementing data validation processes, access controls and monitoring systems. You will collaborate with data consumers like analysts, data scientists and engineers to understand their requirements and deliver trusted data products.
You should have substantial experience in distributed systems, data modeling, pipeline orchestration and programming languages such as Python and Scala. You should also have strong problem-solving abilities and excellent communication skills. If you are passionate about building scalable data architectures and turning raw data into analytical insights, join our team and help drive our data-centric products and strategy.
Primary Responsibilities:
- Design, implement, and support cloud data platforms such as Snowflake and Databricks, and leverage their features and capabilities to optimize data performance and scalability.
- Architect and administer data lakes and cloud data warehouses that provide secure, reliable, and flexible data storage and access for analytics and machine learning.
- Design, build, and maintain scalable and robust data pipelines using various cloud services and tools such as AWS, Azure, SnapLogic, Apache Airflow, and Prefect.
- Develop and optimize data processing workflows with Python, Scala, and Spark, and manage and support data warehouse and data lake solutions in Snowflake, Amazon Aurora, and other platforms.
- Utilize Git, GitHub, and Azure DevOps for version control and collaboration, and apply Terraform and Infrastructure as Code (IaC) principles to automate and manage infrastructure.
- Champion the implementation of CI/CD pipelines to streamline development and deployment processes.
- Ensure data integrity and compliance with best practices in SQL and NoSQL database systems, and troubleshoot issues with data quality, security, and privacy.
- Continuously explore new technologies to enhance data reliability, efficiency, and quality.
- Collaborate with data consumers like analysts, data scientists, engineers, and product managers to understand their requirements and deliver trusted data products.
- Create and maintain data documentation, metadata, and data dictionaries to ensure data accessibility and usability.
- Perform data testing and validation to ensure data accuracy and consistency.
- Provide data engineering support and guidance to junior data engineers and other data team members.
- Stay updated with the latest trends and developments in data engineering and related fields.
- Apply best practices and standards for data governance, security, and quality across cloud data platforms, and ensure compliance with data policies and regulations.
- Evaluate and select appropriate data tools, frameworks, and technologies to meet the data engineering needs of the organization.
- Design and implement data APIs and services to enable data consumption and integration across different systems and applications.
- Monitor and improve data pipeline performance, efficiency, and reliability, and troubleshoot any data issues or failures.
- Conduct data analysis and provide insights and recommendations to support data-driven decision making.
- Implement and integrate machine learning models into production systems using AWS SageMaker, MLflow, and Jupyter notebooks.
- Mentor and coach other data team members on data engineering best practices, standards, and methodologies.
Requirements:
Education & Certificates
- Bachelor's degree in Computer Science, Engineering, or related field
- Relevant certifications in AWS, Azure, or other modern data technologies are highly desirable.
Professional Experience
- 5+ years of relevant experience in data engineering, data analysis, and data pipeline development.
- Proficient in AWS data services, such as S3, Glue, Redshift, EMR, Athena, and Kinesis, and able to design, build, and optimize scalable and reliable data pipelines using AWS tools and best practices.
Competencies & Attributes
- Proficient in Snowflake cloud data platform, and able to leverage its features and capabilities for data ingestion, storage, processing, and analysis.
- Proficient in Databricks unified data analytics platform, and able to use its collaborative notebooks, integrated APIs, and optimized clusters for data engineering and machine learning.
- Proficient in Azure data services, such as Blob Storage, Data Factory, Synapse Analytics, Databricks, and HDInsight, and able to design, build, and optimize scalable and reliable data pipelines using Azure tools and best practices.
- Expert skills in SQL, Python, Scala, and Spark for data extraction, transformation, and loading (ETL).
- Experience with AWS SageMaker, MLflow, Jupyter notebooks, and other machine learning frameworks and applications.
- Experience with Git, GitHub, Azure DevOps, Terraform, and CI/CD practices for data pipeline automation and deployment.
- Knowledge of data warehouse, data lake, and data mart concepts and architectures.
- Experience with pipeline orchestration tools like Apache Airflow, Prefect, or Luigi.
- Ability to design, optimize, monitor, and troubleshoot data pipelines for performance, reliability, and quality.
- Proficient in SnapLogic, and able to leverage its rich set of connectors to build scalable and robust data pipelines for various use cases.
- Demonstrated experience in using SnapLogic's features such as pipelines, tasks, snaps, patterns, and ultra tasks to design, develop, test, and deploy data solutions.
- Experience with data governance, data security, data validation, and error handling best practices.
- Experience with the Alation data catalog is a plus, including the ability to use its data search, data lineage, data quality, and data stewardship features to enable data governance and discovery across various data sources and platforms.
- Understanding of data modeling, data mining, and data analysis techniques and methods.
- Familiarity with other industry-standard data tools such as Kafka, Hive, Redis, and MongoDB.
- Excellent communication and collaboration skills, and ability to mentor and coach other data team members.
Due to the high volume of candidates, please be advised that only candidates selected to interview will be contacted by The Carlyle Group.