Lead Cloud Data Engineer (Massachusetts)
Rapid7 (Nasdaq: RPD) is advancing security with visibility, analytics, and automation delivered through our Insight cloud. Our solutions simplify the complex, allowing security teams to work more effectively with IT and development to reduce vulnerabilities, monitor for malicious behavior, investigate and shut down attacks, and automate routine tasks. Over 9,300 customers rely on Rapid7 technology, services, and research to improve security outcomes and securely advance their organization. For more information, visit our website, check out our blog, or follow us on LinkedIn.
The Opportunity
Rapid7 seeks a Lead Cloud Data Engineer to build and maintain data infrastructure within the Data Engineering team's data platform. You will be responsible for deploying data pipelines and machine learning models in the cloud, implementing DevOps practices, and developing data models within Snowflake. You will work closely with AWS, Snowflake, and DevOps tools such as GitHub Actions, Terraform, AWS CodePipeline & CodeDeploy, dbt, and more. In this role you will bridge the gap between data engineering and DevOps by building the monitoring systems, cloud applications, and CI/CD pipelines that support the data engineering team's efforts.
The ideal candidate has hands-on experience performing DevOps work in a cloud environment, and has worked closely with databases and data pipelines. It's critical that you are able to translate business objectives into data required to support key analyses. You will collaborate with a creative, analytical and data-driven team to bring a single source of truth and self-service analytics to the entire company.
In this role you will:
- Transform Rapid7's data platform by applying DevOps best practices of automation, monitoring, CI/CD and configuration management
- Build and maintain the applications that ingest, analyze and store Rapid7's enterprise data
- Mentor and provide guidance to peer data engineers based on your experiences and technical expertise
- Productionize data and machine learning pipelines with containerization
- Develop effective monitoring and alerting systems to provide real-time visibility into the health of data infrastructure, cloud applications and data/machine learning pipelines
- Build an environment that enables data scientists to easily develop and productionize Python, R and Spark code on top of a Snowflake data warehouse
- Automate existing code and processes using scripting, CI/CD, infrastructure-as-code and configuration management tools
- Perform data engineering projects within Snowflake such as developing data pipelines, data models and metadata management solutions
- Collaborate with stakeholders in product, business and IT to deliver data products
- Work closely with leadership to drive adoption of the latest DevOps and DataOps trends and technologies
- Partner with the IT, Infrastructure, and engineering teams on integration efforts between systems that impact data and analytics
In return you will bring:
- 4+ years of experience with a major cloud provider (preferably AWS); experience building and maintaining VPCs and deploying code using Terraform (IaC) is a must
- 6+ years in a hands-on data engineering role building data pipelines, infrastructure, integrations, and/or data architecture
- 6+ years of experience in at least one programming language such as Python, Java, or Scala, as well as advanced hands-on SQL work
- 3+ years of experience working with a modern cloud data warehouse (preferably Snowflake) and orchestration tools (preferably Airflow)
- Hands-on experience working with the “modern data stack” (dbt, Fivetran, Stitch, etc.)
- Experience as a leader within a data engineering team and ability to mentor teammates
- Excellent written and verbal communication skills
- Expertise in data architecture, data warehousing, and metadata management
- Extensive experience with CI/CD tools such as GitHub Actions or AWS CodePipeline
- BS or MS in Computer Science, Analytics, Statistics, Informatics, Information Systems, or another quantitative field; equivalent experience and certifications will also be considered
- Experience with data governance tools, data profiling, and process improvement is a plus
R2979