Our client is looking for an AWS Site Reliability Engineer to join their Platform Engineering group. You will join a diverse team dedicated to providing best-in-class data services to our customers, stakeholders, and partners. This is a contract-to-hire role, so candidates must be Green Card holders or U.S. citizens.
As a Cloud Engineer within the AWS AI/ML platform team, you will have the opportunity to work one-on-one with application and infrastructure developers to build and enhance the AI/ML infrastructure and application patterns that power mission-critical applications, ensuring that they’re engineered for high availability, durability, and resiliency. You will be part of an agile team that combines various backgrounds, experiences, and perspectives to solve complex problems within AWS and beyond.
Responsibilities:
Focus on optimizing existing systems, building infrastructure, and eliminating work through automation.
Influence application and security architecture and design across multi-cloud and hybrid-cloud platforms.
Peer-review infrastructure-as-code (AWS CloudFormation, Python, Terraform, or similar).
Partner with application and infrastructure teams to develop reusable cloud patterns.
Deploy and troubleshoot infrastructure code.
Partner with the Site Reliability Engineering (SRE) team to conduct post-incident reviews and root cause analysis, and build monitoring and automation to prevent future incidents.
Identify opportunities to build self-service capabilities and automate infrastructure and application deployments.
Develop tools and best practices for platform development, developer productivity, automation (MLOps, CI/CD, A/B testing), and production operations.
Design, develop, and deliver critical components, frameworks, services, and products using AWS SageMaker, Lambda, and container technologies in AWS.
Develop processes, model monitoring, and a governance framework for successful ML model operationalization.
Define standards for engineering and operational excellence for running best-in-class ML platforms and continue to improve ML platforms to keep up with the latest innovations.
Assist in gathering and analyzing non-functional requirements and translating them into technical specifications for robust, scalable, supportable solutions that work well within the overall system architecture.
Technical Qualifications:
The cloud is a rapidly changing world, with the major players announcing new features almost on a daily basis. A successful Cloud Engineer doesn’t need to know everything about everything but instead keeps a pulse on new developments and emerging paradigms to identify areas where they can continuously improve their skill sets.
Ability to debug, optimize code, and automate routine tasks.
A systematic problem-solving approach coupled with a strong sense of ownership and drive.
Ability to quickly pick up newly released cloud services and understand where they would be appropriate for business applications.
Experience with infrastructure automation tools such as Puppet, Ansible, CloudFormation, or Terraform.
Working knowledge of pipeline-automation tools such as Jenkins, CodePipeline, Azure DevOps, or other comparable tools.
Experience using Git for source control management.
Ability to proficiently write code in Python, Node.js, Bash (shell), PowerShell, or other similar languages.
Experience using Docker within container orchestration platforms such as AWS ECS, EKS, Google Anthos, or others.
Comfortable in a Linux environment.
Understanding of foundational AWS services such as VPCs, EC2, S3, RDS, Auto Scaling Groups, CloudWatch Logs, etc.
In-depth knowledge of security and IAM within AWS, including the management and operation of Security Groups, KMS Keys, VPC NACLs, and SCPs.
Familiarity with ETL and big data toolchains such as those provided by Hadoop/EMR, Glue, Spark, Impala, or similar.
Understanding of relational database systems and how applications interact with them.
Familiarity with one or more log and event aggregation and monitoring systems such as Splunk, Elasticsearch (ELK), Prometheus, Grafana, or similar.
Qualifications:
5+ years of experience with Amazon Web Services (AWS).
Experience working in an Agile/Scrum-focused organization.
Strong verbal and written communication skills; comfortable explaining technical problems to non-technical audiences.
MS/BS degree in Information Technology, Computer Science, related technical field, or equivalent practical experience.
Preferred Qualifications:
One or more Associate- or Professional-level AWS certifications.
Prior experience on a DevOps, DevSecOps, SRE, or UNIX/Linux sysadmin team.