Site Reliability Engineer Job at Omniscius Consulting, United States

  • Omniscius Consulting
  • United States

Job Description

Our client is seeking a Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, performance, and scalability of its software, websites, and applications. This role requires a combination of software engineering and systems administration skills to monitor, control, and automate systems. The ideal candidate will have a deep understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance. This position plays a critical role in maintaining the overall health and efficiency of the client's platform.

Key Responsibilities:

System Monitoring and Maintenance:
- Monitor the performance and reliability of Kubernetes clusters, software, websites, and applications.
- Automate routine maintenance tasks to ensure system stability and performance.

Incident Response and Troubleshooting:
- Respond to and resolve incidents in a timely manner, minimizing downtime and impact on users.
- Conduct root cause analysis to identify and address underlying issues.
- Develop and implement strategies to prevent future incidents and improve system resilience.

Automation and Infrastructure Management:
- Design, build, and maintain automated systems and processes to improve efficiency and reduce manual intervention.
- Manage cloud infrastructure, including provisioning, scaling, and optimizing resources.
- Collaborate with development teams to ensure seamless deployment and integration of new features and updates.

Performance Optimization:
- Analyze system performance and identify areas for improvement.
- Implement performance tuning and optimization techniques to enhance system efficiency.
- Collaborate with cross-functional teams to ensure optimal performance of all components.

Security and Compliance:
- Ensure compliance with security best practices and industry standards.
- Implement and maintain security measures to protect systems and data.
- Conduct regular security audits and vulnerability assessments.

Documentation and Reporting:
- Maintain accurate and up-to-date documentation of systems, processes, and procedures.
- Generate and analyze reports on system performance, incidents, and other key metrics.
- Provide regular updates to management and stakeholders on system health and performance.

Continuous Improvement:
- Identify opportunities for improving system reliability, performance, and scalability.
- Stay up-to-date with industry trends and best practices in site reliability engineering.
- Participate in training and development opportunities to enhance skills and knowledge.

Qualifications:
- Deep expertise in Kubernetes and containers.
- Strong understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance.
- Experience with monitoring and logging tools such as Loki and Grafana.
- Minimum of 3 years of experience in site reliability engineering, Kubernetes administration, or a related role.
- Excellent problem-solving skills and attention to detail.
- Strong communication and interpersonal skills, with the ability to work effectively with cross-functional teams.

Job Tags

Full time
