- Experience
- 5+ years
- Salary
- —
- Openings
- 1
- Posted
- 4 hours ago
- Work mode
- Work from home
- Education
- Bachelor's degree in Computer Science or related field
- Eligibility
- Experienced professionals with a background in observability, SRE, or platform engineering. Candidates should be comfortable working with distributed systems, production monitoring, and cross-functional teams. Fluent English is required. A bachelor’s degree in Computer Science or a related field is…
- Resume
- Required to apply
Job Description
About the Company
Rimini Street, Inc. is a global provider of enterprise software support, managed services, and AI-driven ERP solutions. The company is recognized for third-party support across major enterprise software ecosystems and works with organizations ranging from large global enterprises to public sector and government clients.
Rimini Street India GCC began operations in Hyderabad in 2013 and expanded into Bengaluru, building capabilities across client onboarding, IT shared services, service development, product engineering, managed services, professional services, and security services. The India team now includes more than 800 full-time professionals and plays an important role in the company’s broader global delivery organization.
Role Overview
The Observability Specialist will design and strengthen the monitoring, tracing, and logging stack for Rimini Street’s Agentic AI ERP platform. This role is focused on giving engineers, operators, and customers clear insight into how the platform behaves so that issues can be identified quickly, performance can be improved, and production reliability can be maintained.
Reporting to the Sr Director Engineering – India, this position is central to production operations and customer confidence. The role will establish an OpenTelemetry-based observability layer, develop dashboards and alerting, and support reliable AI agent execution in enterprise ERP environments.
Responsibilities
- Create and evolve the observability architecture with OpenTelemetry as the core standard.
- Set up and maintain distributed tracing using Jaeger or equivalent tooling.
- Build metric collection and retention systems with Prometheus, Thanos, or Cortex.
- Implement centralized logging with structured formats and efficient search/query capabilities.
- Ensure observability tooling scales with platform usage and customer growth.
- Establish instrumentation guidelines for Java (Quarkus), Python, and Angular applications.
- Add automatic and manual instrumentation for MCP servers and agent-based workflows.
- Maintain trace context propagation across services and ERP calls.
- Develop custom metrics for AI agent behavior, LLM performance, and RAG retrieval quality.
- Connect observability data to CI/CD pipelines for deployment visibility.
- Create Grafana dashboards covering health, performance, and business outcomes.
- Develop external/customer dashboards for agent activity and processing progress.
- Design SLI/SLO views and error-budget tracking dashboards.
- Support trace visualization for troubleshooting complex agent flows.
- Prepare documentation and training content for dashboard users.
- Design alert rules that improve signal quality while minimizing noise.
- Implement tiered alerting with clear escalation routes.
- Build runbooks that connect alerts to diagnosis and remediation steps.
- Assist incident response activities with observability expertise and root-cause analysis.
Requirements
- At least 5 years of experience in observability, SRE, or platform engineering.
- Hands-on experience applying observability to distributed systems or microservices.
- Proven experience creating dashboards and alerting for production environments.
- Strong command of OpenTelemetry (or a comparable instrumentation framework) across traces, metrics, and logs.
- Experience with Prometheus, Grafana, and alerting tools.
- Familiarity with distributed tracing tools such as Jaeger or Zipkin.
- Experience with log aggregation tools such as ELK or Loki.
- Programming ability in Python, Java, or Go for tooling and instrumentation.
- Preferred: exposure to Kubernetes observability and service mesh telemetry.
- Preferred: background in AI/ML observability, including LLM monitoring.
- Preferred: familiarity with Quarkus and Python instrumentation.
- Preferred: experience with SLI/SLO practices and error budgets.
- Preferred: knowledge of PromQL, LogQL, and TraceQL.
- Strong analytical and troubleshooting ability.
- Excellent written and spoken English communication skills.
- Ability to collaborate across distributed, multi-timezone teams.
- Data-oriented mindset focused on actionable insights.
- Team-oriented approach to working with development and operations stakeholders.
- Bachelor’s degree in Computer Science or a related discipline preferred.
- Certifications in Grafana, Prometheus, or observability platforms preferred.
- CKA or cloud platform certifications preferred.
Perks and Benefits
- Compensation, bonuses, and benefits aligned to high-performing talent.
- Opportunity to work on meaningful, high-impact production systems.
- Exposure to an international, collaborative environment.
- Chance to contribute to a company with global scale and strong industry recognition.
- Diverse and inclusive workplace with equal employment opportunity commitment.
Additional Information
This is a full-time remote role based in Hyderabad, India. The company values innovation, collaboration, strong client focus, and community impact. Rimini Street operates with a global footprint and emphasizes professional excellence, continuous learning, and measurable business outcomes. Unsolicited resumes from staffing or recruiting firms are not accepted unless specifically requested by Human Resources.
Company Values
Rimini Street describes its culture through four core values: Company, Colleagues, Clients, and Community. The organization highlights bold innovation, respectful teamwork, exceptional client service, and social responsibility through the Rimini Street Foundation.