Introduction

Hello, I'm Espira. Building Tomorrow's Infrastructure Today | Specializing in DevOps, Platform, and Site Reliability Solutions

I'm passionate about creating and optimizing the technological foundations that businesses rely on, turning complex infrastructure challenges into elegant, efficient solutions that scale seamlessly and perform reliably.

8+ Years of Experience

30+ projects completed on different technologies

About

Every great system scales with purpose,
let's build your future

Drawing on eight years of experience in large-scale infrastructure and reliability engineering, I've mastered the art of building and maintaining infrastructure at scale. From orchestrating distributed cloud architectures to deploying AI infrastructure and GPU clusters, I combine technical precision with innovative solutions to drive operational excellence.

Resume

Education & Experience

SRE Engineer, Data Streaming Team

Sport Server

  • Led SRE efforts within the Data Streaming team, managing data systems for optimal reliability and performance.
  • Scaled infrastructure in BigQuery, reducing processing times and improving query performance.
  • Developed cost management strategies, reducing operational costs by 30%.
  • Built a comprehensive observability framework with GCP tools.

Research Software Engineer, Infrastructure

Eco Health Alliance

  • Architected serverless infrastructure on AWS, enhancing scalability and reducing costs by 30%.
  • Implemented robust systems for data collection and warehousing tailored to the needs of field researchers.
  • Managed high-performance computing environments with Docker and orchestration, improving system reliability during peak periods and under large workloads.
  • Led the application of Large Language Models (LLMs) on GPU setups, boosting machine learning capabilities.
  • Automated infrastructure deployment processes, enhancing operational transparency and reducing workload by 40%.

SRE Engineer

Sport Server

  • Managed scale challenges unique to sports servers on GCP, maintaining 99.9% uptime and 99.95% data integrity.
  • Enhanced service lifecycle from design to refinement, ensuring robust and scalable architecture.
  • Integrated Google Cloud tools to enhance data observability and real-time quality checks.

DevOps Tech Lead

Sarami · Contract · Nairobi, Kenya

  • Developed software release management strategies and migrated applications from Heroku to AWS.
  • Implemented autoscaling, load balancing, and managed DNS with AWS Route 53.
  • Increased automation using GitHub Actions and Jenkins, and managed deployments to Kubernetes via GitOps.
  • Built secure systems on AWS, utilizing services like Amplify, EC2, S3, ELB, and CloudFormation.
  • Used AWS SDK in Python to interact with AWS services, reducing configuration errors.
  • Improved application performance with microservices architecture.
  • Enhanced system observability and monitored services using APM tools.

System Engineer

ICIPE - International Centre of Insect Physiology and Ecology

  • Upgraded AWS-based computing platforms for enhanced scalability and efficiency in scientific computations.
  • Streamlined big data analysis and deployment processes using Slurm and Terraform.
  • Developed and maintained genomic data analysis pipelines, improving data reusability and research efficiency.
  • Collaborated with technology vendors to minimize system failures and enhance infrastructure reliability.
  • Implemented performance monitoring solutions with Grafana, crucial for maintaining system health and uptime.

Data Systems Consultant

ICIPE - International Centre of Insect Physiology and Ecology

  • Optimized data processes for scalability, enhancing the efficiency of data operations.
  • Developed analytical tools to generate actionable insights, supporting strategic decision-making.
  • Integrated and built data analytics platforms tailored for entomological research.
  • Designed machine learning models for real-time monitoring of insect populations, enhancing research accuracy.

DevOps Engineer

ICIPE - International Centre of Insect Physiology and Ecology

  • Architected and deployed comprehensive CI/CD pipelines using Jenkins and CircleCI, enhancing real-time monitoring and cloud integration.
  • Developed and managed Docker and Singularity containers, orchestrating deployments across Kubernetes clusters to ensure scalable infrastructure.
  • Administered high-performance computing (HPC) environments, supporting complex data processing tasks.
  • Optimized batch queuing systems in a massively parallel production setting, enhancing processing efficiency and throughput.
  • Conducted system utilization analysis to ensure over 90% uptime and robust system health.
  • Led infrastructure redesign initiatives, automating core processes and enhancing scalability, thereby reducing operational overhead.
  • Constructed sophisticated SQL-based infrastructure for data extraction and loading, developing analytical tools to transform data insights into actionable intelligence.
  • Collaborated closely with executive, product, data, and design teams to support infrastructure needs and address technical challenges.
  • Implemented and continually refined data management procedures, enhancing data system functionality to support ongoing research and operational needs.

Education & Certifications

Education & Credentials

2013 - 2018

BSc in Computer Technology

Jomo Kenyatta University of Agriculture and Technology

Professional Certifications

View my verified achievements on Credly Badges

This link leads to the Credly platform where I have listed all my professional certifications and badges that showcase my specialized skills.

Services

My Specializations

Cloud Computing

Proficient in deploying and managing scalable cloud solutions across AWS and GCP, my expertise encompasses a wide range of cloud technologies. I focus on aligning with the pillars of the Well-Architected Framework to ensure operational excellence, security, reliability, performance efficiency, and cost optimization in diverse cloud environments.

Multiple Projects

Kubernetes & Containerization

Proficient in advanced container orchestration techniques with Kubernetes, enhancing cloud-native applications' scalability and resilience.

Multiple Projects

Observability & Monitoring

Specialized in designing high-performance data observability frameworks, I focus on leveraging cutting-edge technologies for comprehensive system monitoring, ensuring enhanced visibility and operational insights across various cloud environments.

Multiple Projects

Big Data & Data Engineering

Specialized in big data technologies and data engineering, optimizing data processes for scalability and efficiency across distributed systems.

Multiple Projects

CI/CD Pipelines

Skilled in enhancing software development and delivery through the automation of deployments with CI/CD pipelines, ensuring streamlined and efficient processes across development environments. I incorporate rigorous security checks to maintain high standards of software integrity and security.

Multiple Projects

Large Scale Distributed Computing

Experienced in managing complex, large-scale distributed computing environments to support intensive data processing and analysis tasks.

Multiple Projects

Machine Learning & AI

Skilled and experienced in orchestrating the deployment and integration of Large Language Models (LLMs) and machine learning solutions within GPU-accelerated environments, enhancing performance and driving innovative data processing capabilities.

Multiple Projects

Infrastructure as Code

Skilled and experienced in applying Infrastructure as Code (IaC) methodologies across diverse platforms, from cloud environments to bare metal clusters. My approach automates the provisioning and management of resources, ensuring consistency, scalability, and compliance across all operational environments.

Multiple Projects

My Skills

My Technical Proficiencies

GPU Computing & AI Infrastructure

  • GPU Resource Management, CUDA, Performance Optimization
  • Kubernetes GPU Operator, Resource Scheduling
  • Machine Learning Infrastructure, Model Deployment

Programming Languages

  • Python, Java, Golang, JavaScript, R, Rust

Containerization & Orchestration

  • Docker, Singularity, Kubernetes, Docker Swarm

Infrastructure as Code

  • Terraform, Ansible, Helm

Cloud & Infrastructure

  • AWS, GCP, Kubernetes, Terraform
  • Infrastructure as Code, GitOps
  • High Availability Architecture

CI/CD Tools

  • Jenkins, CircleCI, GitHub Actions

Monitoring and Logging

  • Grafana, Prometheus, ELK Stack, OpenTelemetry, Honeycomb

Distributed Systems

  • Distributed Computing, Scalable Architecture
  • Message Queues, Event-Driven Systems
  • Consensus Protocols, Eventual Consistency

Database Management

  • PostgreSQL, MongoDB, MySQL, Cassandra

Server Management & Version Control

  • Linux, Windows, GitHub, GitLab

Project Management Tools

  • Asana, Jira, Confluence, Opsgenie

Portfolio

Featured Projects

Scalable Distributed Key-Value Store

Implemented a highly available distributed key-value store with eventual consistency, featuring consistent hashing, replication, vector clocks, and gossip protocols. The system ensures data durability and handles node failures gracefully while maintaining uptime.
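
As a rough illustration of two of the techniques named above (not the project's actual code), the Python sketch below shows a consistent-hash ring that picks replica nodes for a key, and a vector-clock comparison that detects concurrent writes. The names `HashRing`, `replicas`, and `compare_clocks`, the virtual-node count, and the node names are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (SHA-256 used for spread, not security)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring at `vnodes` positions.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def replicas(self, key, count=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._points, _hash(key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == count:
                    break
        return found


def compare_clocks(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for two vector clocks."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", count=3)  # three distinct nodes for this key
```

In a design like this, adding or removing a node only moves the keys adjacent to that node's ring positions, and a `"concurrent"` result marks the case where an eventually consistent store would typically keep both versions for later reconciliation.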

Advanced Data Observability

Developed comprehensive observability solutions using Google Cloud Platform (GCP) tools such as BigQuery, Cloud Logging, and Stackdriver. These solutions enabled real-time data analytics, log management, and monitoring to ensure system reliability and performance. Implemented automated alerting and dashboards using Grafana to provide actionable insights and improve operational efficiency.

Tools Used: GCP, BigQuery, Cloud Logging, Stackdriver, Grafana, Python, Java.

Healthcare Service Clinical Workflow System

Developed a comprehensive healthcare service management system that streamlines the entire patient care workflow from referral to payment completion. This system ensures regulatory compliance while optimizing service delivery efficiency.

  • Automated critical processes including referral management, insurance verification, and authorization tracking
  • Implemented real-time compliance monitoring with automated checkpoints
  • Built intelligent alert systems for deadline management and documentation requirements
  • Created detailed audit trails for regulatory compliance
  • Integrated billing and claims management with automated verification

Technologies: React, Node.js, PostgreSQL, Redis, Docker, AWS Services

Cross-platform Store Management System

Engineered a comprehensive store management application that streamlines daily operations and financial tracking. This cross-platform solution provides robust business process automation with a focus on user-friendly operation.

  • Built with Electron and React for seamless cross-platform compatibility
  • Implemented sophisticated state management using Redux for reliable data handling
  • Designed a normalized SQLite database schema for efficient data organization
  • Created automated systems for tax calculations and financial reconciliation
  • Developed robust backup systems and data export capabilities (CSV/PDF)

Technologies: Electron, React, Redux, SQLite, Node.js, Material-UI

Microservices Monitoring with OpenTelemetry

Implemented OpenTelemetry for distributed tracing and monitoring across a microservices architecture. Utilized application performance management (APM) tools to collect and analyze telemetry data, providing deep insights into system performance and enhancing reliability. Integrated with existing monitoring solutions like Prometheus and Grafana to visualize and manage service health.

Tools Used: OpenTelemetry, Prometheus, Grafana, Python, Docker, Kubernetes.

NLP and Large Language Model Deployment

Focused on leveraging large language models (LLMs) and natural language processing (NLP) techniques to extract and structure data from unstructured text sources. Conducted intensive model training on localized GPU setups to enhance processing efficiency and accuracy. Developed pipelines for fine-tuning models to specific datasets and tasks, improving their applicability in real-world scenarios.

Tools Used: Python, TensorFlow, PyTorch, GPU Computing, Docker, Kubernetes.

Enterprise Cloud Migration

Managed a major cloud migration project to Amazon Web Services (AWS), utilizing Infrastructure as Code (IaC) tools such as Terraform to ensure a smooth and efficient transition. Implemented automated deployment pipelines using CI/CD tools like Jenkins and GitHub Actions. Enhanced operational efficiency and scalability by leveraging AWS services such as EC2, S3, and RDS.

Tools Used: AWS, Terraform, Jenkins, GitHub Actions, Python, Docker.

High-Performance Computing Setup

Designed and optimized a Kubernetes-based High-Performance Computing (HPC) environment for complex data analysis tasks. Managed container orchestration using Docker and Kubernetes to ensure efficient resource utilization and scalability. Integrated with parallel computing frameworks and batch scheduling systems for optimized performance.

Tools Used: Kubernetes, Docker, Slurm, Terraform, Python, R, Ansible.

Data Security Enhancement Initiative

Led the development and implementation of advanced security measures to protect sensitive data. Utilized AWS security services, encryption techniques, and continuous monitoring to ensure data integrity and compliance with industry standards. Implemented automated security checks and incident response protocols to enhance overall security posture.

Tools Used: AWS Security Services, Python, Terraform, Jenkins, ELK Stack.

Prompt-Engineering for Open-Source LLMs

Developed prompt-engineering techniques for enhancing the performance of open-source large language models (LLMs). Implemented and fine-tuned prompts to improve the accuracy and applicability of LLMs in various use cases, such as text summarization, sentiment analysis, and information extraction.

Tools Used: Python, TensorFlow, PyTorch, OpenAI GPT, Docker.

Dockerized Jupyter Notebooks Setup

Created a Dockerized environment for running Jupyter notebooks, enabling easy sharing and consistent setups across teams. Utilized Docker Compose for managing multiple containerized environments, simplifying the deployment and maintenance of Jupyter notebooks for data science and research projects.

Tools Used: Docker, Docker Compose, Jupyter, Docker Swarm, EKS, Python.

AWS ParallelCluster Setup

Implemented AWS ParallelCluster for managing High-Performance Computing (HPC) clusters, supporting various instance types and job schedulers like AWS Batch and Slurm. Developed and maintained HPC infrastructure to enable scalable and efficient computational workflows for research and data-intensive applications.

Tools Used: AWS ParallelCluster, AWS Batch, Slurm, Python, Terraform.

Serverless Architecture Migration

Migrated applications to a serverless architecture built on AWS Lambda, CloudFront, Cognito, and API Gateway, with monitoring through AWS CloudWatch and X-Ray to track scalability and performance. This approach aimed to achieve cost efficiency, high availability, and enhanced security for cloud-based applications.

Tools Used: AWS Lambda, CloudFront, Cognito, API Gateway, CloudWatch, X-Ray, Terraform, Python, MongoDB.

Let's Connect

Ready to Build Scalable Solutions?

From Infrastructure to Innovation: Let's Transform Your Vision

Specialized In:

  • Cloud Architecture & Infrastructure
  • AI/ML Systems & GPU Computing
  • Distributed Systems & Scalability
  • DevOps & Site Reliability Engineering

Whether you're scaling your infrastructure, optimizing performance, or implementing cutting-edge AI solutions, I bring the expertise to make it happen. Let's discuss how we can achieve your technical goals while ensuring reliability, scalability, and innovation.

Let's Collaborate On:

  • Building Resilient Cloud Infrastructure
  • Scaling Distributed Systems
  • Optimizing AI/ML Platforms
  • Enhancing System Performance
  • Implementing Site Reliability Best Practices
  • Designing High-Performance Architecture

Ready to get started? Let's discuss your project needs.

Schedule a Consultation