Lead Infrastructure Engineer - Kafka (KEES) Messaging Operations
Wells Fargo
- Location: Bengaluru South, Karnataka, India
- Job type: Full-time
Required skills
- Python
- AWS
- Ansible
- Apache
- Apache Kafka
- Azure
- banking
- Bash
- BGP
- capacity planning
- compliance
- Datadog
- JSON
- Kafka
- middleware
- proxy
- RabbitMQ
- Splunk
About the role
Wells Fargo
Website: wellsfargo.com
Job details:
About This Role
Wells Fargo is seeking a Lead Infrastructure Engineer for Middleware Messaging Kafka operations.
In This Role, You Will
- Lead or participate in high-level technical concepts spanning technology and business to drive strategic initiatives.
- Develop specifications for complex infrastructure systems, design, and test solutions ensuring scalability and reliability.
- Design and implement complex system upgrades to maintain performance and compliance.
- Design, implement, and automate Business Continuity Planning (BCP) and Disaster Recovery (DR) exercises, and execute routine system maintenance procedures to ensure operational resilience.
- Lead and execute data center migration projects focused on middleware platforms, including planning, configuration, and integration of application servers and related components (prior experience preferred).
- Direct daily Risk and Control flow of Compliance, focusing on policies, procedures, and work standards to ensure success.
- Outline and implement product support, maintenance, and enhancements to ensure optimal system performance.
- Develop a long-range plan designed to resolve problems and prevent recurrence.
- Design and implement observability dashboards that provide a unified, single-pane-of-glass view for executive-level oversight.
- Employ monitoring tools such as Grafana, AppDynamics, and Splunk for business applications and operations teams.
- Contribute to the analysis of business, application, and technical infrastructure requirements to support solution design.
- Contribute to the testing of business applications and technical infrastructure requirements.
- Generate documentation, operational policies, and procedures to support governance and compliance.
- Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals.
Required Qualifications
- 5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
Desired Qualifications
- 10+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
- Minimum 8 years of hands-on experience managing large-scale middleware environments, including Kafka and Confluent Platform.
- Strong hands-on experience with Apache Kafka in enterprise production environments.
- Deep expertise in setting up Confluent Platform, including KRaft, brokers, Schema Registry, REST Proxy, Kafka Connect, ksqlDB, and Cluster Linking.
- Mandatory experience deploying and operating Kafka on OpenShift (OCP) with Confluent Operator.
- Expertise in provisioning topics, ACLs, idempotent producers, and consumers.
- Proficiency in tuning GC, heap, partitions, ISR, retention, and quotas.
- Expertise in troubleshooting consumer lag, slowness, message loss and duplication, throughput, broker crashes, offset issues, and disk saturation.
- Expertise in setting up Kafka security with TLS.
- Experience in KRaft mode setup and migration.
- Experience setting up and configuring Prometheus and Grafana dashboards for platform health.
- Knowledge of large-scale performance tuning and benchmarking with tools like KafkaPerf or kafkacat/kcat.
- Exposure to cloud-based Kafka (Confluent Cloud, AWS MSK, Azure Event Hubs for Kafka).
- Experience with Schema evolution, compatibility modes, and Avro/JSON/Protobuf.
- Automation/scripting (Python, Bash, Ansible).
- Experience with other messaging platforms (Solace, IBM MQ, RabbitMQ).
- Networking concepts: BGP, routing, multicast, certificates at enterprise scale.
- Experience managing capacity planning and throughput optimization.
- Familiarity with enterprise monitoring tools (Prometheus, Grafana, Datadog, Splunk).
- Confluent certifications (CCDAK, CCP) are a plus.
- Experience supporting an enterprise-level environment.
- Thorough understanding of compliance standards and risk management practices for systems within the BFSI (Banking, Financial Services, and Insurance) sector.
- Provide implementation support for key risk initiatives.
- ITIL v4 Certification.
Reference Number
R-525872