Senior Systems Operations Engineer
Wells Fargo
- Location
- Bengaluru South, Karnataka, India
- Job type
- Full-time
Required skills
- Tableau
- Python
- Agile
- Ansible
- Azure
- business strategy
- capacity planning
- cloud computing
- communication skills
- data visualization
- database
- Docker
- Elasticsearch
- GCP
- GPU
- Hadoop
- Kubernetes
- Linux
- NLP
- PaaS
- production support
- Root Cause Analysis
- Splunk
- user stories
- BI tools
- Teradata
About the role
Website: wellsfargo.com
Job details:
Wells Fargo is seeking a Senior Systems Operations Engineer.
In This Role, You Will
- Lead or participate in managing all installed systems and infrastructure within the Systems Operations functional area
- Contribute to increasing system efficiency and reducing human intervention time on related tasks
- Review and analyze moderately complex operational support systems, application software, and system management tools to ensure the highest levels of systems and infrastructure availability
- Work with vendors and other technical personnel for problem resolution
- Lead the team in meeting technical deliverables while leveraging a solid understanding of technical process controls or standards
- Collaborate with vendors and other technical personnel to resolve technical issues and achieve highest levels of systems and infrastructure availability
Required Qualifications
- 4+ years of Systems Engineering or Technology Architecture experience, or equivalent, demonstrated through one or a combination of the following: work experience, training, military experience, or education.
Desired Qualifications
- Experience as a Site Reliability Engineer
- Knowledge of or experience in developing automated solutions using Python
- Hands-on knowledge of LLMs, including leveraging LLMs and supporting LLM-based solutions
- Knowledge of or experience with Puppet or Ansible
- Big data experience needed (BigQuery, Hadoop)
- Linux O/S capabilities
- Experience in the AI/ML area (MLOps)
- PySpark experience
- Experience with Tableau, MicroStrategy, or similar BI tools
- Strong experience with monitoring systems such as Splunk and AppDynamics
- Working knowledge of AutoML technologies such as H2O Driverless AI, DataRobot, and Vertex AI, as well as Elastic and vector databases
- Good understanding of and hands-on experience with GCP
- Excellent verbal, written, and interpersonal communication skills. Ability to articulate technical solutions to both technical and business audiences
- Recent and demonstrated ability to influence management on technical or business solutions
- Working knowledge of designing and building grid computing with CPUs and GPUs to support AI/ML and NLP
- Working knowledge of high-performance storage technologies along with Object Storage
- Knowledge and understanding of network infrastructure to support high throughput and low latency grid computing.
- Willing to work in shifts
- Experience with LLMs and Generative AI (dev/ops)
- Experience with Elasticsearch and vector databases would be an added benefit
- Experience with data processing technology (Ab Initio, Informatica, IBM DataStage)
- Experience with large data technology (Hadoop, Teradata, Elasticsearch, etc.)
- Understanding of Agile practices and ability to work with Agile teams to define and track user stories
- Experience implementing complex F5 or other load balancer technologies
- Working knowledge of building high-resiliency grid/cloud computing infrastructure supporting AI/ML and NLP workloads
- Knowledge and understanding of cloud computing, PaaS design principles, microservices, and containers
- Working knowledge/experience with Azure and/or GCP
- Working knowledge/experience with on-premises and public cloud technologies, such as Cloud Foundry, Kubernetes, Docker
- Experience in facilitating analysis of current systems and problem identification and resolution
- Ability to facilitate technically complex discussions and working sessions in person or via teleconference
Job Expectations
- Participate in development of Generative AI Platform Capabilities
- Responsible for AI model delivery to on-prem infrastructure and cloud platforms (GCP, Azure ML)
- Participate in day-to-day scrum calls for platform capability build
- Research industry best practices, evaluate new technologies, develop standards and engineering best practices and recommend innovative solutions that support automation and improve platform resiliency and fault tolerance of critical applications
- Execute on roadmaps that align with technology and business strategy.
- Perform hardware and capacity planning, analysis and forecasts for your portfolio of applications with focus on highest availability, scalability, performance, and timely delivery
- Act as an expert resource for other technical teams within DTI
- Deliver day-to-day Application/Platform support services for Digital, AI/ML Platforms
- Responsible for support functions and driving the execution of multiple Application/Platform support services including incident triage, root cause analysis, change evaluation-execution-validation, deployment management, and risk & vulnerability management.
- Provide on-call production support for mission-critical applications and resolve issues within RTO.
- Ensure effective production systems monitoring, alarming and notification response/maintenance.
- Leverage diagnostic tools to maintain, troubleshoot and restore service or data to systems
- Structure operational data and develop creative data visualization solutions (build dashboards)
- Automate production support routines using AI
- Maintain and update support documentation (e.g. game plans, run books, procedures, and processes).
- Communicate, coordinate, and collaborate with multiple support teams and stakeholders.
Reference Number
R-520793