AppExert
Website: appexert.com
Job details:
About AppExert
At AppExert, we enable top remote developers to work with interesting tech SMBs and high-growth startups/scale-ups across the globe, with the comfort of working from anywhere.
We offer 100% remote full-time employment with options to work from one of our remote pods in Montreal, Toronto, Chennai, and Bangalore. If you're seeking to work on the most innovative projects, AppExert is the ideal platform. AppExert offers global work opportunities that are accessible regardless of your geographical location.
With AppExert, we've built the fastest-growing and happiest community of remote developers, enabling you to follow your passions while staying close to what truly matters to you. We've removed the obstacles that constrained your potential, giving you the power to find work that matches your principles and aspirations. You no longer have to choose between a gratifying career and a meaningful connection to your roots.
Responsibilities
- System Monitoring and Incident Response:
  - Monitor the performance and availability of our software systems, utilizing monitoring tools and implementing effective alerting mechanisms.
  - Respond to incidents and troubleshoot issues promptly to minimize downtime and ensure service reliability.
  - Collaborate with development teams to analyze and resolve recurring issues through root cause analysis.
- Configuration Management:
  - Collaborate with the DevOps team to automate infrastructure provisioning, configuration, and deployment processes.
  - Implement and maintain configuration management tools to ensure consistency across multiple environments.
  - Continuously improve configuration management practices to increase efficiency and reliability.
- Performance Optimization and Scalability:
  - Identify performance bottlenecks and optimize system components to ensure optimal performance and scalability.
  - Conduct load testing and capacity planning to anticipate and accommodate future growth.
  - Collaborate with development teams to improve application performance and scalability through code optimizations and architectural enhancements.
- Incident Prevention and Reliability Engineering:
  - Conduct proactive system analysis and monitoring to identify potential issues before they impact production systems.
  - Implement reliability engineering practices, such as fault tolerance, error budgeting, and disaster recovery planning.
  - Continuously analyze system metrics and logs to detect anomalies and identify areas for improvement.
- Collaboration and Documentation:
  - Collaborate closely with cross-functional teams, including developers, system administrators, and quality assurance, to improve system reliability.
  - Maintain clear and up-to-date documentation of system configurations, processes, and troubleshooting guides.
  - Participate in knowledge sharing activities and contribute to the development of best practices and standards.
- Track incidents: scan the production servers, check the dashboards and logs, and report errors or anything that needs fixing back to the team.
- Example: "Is this a known issue or a new issue? If new, investigate further."
- Keep an overall eye on the production system and escalate when needed.
- When customer orders are failing or stuck, escalate and track.
- Log incidents with different priorities based on their impact.
Requirements
- Proven work experience as a Technical Support Engineer, Production Support Engineer, Systems Engineer, or a similar role.
- Hands-on experience with Linux/Unix systems and command-line tools.
- Hands-on experience with ticketing tools such as ServiceNow, Jira, and Zendesk, and monitoring tools such as Splunk, New Relic, Grafana, LogicMonitor, and Datadog.
- Hands-on experience with Windows/Linux environments.
- Proficiency in at least one scripting language (e.g., Bash, PowerShell).
- Experience with monitoring and log analysis tools.
- Basic experience in DevOps practices, including CI/CD, infrastructure automation, and configuration management.
- Excellent problem-solving and communication skills.
- Ideally, willing to work Saturdays and Sundays (5 days a week, with 2 other days off each week).
- Most importantly, a critical thinker: someone who can tell whether something is a known issue, needs to be escalated ASAP, or can wait until tomorrow. This judgment comes with time, but someone who thinks critically is key!
Nice to Have:
- Experience with networking and AWS.
- Additional certification in Linux, Cisco, or similar technologies.
Why AppExert?
At AppExert, our main objective is to cultivate a supportive community for remote developers, ensuring a strong sense of belonging. We offer a variety of benefits to ensure you can always work hard and have fun:
- Connect and collaborate with like-minded professionals from around the world, expanding your network and knowledge.
- Flexibility and freedom, allowing you to choose your own work location. Whether you prefer working from the comfort of your home, a bustling coffee shop, or a tranquil beach, the choice of location is entirely yours.
- Secure and reliable remote work environment, ensuring that our employees can enjoy the benefits of flexibility while having a solid foundation to thrive professionally.
- A supportive environment where you can sharpen your skills, receive valuable feedback and stay up-to-date with the latest industry trends.
- Robust infrastructure, effective communication channels, and remote collaboration tools.
- A healthy work-life balance by ensuring that our developers have a standard 40-hour workweek, allowing them to excel in their roles while maintaining their well-being.
- Paid time off so you can really recharge and enjoy life.
- Health, wellness, and lifestyle benefits to balance your heart, mind, and body.
- Virtual team-building activities and social events: we foster a sense of connection among our team members, recognising the significance of staying united even in remote work settings.
- An amazing culture to top it all off!
Click Apply to learn more.