Torre
Website:
torre.ai
Job details:
I’m helping CHS find a top candidate to join their team full-time for the role of Network Engineer Sonic.
You'll architect and automate next-gen datacenter networks, driving critical infrastructure evolution through programmatic configuration.
Compensation:
Hidden
Location:
Remote: India
Mission of CHS:
"Creating connections to empower agriculture."
What makes you a strong candidate:
- You are proficient in Python, Networking, MongoDB CRUD operations, Bash.
- You have the potential to develop in Linux administration.
- English - Conversational
Responsibilities and more:
Position Overview:
- Looking for a network engineer with experience in datacenter environments and at least light programming experience.
Required Experience:
- SONiC programmatic iterative configuration (gnmi/yang, swss).
- Scripting language (Python, Bash, etc.).
Preferred Experience:
- SONiC base configuration (L2, mclag, lag/portchannel, bgp, bfd, etc.).
- FRR experience (OSPF).
- Light systems programming language (C, C++, Golang, Rust, etc.).
- Linux administration (Bash, systemd units, general system navigation).
- Virtual networking (VXLAN).
Nice to Have Experience:
- OpenGear.
- Stronger systems programming language experience.
- AWS networking (VPC, Direct Connect).
Task Expectations:
1. Programmatic Iterative Configuration of SONiC Switches (yang/gnmi, swss, etc.):
- Has previously used the above to configure, or can readily work out how to implement CRUD operations (or at least CRD) against, constructs such as (but not limited to): physical interfaces and subinterfaces, VXLAN/VNI, VRF, ACL.
- At minimum must provide correctly functioning examples.
- Functionality will ultimately be written in Golang.
- A network engineer is merely expected to identify, document, and demonstrate interface functionality for the SingleStore team to implement.
- Network engineer being able to implement the CRUD/CRD functionality as a library/module in Golang would be a plus but is not expected.
- Actual virtual networking control plane implementation is expected to be responsibility of SingleStore team.
- Network engineer contributing here would be high value, freeing up team to focus on storage implementation.
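As a rough illustration of the CRD shape described above, here is a minimal Python sketch that works on an in-memory dictionary laid out like SONiC's CONFIG_DB JSON (table, then key, then fields). It does not talk to a switch. The helper names (create, read, delete) are hypothetical, and the table names, key formats, and field names are assumptions to verify against the YANG models of the SONiC build in use.

```python
import json
from copy import deepcopy

# Assumption: tables/keys/fields mirror SONiC CONFIG_DB conventions
# (VRF, VXLAN_TUNNEL, VXLAN_TUNNEL_MAP); check them against the target build.

def create(db, table, key, fields):
    """Add an entry; refuse to overwrite so reruns fail loudly."""
    entries = db.setdefault(table, {})
    if key in entries:
        raise KeyError(f"{table}|{key} already exists")
    entries[key] = dict(fields)

def read(db, table, key):
    """Return a copy of the entry, or None if it is absent."""
    entry = db.get(table, {}).get(key)
    return deepcopy(entry) if entry is not None else None

def delete(db, table, key):
    """Remove an entry and drop the table once it is empty."""
    entries = db.get(table, {})
    if key not in entries:
        raise KeyError(f"{table}|{key} does not exist")
    del entries[key]
    if not entries:
        del db[table]

# Illustrative values only: a VRF, a VTEP, and one VNI-to-VLAN mapping.
config = {}
create(config, "VRF", "Vrf_tenant1", {})
create(config, "VXLAN_TUNNEL", "vtep1", {"src_ip": "10.0.0.1"})
create(config, "VXLAN_TUNNEL_MAP", "vtep1|map_1000_Vlan100",
       {"vni": "1000", "vlan": "Vlan100"})
print(json.dumps(config, indent=2, sort_keys=True))
```

A real demonstration would apply the same operations through gNMI Set/Get or swss rather than a local dictionary; the sketch only shows the per-construct create/read/delete contract that could be documented for the Golang implementation.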
2. Base Configuration of SONiC Switches:
Inband Configuration:
- L2, mclag, lag/portchannel, bgp, bfd.
- Need to set up anycast addresses for metadata service IP, SAG, etc.
Bifurcated Spine-Leaf Topology (Inband):
- Each side of aisle has 2x spines to be mclag’d.
- Each spine has 2x connections to each other spine to be lag’d.
- Each spine has 2x connections to each ToR/leaf on same side to be lag’d.
- Each side of aisle has private AS.
- ToR-to-compute-node connections use breakouts; ToR-to-storage-node connections are standard.
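To make the inband link counts concrete, the following Python sketch enumerates the LAGs implied by the bullets above. It rests on several assumptions: two sides of the aisle, "each other spine" read literally as every other spine on either side, and a placeholder count of two ToRs per side (the real count is not stated here). Device names and the function name build_lag_plan are hypothetical.

```python
from itertools import combinations

def build_lag_plan(sides=("a", "b"), tors_per_side=2, links_per_lag=2):
    """Return (device, peer, member_link_count) tuples for every LAG."""
    # 2x spines per side of the aisle (these pairs are the MCLAG peers).
    spines = {s: [f"spine-{s}{i}" for i in (1, 2)] for s in sides}
    all_spines = [sp for s in sides for sp in spines[s]]

    plan = []
    # Assumption: every spine pair gets a 2-member LAG, across sides too.
    for left, right in combinations(all_spines, 2):
        plan.append((left, right, links_per_lag))
    # Each spine gets a 2-member LAG to each ToR/leaf on its own side only.
    for s in sides:
        for sp in spines[s]:
            for t in range(1, tors_per_side + 1):
                plan.append((sp, f"tor-{s}{t}", links_per_lag))
    return plan

plan = build_lag_plan()
for device, peer, members in plan:
    print(f"{device} <-> {peer}: {members} member links")
```

Generating the plan from data like this is one way to start moving from ad-hoc to code-driven base configuration; the same tuples could feed portchannel and MCLAG templates.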
Spine-Leaf Topology (Out of Band):
- Each side of aisle has 1x spine.
- Each spine has connection to each ToR/leaf on both sides.
- Currently each side of aisle has private AS.
- Can be argued should be single AS.
- Currently hardcoded L3.
Additional Notes:
- Spine model in use has insufficient resources for a unified DHCP stack (had to settle on this model due to tariff season); isc dhcpd is usable.
- Server BMCs previously static IP’d and/or infinite lease’d via DHCP by vendor, require crash carting/manual full reset in order to DHCP.
- Switch management ports physically connected but not currently configured to be reachable via OOB network.
- PDU management ports do not DHCP, require on-site troubleshooting to bring into network.
Coordination:
- Coordinate with DevOps on switch integration with monitoring.
- Transition from ad-hoc to code-driven base configuration.
- Coordinate with DevOps on switch provisioning (ZTP or otherwise).
- Coordinate with DevOps on SONiC build pipeline.
3. Palo Alto Firewall Configuration Remediation:
- Transition from ad-hoc to code-driven configuration.
Multipath & Traffic Handling:
- Ensure multipath functioning correctly.
- Firewall rules engine appears to favor single source interface for all src/dst resulting in erroneous packet drops.
- Ensure upstream egress/ingress A/P functioning correctly.
- Will need to work with network team of colocation vendor providing IP transit to remedy IP transit only having one functioning leg at present.
- May require on-site work/coordination.
Direct Connect:
- Ensure Direct Connect multipath is working correctly.
- Ensure no overly eager security features negatively impacting legitimate traffic (session drops/throttling, unreasonable latency impacts – currently ~200ms hit on some traffic).
- Ensure no unlicensed security features enabled (dnssec currently erroneously enabled).
Additional Configuration:
- NAT public IPs for use.
- Interzone traffic rules currently permissive – more mature tiered scheme necessary for long term.
- Coordinate with DevOps on firewall integration with monitoring.
4. Console Network Setup:
Physical Topology:
- 2x OpenGear OM2224 spines.
- 16x OpenGear IM7248 ToRs.
Current State:
- Spines routable, providing loop for firewall management ports.
- Cellular not active.
- ToRs lack ethernet routing.
- All end-device access currently through nested console sessions.
Requirements:
- FRR experience required.
- OpenGear experience bonus.
Challenges & Fixes:
- OpenGear cellular fallback does not work well with multipath (destroys routing when triggered).
- Should be manually implemented using systemd timer with heartbeats over multiple paths.
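The heartbeat approach above could look roughly like the Python sketch below, meant to be run once per tick by a systemd timer. It is a sketch under assumptions: the probe uses Linux iputils ping flags (-c, -W, -I), the function names and path list are hypothetical, and whatever action actually enables or disables the cellular uplink is left out because it depends on the OpenGear setup.

```python
import subprocess

def probe(target, source_iface, timeout_s=2):
    """Send one ICMP heartbeat out of a specific interface (iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", source_iface, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def should_use_cellular(paths, probe_fn=probe):
    """Fall back to cellular only when every wired path misses its heartbeat.

    paths is a list of (interface, heartbeat_target) pairs, one per
    independent path. A single failed path does not trigger fallback,
    which is the point of probing several paths instead of one.
    """
    return not any(probe_fn(target, iface) for iface, target in paths)
```

Called with something like should_use_cellular([("eth0", "192.0.2.1"), ("eth1", "198.51.100.1")]) (illustrative addresses), the result tells the caller whether to bring up the cellular route. Passing probe_fn as a parameter lets the decision logic be exercised without sending traffic.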
Setup Tasks:
- Set up direct-to-end-device serial console via SSH.
- Configure standardized versions for IM7248s and OM2224s.
- Set up standardized credentials.
Cellular Requirements:
- Business cellular plan required for OM2224s.
- Minimum 10GB/month, 50GB+ preferred.
- Needed for emergency access and recovery scenarios.
Constraints:
- AT&T allows OpenGears but blocks Palo Alto traffic on consumer plans.
- Verizon 4G coverage inconsistent in colo area.
- One T-Mobile 4G band unsupported by OM2224 modem.
Coordination:
- Work with colocation vendor for antenna extension installation on roof to ensure reliable cellular signal.