The Platform Manager/Dev Ops will be responsible for the overall infrastructure design and implementation of a highly-available web-based system supporting multiple applications using Amazon Web Services (AWS). This individual will be responsible for effective architecture design, provisioning, installation/configuration, operation, and maintenance of cloud infrastructure and software using AWS. This individual will participate in technical research and development to enable continuing innovation within the infrastructure. This individual will ensure that cloud infrastructure, operating systems, software systems, and related procedures adhere to all applicable security regulations.
This individual will assist project teams with technical issues in the Initiation and Planning phases of development. These activities include the definition of needs, benefits, and technical strategy; research and development within the project life-cycle; technical analysis and design; and support of operations staff in executing, testing, and rolling out the solutions.
Qualified applicants must be detail-oriented, results-driven individuals who can take ownership of a project and work well without direct oversight. They must have good interpersonal skills, as well as good client interaction skills.
- Design the overall architecture for a highly-available AWS system using EC2, RDS, S3, and DynamoDB services.
- Provision new / rebuild existing cloud servers and configure services, settings, directories, storage, permissions, etc. in accordance with standards and project/operational requirements.
- Develop and maintain installation, configuration and operations procedures.
- Manage a team of infrastructure specialists to assist in the implementation and support of systems.
- Perform regular system monitoring, verifying the integrity and availability of all cloud systems, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
- Perform regular security monitoring to identify any possible intrusions.
- Provide Tier III/other support per request from various constituencies. Investigate and troubleshoot issues.
- Repair and recover from system or software failures. Coordinate and communicate with impacted constituencies.
- Install, configure and manage network services to support the project
- Install, configure and support operating systems - Red Hat/CentOS Linux
- Responsible for cloud infrastructure and software, including maintenance, support and updates
- Maintain and update Infrastructure documentation
- Provide assistance with scripts (writing, troubleshooting) to accomplish system administration tasks as needed
- Provide general system administration support
- Participate in deployment and configuration management activities
- Perform patch management and firmware evaluation, providing recommendations and reports; assist and advise in the creation of standardized build templates
- Assist the technical team with systems analysis, system and application maintenance, capacity planning, and diagnostics, making recommendations and providing reports. Activities may include:
  - Troubleshoot operating system and network problems and prepare problem reports, especially for advanced or non-routine problems.
  - Troubleshoot system and software issues, especially non-routine issues, and recommend fixes.
  - Performance analysis, troubleshooting, and problem reports
  - Provide capacity planning analysis and reports
- Assist with the configuration and testing of additional inter-domain networking and alternate pathing and/or dynamic reconfiguration. Activities may include:
  - Configure and test any inter-domain networking
  - Configure and test alternate pathing and/or dynamic reconfiguration
  - Compile and document inter-domain networking, alternate pathing and/or dynamic reconfiguration activities in an operational guide.
- Provide expertise to address architectural and process initiatives. Activities may include:
  - Assist with the planning, evaluation and implementation of enterprise systems re-architecture/system refresh.
  - Evaluate and recommend new environment architecture including cluster redesign, virtualization options, filesystem design.
  - Address backup architecture and the use of technologies such as clones, snaps, and so forth
- Assist with customer server performance issues
- Serve as lead technical liaison in technical meetings
- Develop and maintain detailed and accurate documentation. Documentation may be required for operational procedures, troubleshooting aids, and technical analyses for products, features, and capabilities.
Requirements and Experience Guidelines:
- Must be a U.S. Citizen or U.S. Legal Permanent Resident
- AWS System administration experience required
- Demonstrated understanding of System Administration for AWS Unix servers
- Experience with RDS and/or DynamoDB services desired
- Network administration experience desired
- Experience with managing Tomcat servers in AWS environment
- Knowledge of web technologies, preferably Apache/Tomcat
- Understanding of backup and disaster recovery processes and configuration
- Strong troubleshooting skills to resolve infrastructure-related problems
- Experience with source control, unit testing, and continuous integration tools is a plus
- Ability to learn new technologies quickly
- Demonstrated excellent communication skills including the ability to effectively communicate with internal and external customers
- Ability to support clustered systems in an enterprise production 24x7 environment
- At least 5 years of experience configuring, installing, and supporting AWS infrastructure
- At least 5 years of experience configuring, testing, and evaluating the networks needed to support cluster configurations.
- At least 5 years of experience evaluating and deploying software tools in an AWS production environment.
- Excellent project management skills in order to plan for upgrades to the cluster environment.
- Excellent problem management and troubleshooting skills.
- Excellent verbal and written communication skills to be able to ascertain user requirements and prepare documentation.
- Excellent customer interface skills. Demonstrated ability to deal with customers in a challenging environment.
- At least 5 years of experience in cluster system design and ability to provide system architecture advice.
- CloudFormation experience desired.
- Bachelor’s degree in Computer Science, Engineering, or related field required
Job Status: Contract/Temporary