Weekday AI
Scientific AI Evaluation & Computational Problem Designer
Neural analysis suggests this role is optimal for Mid+ candidates.
“Scientific AI Evaluation & Computational Problem Designer at Weekday AI. Skills: Scientific AI Evaluation, Computational Problem Design, Domain-specific scientific software expertise, Python programming, Reasoning complexity assessment. Design advanced computational problems requiring the use of domain-specific scientific software. Create tasks that test both precise execution (multi-step workflows, simulations) and strategic reasoning (experiment design, inference from partial data)”
What You'll Achieve.
Building a large-scale evaluation benchmark to test advanced AI reasoning across scientific and engineering domains; Ensuring the right balance of difficulty, depth, and reasoning complexity in problems
Industry & Context.
Designing rigorous, research-grade computational problems; Assessing AI reasoning effectiveness; Leveraging scientific software tools for complex challenges; Developing problem setups, solution pathways, and validation mechanisms; Calibrating and refining tasks based on model performance; Ensuring problems emphasize reasoning strategy over brute-force computation
Work must not involve sharing confidential or proprietary information from any current or past employer or institution. This opportunity does not currently support certain work authorization categories.
What They're Looking For.
Must Have
Graduate-level expertise (MS or PhD) in a relevant STEM field
Hands-on experience using scientific software libraries for real research problems
Python programming skills, including building computational workflows and validators
Ability to design challenging problems that require deep reasoning rather than surface-level solutions
Familiarity with edge cases, limitations, and practical challenges of scientific tools
Demonstrated proficiency with at least one relevant scientific library (via research, open-source work, or industry experience)
Ability to work independently and iterate based on feedback
Comfort working in Linux/terminal environments and remote compute setups
Availability of at least 15–20 hours per week
Nice to Have
MS or PhD preferred
Experience across multiple domains or tools
Background in evaluation frameworks or benchmarking
Experience in teaching, pedagogy, or problem-set design
Familiarity with reproducible research practices and containerized environments
What You'll Do.
Design advanced computational problems requiring the use of domain-specific scientific software
Create tasks that test both precise execution (multi-step workflows, simulations) and strategic reasoning (experiment design, inference from partial data)
Develop problem setups, solution pathways, and validation mechanisms
Calibrate and refine tasks based on model performance to achieve target difficulty levels
Ensure problems emphasize reasoning strategy over brute-force computation
Iteratively refine problems through calibration against state-of-the-art AI models
Full Job Description
**This role is for one of our clients**

**Compensation: $45-$100 per hour**

We are building a large-scale evaluation benchmark to test advanced AI reasoning across scientific and engineering domains. This role focuses on designing rigorous, research-grade computational problems that assess how effectively AI systems can leverage real scientific software tools to solve complex challenges.

Unlike traditional annotation roles, this position requires creating original, graduate-level problems rooted in real-world scientific workflows. You will iteratively refine these problems through calibration against state-of-the-art AI models, ensuring the right balance of difficulty, depth, and reasoning complexity.

**Requirements**

**What You’ll Do**

* Design advanced computational problems requiring the use of domain-specific scientific software
* Create tasks that test both precise execution (multi-step workflows, simulations) and strategic reasoning (experiment design, inference from partial data)
* Develop problem setups, solution pathways, and validation mechanisms
* Calibrate and refine tasks based on model performance to achieve target difficulty levels
* Ensure problems emphasize reasoning strategy over brute-force computation

**Domains & Tools of Interest**

We are particularly seeking candidates with hands-on experience in:

* **Bioinformatics & Single-Cell Genomics:** scanpy, scvelo, squidpy, gudhi (RNA-seq, trajectory inference, spatial transcriptomics)
* **Computational Chemistry:** PySCF (HF, DFT, TDDFT, CASSCF, post-HF methods)
* **Particle & Nuclear Physics:** scikit-hep, Monte Carlo simulations, collider data analysis
* **Electrical Engineering:** scikit-rf, ngspice (RF systems, circuit simulation)
* **Astrophysics & Cosmology:** astropy (cosmological modeling, survey analysis)
* **Structural & Mechanical Engineering:** scikit-fem (finite element analysis, elasticity, beam theory)
* **Seismology & Geophysics:** ObsPy, SPECFEM (waveform analysis, inversion, tomogra