LILT
AI
Werkstudent: AI Research & Data Evaluation
This role is suited to entry-level candidates.
Werkstudent: AI Research & Data Evaluation at LILT. Skills: AI research, data evaluation, LLM stress-testing, and data engineering for AI benchmarks. The role involves stress-testing the world's most advanced models and evaluating and benchmarking frontier LLMs and autonomous agents.
What You'll Achieve.
Directly impact how frontier LLMs handle complex, multilingual tasks
Industry & Context.
Identify 'model-breaking points'
What They're Looking For.
Must Have
Currently enrolled at TU Berlin majoring in Computer Science (Bachelor/Master) or a related field
Solid understanding of LLMs, natural language processing, or machine learning
Highly proficient in Python
Highly proficient in Bash
Highly proficient in git
Proficient in English
Nice to Have
Proficient in one or more non-English languages
What You'll Do.
Stress-testing the world's most advanced models
Evaluating and benchmarking frontier LLMs and autonomous agents
Creating or modifying benchmark data
Designing and running experiments to identify 'model-breaking points'
Interpreting resulting data
How You'll Work.
Team & Collaboration
Work directly with models and teams from frontier labs; Global collaboration
Full Job Description
ABOUT LILT
AI is changing how the world communicates, and LILT is leading that transformation. We're on a mission to make the world's information accessible to everyone, regardless of the language they speak. We use cutting-edge AI, machine translation, and human-in-the-loop expertise to translate content faster, more accurately, and more cost-effectively without compromising on brand, voice, or quality.
At LILT, we empower our teammates with leading tools, global collaboration, and growth opportunities to do their best work. Our company virtues (Work together, win together; Find a way or make one; Quicker than they expect; Quality is Job 1) guide everything we do.
We are trusted by Intel Corporation https://www.linkedin.com/company/intel-corporation/, Canva https://www.linkedin.com/company/canva/, the United States Department of Defense https://www.linkedin.com/company/deptofdefense/, the United States Air Force https://www.linkedin.com/company/united-states-air-force/, ASICS https://www.linkedin.com/company/asics/, and hundreds of global enterprises. Backed by Sequoia, Intel Capital, and Redpoint, we’re building a category-defining company in a $50B+ global translation market being redefined by AI.
YOUR JOB
You’ll be stress-testing the world’s most advanced models to see where they break. Your work will directly impact how frontier LLMs handle complex, multilingual tasks. Your work will be supervised by our in-house research staff.
- Evaluate & Benchmark: Run rigorous evaluations on frontier LLMs and autonomous agents across diverse tasks.
- Data Engineering: Create or modify benchmark data to test the reasoning and linguistic limits of modern AI.
- Experimental Research: Design and run experiments to identify "model-breaking points" and interpret the resulting data.
YOUR PROFILE
- Currently enrolled at TU Berlin majoring in Computer Science (Bachelor/Master) or a related field
- Solid understanding of LLMs, natural language processing, or machine learning
- Highly