LILT

Werkstudent: AI Research & Data Evaluation

Berlin, Germany · Part-time · Remote-friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for entry-level candidates.

The Brief

“Werkstudent: AI Research & Data Evaluation at LILT. Skills: AI research, data evaluation, LLM stress-testing, and data engineering for AI benchmarks. The role involves stress-testing the world's most advanced models and evaluating and benchmarking frontier LLMs and autonomous agents.”

What You'll Achieve.

Directly impact how frontier LLMs handle complex, multilingual tasks

Industry & Context.

Problems you'll solve

Identify 'model-breaking points'

What They're Looking For.

Must Have

Currently enrolled at TU Berlin majoring in Computer Science (Bachelor/Master) or a related field

Solid understanding of LLMs, natural language processing, or machine learning

Highly proficient in Python

Highly proficient in Bash

Highly proficient in git

Proficient in English

Nice to Have

Proficient in one or more non-English languages

What You'll Do.

Stress-testing the world's most advanced models

Evaluating and benchmarking frontier LLMs and autonomous agents

Creating or modifying benchmark data

Designing and running experiments to identify 'model-breaking points'

Interpreting resulting data

How You'll Work.

Team & Collaboration

Work directly with models and teams from frontier labs

Global collaboration

Full Job Description

ABOUT LILT

AI is changing how the world communicates, and LILT is leading that transformation. We're on a mission to make the world's information accessible to everyone, regardless of the language they speak. We use cutting-edge AI, machine translation, and human-in-the-loop expertise to translate content faster, more accurately, and more cost-effectively without compromising on brand, voice, or quality.

At LILT, we empower our teammates with leading tools, global collaboration, and growth opportunities to do their best work. Our company virtues (Work together, win together; Find a way or make one; Quicker than they expect; Quality is Job 1) guide everything we do.

We are trusted by Intel Corporation (https://www.linkedin.com/company/intel-corporation/), Canva (https://www.linkedin.com/company/canva/), the United States Department of Defense (https://www.linkedin.com/company/deptofdefense/), the United States Air Force (https://www.linkedin.com/company/united-states-air-force/), ASICS (https://www.linkedin.com/company/asics/), and hundreds of global enterprises. Backed by Sequoia, Intel Capital, and Redpoint, we're building a category-defining company in a $50B+ global translation market being redefined by AI.

YOUR JOB

You'll be stress-testing the world's most advanced models to see where they break. Your work will directly impact how frontier LLMs handle complex, multilingual tasks. Your work will be supervised by our in-house research staff.

- Evaluate & Benchmark: Run rigorous evaluations on frontier LLMs and autonomous agents across diverse tasks.
- Data Engineering: Create or modify benchmark data to test the reasoning and linguistic limits of modern AI.
- Experimental Research: Design and run experiments to identify "model-breaking points" and interpret the resulting data.

YOUR PROFILE

- Currently enrolled at TU Berlin majoring in Computer Science (Bachelor/Master) or a related field
- Solid understanding of LLMs, natural language processing, or machine learning
- Highly

SKILL SIGNAL 19 detected · ranked by frequency

Data Evaluation ×3

Evaluating LLMs ×3

Benchmarking LLMs ×3

Data modification ×3

Experiment design ×3

Data interpretation ×3

AI Research ×2

LLM Stress-testing ×2

Data Engineering for AI benchmarks ×2

Python ×2

Bash ×2

git ×2

LLMs

Machine Translation

Machine Learning

Natural Language Processing

Experimental Research

Data Engineering

BEHAVIOURAL

Appetite to quickly understand and incorporate new methodologies and models in a rapidly changing research landscape

Drive to ship customer projects, sometimes on tight deadlines, to high quality

Role Details

Experience: 0–2 yrs

Level: Entry

Work Mode: Hybrid

Type: Part-time

Category: Research

AI-Extracted Insights

Domain Areas

large-language-models-llms

natural-language-processing-nlp

machine-learning-ml

multilingual-ai

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
