Mercedes-Benz Group China Ltd

Technology

MVA Multi-Modality Interaction Developer

$350–550k (AI est.) · Beijing, China

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“MVA Multi-Modality Interaction Developer at Mercedes-Benz Group China Ltd. Skills: Multimodal Interaction, LLM Integration, Speech Systems, Data Pipelines. Develop based on speech systems. Design multimodal fusion”

Industry & Context.

Technology

What They're Looking For.

Must Have

Experience developing speech systems

Experience with multimodal data integration

Understanding of multimodal data pipelines

Practical experience using LLMs

Ability to consume service-oriented APIs

Experience integrating with automotive systems

Solid understanding of Android system architecture

Cross-team technical communication

Independent problem-solving skills

What You'll Do.

Build on current mainstream speech systems

Design multimodal fusion

Implement multimodal fusion

Normalize multimodal inputs

Structure multimodal inputs

Design multimodal data pipelines

Maintain multimodal data pipelines

Consume vehicle system capabilities

Integrate data from ECUs

Abstract data from ECUs

Explore new data sources

Onboard new data sources

How You'll Work.

Team & Collaboration

Acts as a cross-team technical bridge between EE, platform, AI, and UX teams

Full Job Description

Key Responsibilities

Build on current mainstream speech systems, including SSPE, wake-up, VAD, ASR, NLU, DM, TTS, and LLM components.

Design and implement multimodal fusion combining speech, DMS camera, OMS camera, dash camera, microphone, sensors, audio system state, voice print, and vehicle state data.

Normalize and structure multimodal inputs into system context representations suitable for LLM reasoning, to support future LLM-based assistant use cases such as context-aware dialogue and assistant memory collection and application.

Design and maintain consistent multimodal data pipelines, handling time alignment, normalization, and state coherence as data flows from vehicle systems into LLM-ready context representations.

Consume vehicle system capabilities through service-oriented APIs, enabling intent-driven control of vehicle functions.

Integrate and abstract data from multiple vehicle ECUs (audio, cameras, sensors, body, ADAS, etc.), with the ability to independently explore and onboard new data sources.

Collaborate closely with EE, platform, AI, and UX teams, acting as a cross-team technical bridge.

Required Qualifications

Experience developing speech or voice assistant systems, including wake word, VAD, ASR, NLU, dialogue management, TTS, and LLM integration.

Hands-on experience with multimodal data integration and fusion, combining audio, camera, sensor, and vehicle state information.

Strong understanding of multimodal data pipelines, including normalization, temporal alignment, and state consistency for LLM-ready context.

Practical experience using LLMs as a reasoning layer, including context preparation and safe application of outputs.

Ability to consume service-oriented vehicle APIs for intent-driven control of vehicle capabilities.

Experience integrating with embedded or automotive systems, working across multiple ECUs (audio, camera, sensor, body, ADAS).

Solid understanding of Android system architecture, preferably Android Automotive O

How to Apply on Taleo (Oracle)

Taleo is older software — paste plain text resume content to avoid formatting issues.
Avoid special characters, tables, and columns in your resume for this ATS.
The application may time out on inactivity — copy your answers to a text editor as backup.