Mercedes-Benz Group China Ltd
Technology
MVA Multi-Modality Interaction Developer
Neural analysis suggests this role is
optimal for Mid+ candidates.
“MVA Multi-Modality Interaction Developer at Mercedes-Benz Group China Ltd. Skills: Multimodal Interaction, LLM Integration, Speech Systems, Data Pipelines.”
What They're Looking For.
Must Have
Experience developing speech systems
Experience with multimodal data integration
Understanding of multimodal data pipelines
Practical experience using LLMs
Ability to consume service-oriented APIs
Experience integrating with automotive systems
Solid understanding of Android system architecture
Cross-team technical communication
Independent problem-solving skills
What You'll Do.
Develop on top of mainstream speech systems
Design multimodal fusion
Implement multimodal fusion
Normalize multimodal inputs
Structure multimodal inputs
Design multimodal data pipelines
Maintain multimodal data pipelines
Consume vehicle system capabilities
Integrate data from ECUs
Abstract data from ECUs
Explore new data sources
Onboard new data sources
How You'll Work.
Team & Collaboration
Cross-team technical bridge; EE teams; Platform teams; AI teams; UX teams
Full Job Description
Key Responsibilities
- Develop on top of current mainstream speech systems, including SSPE, wake-up, VAD, ASR, NLU, DM, TTS, and LLM.
- Design and implement multimodal fusion combining speech, DMS camera, OMS camera, dash camera, microphone, sensors, audio system state, voice print, and vehicle state data.
- Normalize and structure multimodal inputs into system context representations suitable for LLM reasoning, to support future LLM-based assistant use cases such as context-aware dialogue and the collection and application of assistant memory.
- Design and maintain consistent multimodal data pipelines, handling time alignment, normalization, and state coherence as data flows from vehicle systems into LLM-ready context representations.
- Consume vehicle system capabilities through service-oriented APIs, enabling intent-driven control of vehicle functions.
- Integrate and abstract data from multiple vehicle ECUs (audio, cameras, sensors, body, ADAS, etc.), with the ability to independently explore and onboard new data sources.
- Collaborate closely with EE, platform, AI, and UX teams, acting as a cross-team technical bridge.
Required Qualifications
- Experience developing speech or voice assistant systems, including wake word, VAD, ASR, NLU, dialogue management, TTS, and LLM integration.
- Hands-on experience with multimodal data integration and fusion, combining audio, camera, sensor, and vehicle state information.
- Strong understanding of multimodal data pipelines, including normalization, temporal alignment, and state consistency for LLM-ready context.
- Practical experience using LLMs as a reasoning layer, including context preparation and safe application of outputs.
- Ability to consume service-oriented vehicle APIs for intent-driven control of vehicle capabilities.
- Experience integrating with embedded or automotive systems, working across multiple ECUs (audio, camera, sensor, body, ADAS).
- Solid understanding of Android system architecture, preferably Android Automotive O
How to Apply on Taleo (Oracle)
- Taleo is older software — paste plain text resume content to avoid formatting issues.
- Avoid special characters, tables, and columns in your resume for this ATS.
- The application may time out on inactivity — copy your answers to a text editor as backup.