Intern AI Evaluation Engineering (f/m/d), Paris

Internship / Student job / Work-study

Insurance

IT / Computer Science

Posted more than 4 weeks ago

Allianz

The Allianz Group is one of the world's largest and most renowned financial services providers, offering an international network of well-known brands with first-class products. Above all, we strive for excellence in our day-to-day work, that is, in our relationships with our customers, our shareholders, our employees, and society. The Allianz Group serves more than 76 million customers in around 70 countries.

As a reliable partner, Allianz Global Investors manages assets across all major asset classes and regions. Our experts operate in 19 markets worldwide, with a strong presence in Europe, the US, and Asia. With around 650 investment professionals, our investment management covers all of the world's key financial and growth markets. First-class local service ensures that our global expertise reaches our clients.

Tasks

Conduct a structured benchmark of GenAI evaluation frameworks, both open-source (Ragas, DeepEval, TruLens, Phoenix/Arize, ARES, promptfoo) and commercial SaaS platforms (LangSmith, Braintrust, Humanloop, Galileo, Azure AI Evaluation), comparing metric coverage, ease of integration, cost, licensing, scalability, and enterprise readiness
Curate a gold-standard evaluation dataset (queries, expected outputs, source documents, edge cases) across GenAI Hub's core features
Implement evaluation pipelines using the top 2–3 frameworks, measuring faithfulness, answer relevance, context precision, context recall, hallucination rate, and task-specific quality metrics
Produce a quality baseline report identifying strengths and weaknesses per feature and per search index, including a cross-framework comparison of metric consistency
Experiment with programmatic prompt optimization tools (e.g., DSPy and its MIPRO optimizer, TextGrad) to automatically improve retrieval and generation quality against the established baseline
Integrate Responsible AI considerations into the evaluation framework by assessing outputs for bias, toxicity, fairness, and content safety, and recommend guardrails and automated checks for production deployments
Deliver a tool selection recommendation, a reusable evaluation harness, optimized prompt candidates, and a comparative benchmark report (before/after) with cost/quality/safety trade-off analysis.

Profile

Currently pursuing an M1/M2 degree in Data Science, Machine Learning, Computer Science, or a related field
Solid Python programming skills (scripting, data manipulation, API integration)
Foundational understanding of NLP/ML concepts (embeddings, language models, retrieval systems)
Familiarity with evaluation methodology and metrics design
Ability to synthesize findings from multiple tools and produce clear, structured written reports
Curiosity for applied research, tooling evaluation, and emerging AI practices
Working proficiency in English (French is a plus)
No prior enterprise experience required.
