AI Engineer (LLM / GenAI) Interview Prep

Building applications with LLMs. Hottest role of 2026. Heavy mix of Python, ML basics, prompt engineering, and product sense.

15 questions · 60+ min, often 2-3 rounds including a take-home · 10 technical, 3 behavioural, 2 scenario

General tips for this role

Build at least one project before applying. A working RAG bot over your own docs is enough.
Know one model deeply (e.g. GPT-4o pricing, limits, strengths) rather than naming many.
Understand WHY transformers work, not just that they do. The self-attention concept is interview gold.
Practise reading and explaining a research paper abstract. Senior interviews often include this.
Be honest about the limits of LLMs. Interviewers love grounded realism over hype.

What is an LLM and how does it work at a high level?

easy · technical

Model answer

A Large Language Model is a neural network trained on huge text datasets to predict the next token (a word or piece of a word) given the preceding context. By predicting one token at a time and feeding each prediction back in as input, it generates coherent text. Modern LLMs use the transformer architecture with self-attention, which lets every token weigh how relevant each other token in the context is when computing its own representation. Examples: GPT-4, Claude, Gemini, Llama 3.
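
To make "predict the next token, then repeat" concrete, here is a minimal sketch. It assumes a toy bigram counter in place of a real transformer, and the names train_bigram and generate are illustrative, not part of any library. Only the generation loop resembles what an LLM does.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """Count which word follows which. A toy stand-in for a trained LLM."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: dict, prompt: str, max_new_tokens: int = 5) -> str:
    """Greedy decoding: repeatedly append the most likely next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the toy model has never seen this token
        next_token = followers.most_common(1)[0][0]
        tokens.append(next_token)  # the prediction becomes part of the context
    return " ".join(tokens)


corpus = "the cat sat on the mat . the cat sat on the rug ."
model = train_bigram(corpus)
print(generate(model, "the", max_new_tokens=3))  # prints: the cat sat on
```

A real LLM conditions on the whole context rather than only the last word, and usually samples from a probability distribution instead of always taking the top choice.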

What is the difference between fine-tuning and prompt engineering?

easy · technical

Model answer

Prompt engineering: changing the input to get better output. Cheap, fast, no training needed. Fine-tuning: further training the model on task-specific data to specialise it. Costlier and slower, but it can give more consistent behaviour on a narrow task (output format, tone, domain vocabulary). Rule: try prompt engineering first. If that does not work, try RAG. Only fine-tune if you have lots of high-quality domain data and the first two are insufficient.
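
As an illustration of the "prompt engineering first" step, here is a small sketch of a few-shot prompt builder. The function name, the "Input:/Label:" layout, and the example data are all assumptions made for illustration. No model is called.

```python
def build_few_shot_prompt(instruction: str, examples: list, query: str) -> str:
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Label:")  # the model is expected to continue from here
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I love this product", "positive"), ("It broke after a day", "negative")],
    "Delivery was fast and the quality is great",
)
print(prompt)
```

Changing the instruction or the examples takes seconds and costs nothing to train, which is why it is the first thing to try.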

Explain RAG and when you would use it.

medium · technical

Model answer

Retrieval-Augmented Generation: when a user asks a question, you first retrieve relevant documents from your own knowledge base (typically using vector search) and pass them as context to the LLM. The LLM then answers based on those documents. Use RAG when: you need up-to-date info, you have proprietary data, you need citations, or you want to reduce hallucinations. Many enterprise AI chatbots use RAG.

Tip

Mention that RAG does NOT replace the need for a good knowledge base. Garbage in, garbage out.
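
The retrieve-then-answer flow from the model answer can be sketched in a few lines. This is a simplified illustration: keyword overlap stands in for vector search, the function names and sample documents are made up, and the final LLM call is left out.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list, top_k: int = 2) -> list:
    """Rank documents by shared words (assumption: a stand-in for vector search)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_prompt(question: str, context_docs: list) -> str:
    """Put the retrieved documents in front of the question as numbered context."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1))
    return (
        "Answer only from the context below. "
        "If the answer is not there, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping to Europe takes 5 to 7 business days.",
]
question = "How long do refunds take?"
top_docs = retrieve(question, docs, top_k=1)
rag_prompt = build_rag_prompt(question, top_docs)
print(rag_prompt)
```

The numbered context entries are what make citations possible: the model can be asked to reference [1], [2], and so on in its answer.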

What is a vector database and why does it matter for AI?

medium · technical

Model answer

A vector database is a database optimised for storing and searching high-dimensional vectors (embeddings). Used in RAG: documents are converted to embeddings, stored in the vector DB, and at query time we find the closest matches. Examples: Pinecone, Weaviate, Chroma, Qdrant, pgvector (Postgres extension). Matching is done with approximate nearest neighbour (ANN) search, often using an HNSW index under the hood.
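
To show what "find the closest matches" means, here is a sketch of exact nearest-neighbour search with cosine similarity. The three-dimensional vectors and document IDs are invented for illustration; real embeddings have hundreds or thousands of dimensions. A vector database replaces this brute-force scan with an ANN index so it stays fast at scale.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list, index: dict, top_k: int = 1) -> list:
    """Brute-force search: score every stored vector, return the best IDs."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical embeddings, hand-written for the example.
index = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.0],
    "office_hours": [0.0, 0.2, 0.9],
}
print(nearest([0.8, 0.2, 0.1], index, top_k=2))
```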

How do you reduce hallucinations in an LLM-based system?

medium · technical

Model answer

Multiple layers: (1) Use RAG so the model has accurate context. (2) Add explicit instructions in the prompt: 'Answer only based on the provided context. If the answer is not there, say so.' (3) Lower the temperature (e.g. 0.1) for factual tasks. (4) Add a fact-checking step that re-asks the model to verify its claims. (5) Use structured output (JSON schema) so the response format is constrained and easy to validate. (6) Evaluate with a test set of expected answers and measure the hallucination rate.
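
Point (5) can be sketched as a validation step on the model's reply. The schema here (an "answer" string plus a non-empty "sources" list) is a hypothetical example, and treating an uncited answer as invalid is a design assumption, not a standard.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def parse_model_reply(raw: str):
    """Return the parsed reply, or None if it does not match the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    if not data["sources"]:
        return None  # assumption: an answer with no cited source is rejected
    return data


good = '{"answer": "Refunds take 14 days.", "sources": ["doc-1"]}'
bad = "Sure! Refunds usually take about two weeks."
print(parse_model_reply(good))
print(parse_model_reply(bad))
```

When the function returns None, the caller can retry, fall back to a safe message, or escalate. Format validation does not prove the answer is true, so it complements the other layers rather than replacing them.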

Walk me through how you would build a customer-support chatbot from scratch.

hard · technical

Model answer

1) Gather requirements: what questions does it need to answer? What is the success criterion? 2) Build a knowledge base: scrape docs, FAQs, past tickets. 3) Chunk the docs into 500-token pieces. 4) Embed each chunk and store in a vector DB. 5) Build the API: receive user question, embed it, retrieve top-5 similar chunks, send to LLM with the question, return answer. 6) Add safety: profanity filter, PII detection, escalation to human for sensitive topics. 7) Evaluate on a held-out test set. 8) Deploy with rate limiting and monitoring. 9) Set up feedback collection (thumbs up/down). 10) Iterate.

Tip

Always mention the eval set — most candidates skip evaluation entirely.
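
Step 3 of the model answer (chunking) can be sketched as follows. Two assumptions to note: words are used as a rough stand-in for tokens, and the overlap between neighbouring chunks is an addition that the answer above does not mention. The function name is illustrative.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into chunks of about chunk_size words that share overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=1):
    print(chunk)
```

Overlap reduces the chance that a sentence split across a chunk boundary becomes unretrievable. In a real pipeline the provider's tokenizer would replace the word split.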

How would you evaluate the quality of an LLM output?

hard · technical

Model answer

Multiple methods are needed. (1) Automated metrics: BLEU for translation, ROUGE for summarisation, BERTScore for semantic similarity, exact match for QA. (2) Custom rubrics: write 20 test prompts, define 3-5 criteria (accuracy, helpfulness, tone), score each output 1-5. (3) LLM-as-judge: use a stronger model to score outputs against criteria. Calibrate it against human scores first. (4) Human evaluation: the gold standard, but expensive. (5) Production monitoring: thumbs-up/down, conversation length, escalation rate.

Tip

Mention LLM-as-judge but caveat: it has its own biases.
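
Two of the automated metrics for QA are simple enough to sketch directly: exact match and token-level F1. The normalisation used here (lowercasing and stripping punctuation) is one common choice, stated as an assumption. The function names are illustrative.

```python
import re
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when the two answers are identical after normalisation."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of word-level precision and recall."""
    pred_tokens, ref_tokens = normalize(prediction), normalize(reference)
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris.", "paris"))
print(token_f1("The capital is Paris", "Paris"))
```

Exact match is strict and penalises a correct but wordy answer, while token F1 gives partial credit. Neither captures meaning, which is why the rubric, judge, and human layers are still needed.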

What is temperature in an LLM and when would you change it?

medium · technical

Model answer

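Temperature controls randomness in token selection by scaling the model's logits before sampling. 0 = effectively deterministic (always picks the highest-probability token). 1 = samples from the model's unmodified distribution. Above 1 (e.g. 2) = a flatter distribution, so output is more random and can become incoherent. Use low temperature (0 to 0.3) for factual tasks (Q&A, classification, code). Use higher (0.7 to 1.0) for creative tasks (story generation, brainstorming). Different from top_p, which limits sampling to the smallest set of tokens whose cumulative probability reaches p.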
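
The effect of temperature and top_p on a probability distribution can be sketched with plain arithmetic. The logits below are made-up numbers and the function names are illustrative. Real APIs apply the same idea over a vocabulary of many thousands of tokens.

```python
import math


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Divide logits by the temperature, then normalise into probabilities."""
    if temperature <= 0:
        # Temperature 0 is usually handled separately as "pick the argmax".
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs: list, top_p: float) -> list:
    """Return indices of the smallest high-probability set with mass >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for temperature in (0.5, 1.0, 2.0):
    print(temperature, softmax_with_temperature(logits, temperature))
print(top_p_filter([0.7, 0.2, 0.1], top_p=0.8))
```

Lower temperature concentrates probability on the top token, and higher temperature spreads it out. top_p then removes the low-probability tail before sampling.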

What is the context window and why does it matter?

medium · technical

Model answer

The maximum number of tokens the model can process in one request (input + output). GPT-4o: 128k tokens. Claude 3.5 Sonnet: 200k. Gemini 1.5 Pro: 1M+. It matters for three reasons. A long document may not fit, so you may need to chunk it and use RAG instead. You pay per token, so longer prompts cost more. Models often lose accuracy on information in the middle of very long contexts (the 'lost in the middle' problem).
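
A token budget check makes the "will it fit" question concrete. This sketch assumes the rough rule of thumb of about 4 characters per token for English text. A real system should count with the provider's own tokenizer. All function names are illustrative.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption: about 4 characters per token in English)."""
    return math.ceil(len(text) / 4)


def fits_context(prompt: str, context_window: int, reserved_for_output: int) -> bool:
    """The window covers input plus output, so leave room for the reply."""
    return estimate_tokens(prompt) + reserved_for_output <= context_window


def select_chunks(chunks: list, budget_tokens: int) -> list:
    """Keep chunks in ranked order until the next one would exceed the budget."""
    selected, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected


long_prompt = "a" * 400  # about 100 tokens under the 4-characters heuristic
print(fits_context(long_prompt, context_window=128, reserved_for_output=20))
print(fits_context(long_prompt, context_window=128, reserved_for_output=30))
```

The same estimate multiplied by the provider's per-token price gives a quick cost projection per request.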

How would you handle PII (personal info) when sending data to an LLM?

hard · technical

Model answer

Several layers: (1) Pre-process: redact or replace PII (names, emails, phone numbers) with placeholders before sending to the LLM. Tools: Microsoft Presidio, Amazon Comprehend. (2) Use a provider with no-training agreements (OpenAI, Anthropic, and Azure OpenAI default to no training on their enterprise tiers; confirm the current terms). (3) Self-host an open model (Llama, Mistral) for highly sensitive data. (4) Log only essential metadata, not full prompts. (5) Check regulatory requirements: HIPAA for US health data, GDPR for EU personal data, etc. Pick the provider accordingly.
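
The pre-processing layer in point (1) can be sketched with regular expressions. This is a deliberately simple illustration: the two patterns are assumptions that catch common email and phone formats only, and they will miss names and may flag other long digit sequences. Dedicated tools such as Presidio use entity recognition for that.

```python
import re

# Simplified patterns for illustration; not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholders before the LLM call."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


message = "Contact jane.doe@example.com or +1 (555) 123-4567."
print(redact_pii(message))  # prints: Contact [EMAIL] or [PHONE].
```

If the application needs to restore the original values in the reply, keep a mapping from placeholder to value on your side and substitute after the model responds.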

Tell me about an AI project you built end to end.

medium · behavioural

Model answer

Use the STAR structure (Situation, Task, Action, Result). Walk through: problem, why AI was right for it, your approach, technical choices and trade-offs, evaluation, deployment, what you learned. End with: what would you do differently next time? Showing humility and growth is more valuable than perfection.

Tip

If you have not built one yet: build one before applying. Even a small RAG over Wikipedia is enough.

How do you stay current with the AI space when it changes every week?

medium · behavioural

Model answer

Specific habits beat generic claims. 'I follow the Anthropic and OpenAI blogs. I read the TLDR AI newsletter. I am active in the LangChain Discord. I build one small project per month using something new. Last month I tried Gemini's grounding feature.' Show curiosity and practical engagement.

Your LLM application is too slow. How do you speed it up?

hard · scenario

Model answer

Profile first to find the bottleneck. Common fixes: (1) Use a smaller/faster model for the use case (e.g. GPT-4o-mini instead of GPT-4o). (2) Cache common queries. (3) Stream the response so users see partial results. (4) Reduce prompt size — fewer few-shot examples. (5) Parallelise calls where possible. (6) For RAG: reduce retrieved chunks, use smaller embedding model. (7) Use a faster inference provider (Groq, Together AI). End-to-end latency budget: under 2 seconds for chatbots, ideally.
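
Fixes (2) and (5) can be sketched with the standard library alone. Here slow_model_call is a hypothetical stand-in for a real LLM request, and normalising the prompt (lowercasing and collapsing whitespace) before caching is an assumption that only suits case-insensitive queries.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the "model" is actually invoked


def slow_model_call(prompt: str) -> str:
    """Hypothetical stand-in for a network call to an LLM provider."""
    CALLS["count"] += 1
    return f"answer to: {prompt}"


@lru_cache(maxsize=1024)
def cached_answer(normalized_prompt: str) -> str:
    """Repeated prompts are served from memory instead of calling the model."""
    return slow_model_call(normalized_prompt)


def answer(prompt: str) -> str:
    """Normalise the prompt so trivially different phrasings share a cache entry."""
    return cached_answer(" ".join(prompt.lower().split()))


def answer_many(prompts: list, max_workers: int = 4) -> list:
    """Run independent requests in parallel threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(answer, prompts))


answer("What is RAG?")
answer("  what is   RAG? ")  # same cache entry, no second model call
print(CALLS["count"])
print(answer_many(["first question", "second question"]))
```

Threads suit this workload because LLM calls spend most of their time waiting on the network. An exact-match cache only helps with repeated queries, so measure the hit rate before relying on it.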

Your stakeholders want to use AI for everything. How do you decide what to actually build?

hard · scenario

Model answer

Push back politely. For each proposal ask: (1) What is the manual process today? (2) What would success look like (metric)? (3) Is AI the right tool, or is a database query / business rule enough? (4) What is the cost of a wrong AI answer? Prioritise based on impact vs effort. Build a prototype in a week to validate before committing to production.

Tip

This answer shows you can manage expectations, not just build.

What concerns you most about working with AI?

easy · behavioural

Model answer

Show you take ethics seriously. 'Hallucination in customer-facing settings.' 'Bias in training data leaking into output.' 'The pace of change making it hard to make stable architectural decisions.' Pick something real you have actually thought about, not a textbook answer.

Tip

Saying 'nothing concerns me' is a red flag.
