Data Analyst Interview Prep
Strong SQL focus, plus Power BI/Tableau, business sense, and stakeholder communication.
General tips for this role
- Bring a SQL cheat sheet to remember syntax. Most interviewers do not mind looking up syntax โ they care about logic.
- Practise verbalising your thought process while writing SQL. 'First I'll group by month, then I need to compare to the previous month, so I'll use a window function.'
- Have one strong portfolio dashboard ready to show. Even a simple Power BI report from a Kaggle dataset is enough.
- Memorise the top 5 DAX functions: CALCULATE, SUM, AVERAGE, FILTER, RELATED.
- If asked a question you don't know, say 'I haven't used that yet but my approach would be...' and reason it out.
What is the difference between INNER JOIN and LEFT JOIN in SQL?
Show model answer
INNER JOIN returns only rows that exist in BOTH tables. LEFT JOIN returns ALL rows from the left table, plus matching rows from the right (NULL where no match). RIGHT JOIN does the opposite. FULL OUTER JOIN returns rows from both, with NULLs where no match exists.
Be ready to draw a Venn diagram.
What is GROUP BY and when do you need HAVING?
Show model answer
GROUP BY collapses rows into groups based on shared column values, then you can aggregate (COUNT, SUM, AVG). WHERE filters rows BEFORE grouping. HAVING filters AFTER grouping. Example: 'SELECT region, COUNT(*) FROM orders GROUP BY region HAVING COUNT(*) > 100' shows regions with more than 100 orders.
Find the second highest salary in an employees table.
Show model answer
Many ways: (1) `SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);` (2) Use DENSE_RANK: `SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees) WHERE rnk = 2;` (3) LIMIT with OFFSET: `SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1;` Be ready to discuss handling of ties (DENSE_RANK handles ties best).
Always ask 'what should we do with ties?' before answering.
Explain the difference between a measure and a calculated column in Power BI.
Show model answer
Calculated columns are computed at refresh time and stored in the data model (use memory). They have a row context โ they evaluate row by row. Best for row-level classifications or relationships. Measures are computed at query time and not stored. They have a filter context โ they evaluate based on whatever filters are applied. Best for aggregations like sums, averages, and KPIs. Rule: use measures for aggregations, calculated columns for row-by-row values.
What is a star schema and why is it preferred for analytics?
Show model answer
A star schema has a central FACT table (e.g. Sales) surrounded by DIMENSION tables (e.g. Date, Product, Customer). The fact table has measures (numbers to aggregate); dimensions have descriptive attributes. Star schemas are faster to query, easier to understand, and work better in Power BI/Tableau than normalised schemas. Avoid snowflake schemas in BI tools โ they harm performance and confuse users.
Write a query to calculate month-over-month revenue growth.
Show model answer
Use a window function like LAG: `SELECT month, revenue, LAG(revenue) OVER (ORDER BY month) AS prev_month, (revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0 / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) AS growth_pct FROM monthly_revenue ORDER BY month;` NULLIF prevents division by zero.
Mention NULLIF โ it shows you think about edge cases.
How do you handle missing data in a dataset?
Show model answer
First understand WHY it is missing: (a) Missing Completely at Random (MCAR) โ okay to ignore or impute. (b) Missing at Random (MAR) โ can impute based on other variables. (c) Missing Not at Random (MNAR) โ biased; must investigate. Options: drop rows (only if missing < 5% and MCAR), impute with mean/median/mode (simple), impute with regression or KNN (better), flag with a binary 'missing' column. Always note in the report what you did and why.
What is CALCULATE in DAX and why is it important?
Show model answer
CALCULATE evaluates an expression in a modified filter context. It is the most powerful DAX function โ about 60% of complex measures use it. Example: 'Sales for last year' = `CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Dates[Date]))`. The first argument is what to calculate; subsequent arguments modify the filter context.
How do you decide between a bar chart and a line chart?
Show model answer
Bar chart: comparing discrete categories (sales by region, top products). Line chart: showing change over time. If your x-axis has categories with no natural order, use bar. If it has dates or a sequence, use line. Pie charts: avoid them in business reporting โ humans are bad at comparing angles. Use bar instead.
What is the difference between Power Query and DAX?
Show model answer
Power Query (M language) is for data preparation: importing, cleaning, transforming, joining sources. Happens at refresh time. DAX is for analysis: calculations on the loaded model. Happens at query time. Rule of thumb: do as much as possible in Power Query (it is faster at refresh) and only use DAX for things that need to respond to user interactions (slicers, filters).
Tell me about a time your analysis changed a business decision.
Show model answer
STAR. Pick a real example where your analysis surfaced something unexpected. Show the IMPACT in numbers. 'I noticed our retention dipped after a UI change. Cohort analysis showed users on mobile were churning 3x faster than desktop. Product reverted the change on mobile within a week and retention recovered.'
Always quantify the impact. 'We made changes' is weak. 'ยฃ200k revenue saved' is strong.
How do you explain complex analysis to non-technical stakeholders?
Show model answer
Start with the conclusion (the 'so what'). Use one chart to illustrate it. Avoid jargon. If they ask 'how did you get that?', go deeper. Most stakeholders want the answer; only some want the method.
Mention the 'pyramid principle' (conclusion first, evidence below) if you know it.
How do you stay current with data tools?
Show model answer
Specific examples beat generic answers. 'I read SQLBI's blog and follow Guy in a Cube on YouTube. Last month I learned to use DAX Studio to optimise slow measures.' Show curiosity and concrete habits.
Your dashboard has been slow for two weeks. How do you investigate?
Show model answer
First check the data model: is it a star schema or a mess? Are there unnecessary calculated columns? Then look at the measures: are they using FILTER inside CALCULATE (slow) or just CALCULATE (fast)? Run DAX Studio to find slow queries. Check if the dataset is too large โ incremental refresh might help. Finally check the report visuals โ too many visuals on one page slow rendering.
Mention DAX Studio. Most candidates don't know it exists.
Marketing asks for a list of users who 'engaged with the new feature'. How do you handle the request?
Show model answer
Do NOT just give them a list. First ask: what does 'engaged' mean? Clicked once? Used 3 times in 7 days? Returned after first use? Each definition gives different numbers. Get them to commit to a definition. Then deliver. This shows analytical rigor and saves you from being blamed for wrong numbers.
This question tests communication. The 'right' answer is 'I would ask clarifying questions'.