Data Analyst interview questions for TCS are as follows:
Answer:
I am comfortable with Excel for basic analysis, SQL for querying databases, and Python (using Pandas, NumPy) for data manipulation. I also use R for statistical analysis, Tableau and Power BI for data visualization, and Google Analytics for analyzing web traffic.
Answer:
I handle missing data by first assessing how much is missing and why, then either removing the affected rows or columns, imputing values (mean, median, mode, or model-based), or flagging missingness as its own indicator, depending on the context.
Answer:
Answer:
Normalization or standardization is used to scale the data to a similar range or distribution, ensuring that variables with different units or scales do not dominate the analysis, especially for machine learning models.
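As a quick sketch of the difference, using a single hypothetical feature (the income figures here are made up for illustration):

```python
import numpy as np

# Hypothetical feature with a large scale (e.g., annual income).
income = np.array([250000.0, 400000.0, 1200000.0, 650000.0, 300000.0])

# Min-max normalization: rescale values into the [0, 1] range.
normalized = (income - income.min()) / (income.max() - income.min())

# Standardization (z-score): zero mean, unit standard deviation.
standardized = (income - income.mean()) / income.std()
```

After either transform, a model no longer sees income dominate smaller-scale features purely because of its units.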
Answer:
A p-value measures the probability of obtaining results as extreme as the observed ones, assuming the null hypothesis is true. If p ≤ 0.05, we reject the null hypothesis, indicating a statistically significant result. If p > 0.05, we fail to reject the null hypothesis.
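A minimal sketch of this decision rule with SciPy, using simulated data (the sample and the hypothesized mean of 100 are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical sample: 50 daily sales figures centered near 105.
sample = rng.normal(loc=105, scale=10, size=50)

# One-sample t-test of the null hypothesis that the true mean is 100.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```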
Answer:
The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the population’s distribution. This is important because it allows us to make inferences about population parameters using the normal distribution, even with non-normal data.
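A small simulation makes this concrete: even when the population is heavily skewed (exponential), the means of repeated samples cluster around the population mean in a roughly normal shape. The sample sizes below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Skewed exponential population with true mean 1.0;
# draw 10,000 samples of size 50 and take each sample's mean.
sample_means = rng.exponential(scale=1.0, size=(10000, 50)).mean(axis=1)

# The distribution of these means is approximately normal,
# centered on the population mean of 1.0.
print(sample_means.mean())
```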
Answer:
A hypothesis test is used to assess whether there is enough statistical evidence to reject a null hypothesis. It involves stating the null and alternative hypotheses, choosing a significance level (commonly 0.05), computing a test statistic and its p-value from the sample, and then deciding whether to reject the null hypothesis.
Answer:
I used A/B testing to compare two website landing pages for a client. After running the test, I applied a t-test to assess conversion rates and concluded that one version led to significantly higher conversions, enabling the marketing team to optimize their strategy.
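A minimal sketch of that kind of comparison, with hypothetical per-visitor conversion indicators for the two pages (the rates and sample sizes are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# 1 = converted, 0 = did not convert, for 1,000 visitors per page.
page_a = rng.binomial(1, 0.10, size=1000)  # roughly 10% conversion
page_b = rng.binomial(1, 0.14, size=1000)  # roughly 14% conversion

# Two-sample t-test on the conversion indicators.
t_stat, p_value = stats.ttest_ind(page_a, page_b)
```

A small p-value here would support rolling out the better-performing page.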
Answer:
Common techniques include removing duplicates (e.g., drop_duplicates() in Pandas), handling missing values, standardizing formats and data types, and correcting inconsistent entries.
Answer:
Outliers are data points significantly different from the rest of the dataset. I handle them by first checking whether they are data-entry errors or genuine extreme values, then removing, capping (winsorizing), or transforming them, or switching to robust statistics such as the median.
Answer:
Yes, I use efficient SQL queries with indexing and optimize Python code by working with chunks of data using Dask or Vaex. I also use sampling and parallel processing for faster performance.
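One simple chunking pattern in plain Pandas (the in-memory CSV below stands in for a large file on disk):

```python
import io
import pandas as pd

# Hypothetical CSV; in practice this would be a large file on disk.
csv_data = io.StringIO("amount\n" + "\n".join(str(i) for i in range(1000)))

# Process the file in fixed-size chunks instead of loading it all at once.
total = 0
for chunk in pd.read_csv(csv_data, chunksize=100):
    total += chunk["amount"].sum()
print(total)  # 499500
```

Each chunk is an ordinary DataFrame, so the same aggregation logic scales to files far larger than memory.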
Answer:
I identify duplicates using methods like duplicated() in Pandas and then either remove them or aggregate the values if they represent multiple valid entries (e.g., summing transactions for the same customer).
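A minimal Pandas sketch of both paths, on a tiny made-up transactions table:

```python
import pandas as pd

# Hypothetical transactions with a repeated customer.
df = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [100, 150, 200],
})

# Flag exact duplicate rows (none in this example) ...
dupes = df.duplicated()

# ... or treat repeats as valid entries and aggregate:
# sum transactions per customer.
totals = df.groupby("customer_id", as_index=False)["amount"].sum()
```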
Answer:
I perform EDA to identify inconsistencies, such as missing values or data type mismatches. I use visualization (e.g., histograms, box plots) to detect outliers and correct or standardize data where necessary.
Answer:
I prefer Python for its flexibility and vast libraries (e.g., Pandas, NumPy) for data manipulation. SQL is my go-to tool for database queries, Excel is handy for quick analysis and reporting, and R suits dedicated statistical work.
Answer:
In my previous role, I used SQL to extract sales data from a customer database. A simple SQL query might look like this:
SELECT customer_id, product_id, SUM(amount) AS total_amount
FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY customer_id, product_id;
Answer:
I start with summary statistics and visualizations (e.g., histograms, box plots, scatter plots) to identify patterns, distributions, and outliers. I also look at correlations and relationships between variables using heatmaps and pair plots.
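A minimal sketch of that first pass in Pandas, on a small synthetic dataset (the spend/sales relationship is invented for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical dataset: ad spend with linearly related, noisy sales.
spend = rng.uniform(10, 100, size=200)
df = pd.DataFrame({
    "spend": spend,
    "sales": 3 * spend + rng.normal(0, 10, size=200),
})

print(df.describe())  # summary statistics per column
print(df.corr())      # pairwise correlations
# In a notebook, df.hist() and df.plot.scatter(x="spend", y="sales")
# show the distributions and the relationship visually.
```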
Answer:
Yes, I’ve used Tableau and Power BI to create interactive dashboards, track KPIs, and present data insights. I use them to combine multiple data sources into clear visual stories and help stakeholders make informed decisions.
Answer:
In Excel, I use pivot tables for summarizing data, VLOOKUP and INDEX-MATCH for data retrieval, and advanced functions like SUMIF, COUNTIF, and TEXT functions for analysis. I also use conditional formatting and charts to visualize the data.
Answer:
I analyzed customer feedback to identify areas for improving a product. I collected survey data, performed sentiment analysis, and presented key insights that led to product enhancements, increasing customer satisfaction by 15%.
Answer:
I would compare metrics before and after the campaign (e.g., sales, website traffic, conversion rate). I might perform A/B testing to compare campaign performance or use statistical methods to test for significance in changes.
Answer:
In a previous role, I worked on analyzing customer churn for a subscription service. Using predictive modeling and customer data, I identified key factors driving churn. This helped the company reduce churn by 10% through targeted interventions.
Answer:
I would use time series analysis techniques like ARIMA or exponential smoothing to model historical sales data, then forecast future sales by validating the model’s accuracy and adjusting for seasonality and trends.
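As a hedged sketch of the simplest of these methods, here is simple exponential smoothing implemented from scratch (the monthly sales figures and the alpha value are hypothetical; a production forecast would use a library such as statsmodels and validate against held-out data):

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing; the final smoothed value
    serves as a naive one-step-ahead forecast."""
    smoothed = series[0]
    for value in series[1:]:
        # Each new observation is blended with the running estimate.
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed

# Hypothetical monthly sales figures.
sales = [120, 130, 125, 140, 150, 145]
forecast = exponential_smoothing(sales)
```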
Answer:
I simplify complex data by focusing on key takeaways, using clear visualizations (charts, graphs), and explaining the impact on business goals. I also provide actionable recommendations based on the data.
Answer:
Common algorithms include linear and logistic regression, decision trees, random forests, k-means clustering, and k-nearest neighbors.
Answer:
Regression analysis models the relationship between dependent and independent variables. For example, I used linear regression to predict sales based on marketing spend, helping the company allocate resources more effectively.
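A minimal sketch of fitting such a line with NumPy (the spend and sales numbers are made up for illustration):

```python
import numpy as np

# Hypothetical data: marketing spend vs. resulting sales.
spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales = np.array([12.0, 19.0, 31.0, 42.0, 50.0])

# Fit sales = slope * spend + intercept by least squares.
slope, intercept = np.polyfit(spend, sales, deg=1)

# Use the fitted line to predict sales for a new spend level.
predicted = slope * 6.0 + intercept
```

The slope quantifies how much additional sales each unit of spend is associated with, which is what guides the resource-allocation decision.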
Answer:
I handle multicollinearity by checking correlation matrices and variance inflation factors (VIF), then dropping or combining highly correlated predictors, or applying dimensionality reduction such as PCA.
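As a hedged sketch, a common first check is the correlation matrix over the predictors; the variables below are synthetic, with one pair deliberately made nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly collinear with x1
x3 = rng.normal(size=500)                  # independent predictor

# Pairwise correlations between the three predictors.
corr = np.corrcoef(np.vstack([x1, x2, x3]))

# Flag predictor pairs whose |correlation| exceeds a 0.8 threshold.
high = [(i, j) for i in range(3) for j in range(i + 1, 3)
        if abs(corr[i, j]) > 0.8]
```

Only the (x1, x2) pair is flagged, which is exactly the pair a VIF check would also surface.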
Answer:
Yes, I handle time series data by decomposing it into trend, seasonality, and residual components. I use models like ARIMA or SARIMA to account for trends and seasonality in the data before forecasting.