Basics of Python
1. What is Python and why is it popular?
2. Explain the difference between Python 2 and Python 3.
3. How does Python manage memory?
4. What are Python’s data types?
5. Explain the difference between lists and tuples.
6. How do you create a dictionary in Python?
7. What is a list comprehension? Provide an example.
8. How do you handle exceptions in Python?
9. Explain the use of the `with` statement.
10. What is PEP 8?
11. How do you manage packages in Python?
12. What are `*args` and `**kwargs`, and when would you use them?
13. How do you concatenate strings in Python?
14. What are lambda functions?
15. Explain the concept of mutable and immutable data types.
16. How do you copy an object in Python?
17. What is recursion? Provide a simple example.
18. How do you reverse a list?
19. What is the difference between `deepcopy` and `copy` in Python?
20. How do you convert a string to a number?
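A short sketch of two of the questions above: the difference between `copy.copy()` and `copy.deepcopy()`, and how `*args`/`**kwargs` collect arguments. The `describe_call` helper and the sample dictionary are hypothetical, chosen only to make the behavior visible.

```python
import copy


def describe_call(*args, **kwargs) -> str:
    """*args gathers extra positional arguments into a tuple;
    **kwargs gathers extra keyword arguments into a dict."""
    return f"args={args} kwargs={kwargs}"


original = {"name": "pipeline", "steps": ["extract", "load"]}

shallow = copy.copy(original)   # new outer dict, but the inner list is shared
deep = copy.deepcopy(original)  # new outer dict AND a new inner list

original["steps"].append("transform")

print(shallow["steps"])  # ['extract', 'load', 'transform'] (sees the change)
print(deep["steps"])     # ['extract', 'load'] (fully independent)
print(describe_call(1, 2, mode="fast"))  # args=(1, 2) kwargs={'mode': 'fast'}
```

A shallow copy is enough when the object holds only immutable values; once it contains nested mutable containers, only `deepcopy` guarantees the copy will not change when the original does.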
Object-Oriented Programming (OOP) Concepts
21. What is OOP?
22. Explain the concept of classes and objects.
23. What is inheritance? Give an example.
24. How does Python support polymorphism?
25. Explain method overriding in Python.
26. What are private, protected, and public attributes?
27. How do you create a class constructor?
28. What are class methods, static methods, and instance methods?
29. How do you achieve encapsulation in Python?
30. Explain the use of `super()` in Python.
31. What is method resolution order (MRO) in Python?
32. How do you check if an object is an instance of a particular class?
33. What is the difference between `__new__` and `__init__`?
34. Explain the concept of composition in Python.
35. How do you use modules and packages in OOP?
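A minimal sketch touching several of the OOP questions above: a constructor, instance/class/static methods, cooperative `super()` calls, and how the method resolution order decides which parent runs next. All class names here are hypothetical.

```python
class Temperature:
    scale = "Celsius"  # class attribute shared by every instance

    def __init__(self, degrees: float) -> None:  # constructor
        self.degrees = degrees

    def to_fahrenheit(self) -> float:
        """Instance method: receives the instance as self."""
        return self.degrees * 9 / 5 + 32

    @classmethod
    def from_fahrenheit(cls, degrees_f: float) -> "Temperature":
        """Class method: receives the class as cls, handy as an alternative constructor."""
        return cls((degrees_f - 32) * 5 / 9)

    @staticmethod
    def is_below_freezing(degrees: float) -> bool:
        """Static method: receives neither self nor cls."""
        return degrees < 0


class Base:
    def greet(self) -> str:
        return "Base"


class Left(Base):
    def greet(self) -> str:  # overrides Base.greet, then delegates via super()
        return "Left>" + super().greet()


class Right(Base):
    def greet(self) -> str:
        return "Right>" + super().greet()


class Child(Left, Right):
    def greet(self) -> str:
        return "Child>" + super().greet()


print(Child().greet())                         # Child>Left>Right>Base
print([c.__name__ for c in Child.__mro__])     # ['Child', 'Left', 'Right', 'Base', 'object']
print(isinstance(Child(), Base))               # True
```

Because `Child.__mro__` is `Child, Left, Right, Base, object`, the `super()` call inside `Left` reaches `Right.greet` (not `Base.greet`) when the instance is a `Child`; `super()` follows the instance's MRO, not the class's direct parent.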
Pandas
36. What is Pandas and why is it used?
37. How do you read a CSV file using Pandas?
38. Explain Series and DataFrame in Pandas.
39. How do you handle missing data in Pandas?
40. How do you filter data in a DataFrame?
41. Explain the use of the `groupby` method.
42. How do you merge and join DataFrames?
43. What is the difference between `iloc` and `loc`?
44. How do you apply a function to a DataFrame?
45. How do you change the index of a DataFrame?
46. What is a MultiIndex DataFrame?
47. How do you deal with duplicate data in Pandas?
48. How do you convert a DataFrame to a NumPy array?
49. Explain the use of pivot tables in Pandas.
50. How do you save a DataFrame to a CSV file?
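A small sketch of `loc` versus `iloc`, filling missing values, boolean filtering, and `groupby`, on a made-up four-row DataFrame (the column names and values are illustrative assumptions).

```python
import pandas as pd

df = pd.DataFrame(
    {"region": ["north", "south", "north", "east"], "sales": [100.0, None, 250.0, 80.0]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["b", "sales"]  # label-based: row label "b", column "sales"
by_position = df.iloc[1, 1]      # position-based: 2nd row, 2nd column (same cell, NaN)
label_slice = df.loc["a":"c"]    # label slices INCLUDE the end label: 3 rows
position_slice = df.iloc[0:2]    # position slices EXCLUDE the end: 2 rows

filled = df.fillna({"sales": 0.0})    # replace missing sales with 0
high = filled[filled["sales"] > 90]   # boolean filtering keeps rows "a" and "c"
totals = filled.groupby("region")["sales"].sum()

print(len(label_slice), len(position_slice))  # 3 2
print(totals.to_dict())  # {'east': 80.0, 'north': 350.0, 'south': 0.0}
```

Filling with 0 is only one option; `dropna()` or an imputed value may suit the data better, depending on why the values are missing.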
NumPy
51. What is NumPy and what are its advantages?
52. How do you create a NumPy array?
53. Explain the difference between a one-dimensional and a two-dimensional array.
54. How do you perform array slicing?
55. Explain broadcasting in NumPy.
56. How do you perform mathematical operations on arrays?
57. What is a masked array?
58. How do you handle missing values in NumPy?
59. How do you concatenate arrays?
60. What are the different ways to sort an array?
61. How do you find the mean, median, and standard deviation of a NumPy array?
62. Explain the use of linear algebra functions in NumPy.
63. How do you create identity matrices in NumPy?
64. What are the advantages of using NumPy arrays over Python lists?
65. How do you generate random numbers in NumPy?
All Python Interview Questions For Data Engineer
Matplotlib
66. What is Matplotlib and why is it used?
67. How do you create a basic plot with Matplotlib?
68. Explain the concept of figures and axes in Matplotlib.
69. How do you create multiple plots in one figure?
70. How do you add titles, labels, and legends to plots?
71. Explain how to save a plot to a file.
72. How do you create a scatter plot?
73. What is the difference between a histogram and a bar chart?
74. How do you create a pie chart?
75. Explain how to customize plot styles and colors.
76. How do you add annotations to a plot?
77. What are subplots and how do you create them?
78. How do you create a 3D plot?
79. Explain how to plot time series data.
80. How do you adjust the axis scales?
Basic Python Scenario-Based Questions
1. Debugging a Code Snippet: “You are given a Python script that is supposed to calculate
the sum of the first N natural numbers, but it returns incorrect results. How would you
debug and fix the script?”
2. Optimizing Performance: “Given a list of a million integers, write a Python function to
count how many times each number appears. How would you optimize it for
performance?”
3. Data Transformation: “You have a list of dictionaries where each dictionary represents a
person’s information. Write a Python script to transform this list into a dictionary keyed by
person ID.”
4. File Processing: “How would you process a large log file with Python, extracting and
summarizing error messages, assuming the file is too large to fit into memory?”
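A minimal sketch of answers to the four scenarios above, using only the standard library. The function names, the `"id"` key, and the log layout (an `ERROR` marker followed by the message text) are hypothetical assumptions chosen for illustration, not part of the original questions.

```python
from collections import Counter
from typing import Iterable


def sum_first_n(n: int) -> int:
    """Scenario 1: sum of the first n natural numbers (1 + 2 + ... + n).

    A typical bug is sum(range(n)), which stops at n - 1. range(1, n + 1)
    includes n, and the closed form n * (n + 1) // 2 gives a quick cross-check.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return sum(range(1, n + 1))


def count_occurrences(numbers: Iterable[int]) -> Counter:
    """Scenario 2: count every value in a single O(n) pass, instead of
    calling list.count() once per distinct value."""
    return Counter(numbers)


def index_by_id(people: list[dict], key: str = "id") -> dict:
    """Scenario 3: re-key a list of person dicts by ID.

    The "id" field name is an assumption; duplicate IDs keep the last record.
    """
    return {person[key]: person for person in people}


def summarize_errors(lines: Iterable[str], marker: str = "ERROR") -> Counter:
    """Scenario 4: stream lines and count identical error messages.

    Assumed log layout: the message is whatever follows the first "ERROR".
    """
    counts: Counter = Counter()
    for line in lines:  # one line at a time; the whole input is never loaded
        if marker in line:
            message = line.split(marker, 1)[1].strip(" :\n")
            counts[message] += 1
    return counts


if __name__ == "__main__":
    print(sum_first_n(10))  # 55
    print(count_occurrences([3, 1, 3, 2, 3]).most_common(1))  # [(3, 3)]
    sample_log = ["2024-01-01 ERROR: disk full\n", "INFO ok\n", "ERROR: disk full\n"]
    print(summarize_errors(sample_log))  # Counter({'disk full': 2})
```

For a real log, pass an open file object to `summarize_errors` inside a `with` block; iterating a file yields one line at a time, so memory use stays flat regardless of file size.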
OOP Scenario-Based Questions
1. Design a Class System: “Design a simple class system for a library that includes classes
for books, members, and loans. How would you ensure that books can be checked out,
returned, and overdue books tracked?”
2. Extend a Base Class: “Given a base class `Vehicle`, extend it with subclasses `Car`
and `Bike`. Implement method overriding to handle the difference in the number of
wheels.”
3. Solve a Problem Using Encapsulation: “You need to design a class that represents a
user’s bank account. How would you use encapsulation to protect the balance from
unauthorized access?”
4. Implementing Polymorphism: “Design a set of classes for a zoo that includes various
types of animals. Show how polymorphism could be used to implement a method
`make_sound` for each animal type differently.”
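A compact sketch for scenarios 2 to 4 above (the library system in scenario 1 follows the same pattern with more classes). Class names such as `Lion` and `Elephant` are hypothetical. Note that Python has no truly private attributes: the double-underscore name is only mangled to `_BankAccount__balance`, so the encapsulation here is a convention enforced by the public methods and a read-only property.

```python
class Vehicle:
    """Scenario 2: base class; subclasses override wheels()."""

    def __init__(self, name: str) -> None:
        self.name = name

    def wheels(self) -> int:
        raise NotImplementedError("subclasses must report their wheel count")

    def describe(self) -> str:
        return f"{self.name} has {self.wheels()} wheels"


class Car(Vehicle):
    def wheels(self) -> int:  # overrides Vehicle.wheels
        return 4


class Bike(Vehicle):
    def wheels(self) -> int:
        return 2


class BankAccount:
    """Scenario 3: the balance changes only through validated methods."""

    def __init__(self, owner: str, opening_balance: float = 0.0) -> None:
        if opening_balance < 0:
            raise ValueError("opening balance cannot be negative")
        self.owner = owner
        self.__balance = opening_balance  # stored as _BankAccount__balance

    @property
    def balance(self) -> float:
        """Read-only view; there is no setter, so `acct.balance = x` fails."""
        return self.__balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def withdraw(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self.__balance:
            raise ValueError("insufficient funds")
        self.__balance -= amount


class Animal:
    """Scenario 4: common interface for every zoo animal."""

    def make_sound(self) -> str:
        raise NotImplementedError


class Lion(Animal):
    def make_sound(self) -> str:
        return "Roar"


class Elephant(Animal):
    def make_sound(self) -> str:
        return "Trumpet"


def zoo_chorus(animals: list[Animal]) -> list[str]:
    """Polymorphism: one call site, dispatched to each subclass's override."""
    return [animal.make_sound() for animal in animals]
```

Code that calls `describe()` or `make_sound()` never needs to check the concrete type; adding a new vehicle or animal only means adding a subclass.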
Pandas Scenario-Based Questions
1. Data Cleaning: “You have a DataFrame containing user data, but some of the users’ ages
are entered as negative numbers. How would you clean this data?”
2. Time Series Analysis: “Given a DataFrame with daily sales data for several products,
write a Pandas script to summarize the data on a monthly basis and identify the
best-selling product each month.”
3. Data Merging: “Explain how you would merge data from two DataFrames, `orders` and
`customers`, where each order has a customer ID, to include the customer’s name in the
`orders` DataFrame.”
4. Complex Data Transformation: “You are working with a dataset that records
transactions. Each transaction has a date, category, and amount. You need to produce a
summary report showing the total amount spent per category per month. Describe the
steps you would take using Pandas.”
5. Efficient Data Loading: “You have a very large CSV file that you need to load into a
Pandas DataFrame for analysis, but loading the entire dataset consumes too much
memory. How would you efficiently load and analyze this data?”
6. Handling Time Zones: “Your dataset includes timestamps from different time zones. How
would you standardize these timestamps to a single time zone for analysis?”
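A sketch of answers to Pandas scenarios 1, 2, 3, 5, and 6 above. The column names (`age`, `date`, `product`, `amount`, `customer_id`, `name`, `category`) and function names are assumptions made for illustration. Treating negative ages as missing (rather than, say, taking the absolute value) is a judgment call; the right rule depends on how the bad values arose.

```python
import io

import pandas as pd


def clean_ages(users: pd.DataFrame) -> pd.DataFrame:
    """Scenario 1: mark negative ages as missing (NaN) instead of guessing a value."""
    cleaned = users.copy()
    cleaned["age"] = cleaned["age"].where(cleaned["age"] >= 0)
    return cleaned


def best_seller_per_month(sales: pd.DataFrame) -> pd.DataFrame:
    """Scenario 2: monthly totals per product, then the top product in each month.

    Expects a datetime64 'date' column plus 'product' and 'amount' columns.
    """
    monthly = (
        sales.assign(month=sales["date"].dt.to_period("M"))
        .groupby(["month", "product"], as_index=False)["amount"]
        .sum()
    )
    # idxmax returns the row label of the largest total within each month
    best_rows = monthly.groupby("month")["amount"].idxmax()
    return monthly.loc[best_rows].reset_index(drop=True)


def add_customer_names(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Scenario 3: left join so every order survives even without a matching customer.

    validate="many_to_one" raises if a customer_id appears twice in customers.
    """
    return orders.merge(
        customers[["customer_id", "name"]],
        on="customer_id",
        how="left",
        validate="many_to_one",
    )


def total_by_category_chunked(csv_source, chunksize: int = 100_000) -> pd.Series:
    """Scenario 5: read only the needed columns, chunk by chunk, combining partial sums."""
    totals = pd.Series(dtype="float64")
    for chunk in pd.read_csv(csv_source, usecols=["category", "amount"], chunksize=chunksize):
        totals = totals.add(chunk.groupby("category")["amount"].sum(), fill_value=0)
    return totals


def to_utc(timestamps: pd.Series) -> pd.Series:
    """Scenario 6: parse strings carrying different UTC offsets into one UTC series."""
    return pd.to_datetime(timestamps, utc=True)


if __name__ == "__main__":
    demo_csv = io.StringIO("category,amount\nfood,10\ntravel,25\nfood,5\n")
    print(total_by_category_chunked(demo_csv, chunksize=2))
```

Scenario 4 follows the same monthly-period idea: `df.assign(month=df["date"].dt.to_period("M")).pivot_table(index="month", columns="category", values="amount", aggfunc="sum")` gives one row per month and one column per category.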
NumPy Scenario-Based Questions
1. Matrix Operations: “You have two matrices representing two sets of data. Write a NumPy
script to multiply these matrices and explain the significance of the operation.”
2. Handling Large Datasets: “Given a large dataset represented as a NumPy array, how
would you normalize the data to the range 0 to 1?”
3. Optimizing Computations: “Explain how you would use NumPy to optimize a
computationally expensive operation that is currently implemented using Python lists and
for-loops.”
4. Memory Management: “Explain how NumPy’s memory management differs from
standard Python lists, particularly when it comes to large datasets.”
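A sketch for the four NumPy scenarios above. The function names are hypothetical; the normalization rescales each column independently with min-max scaling, which is an assumption since the question does not say whether the range should be global or per feature. The memory comparison is approximate: `sys.getsizeof` counts the list's pointer array plus each int object, while a NumPy array stores raw 8-byte values contiguously.

```python
import sys

import numpy as np


def multiply(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Scenario 1: matrix product, (m, k) @ (k, n) -> (m, n).

    Each output cell is the dot product of a row of a with a column of b,
    e.g. combining feature rows with a weight matrix.
    """
    if a.shape[-1] != b.shape[0]:
        raise ValueError(f"incompatible shapes {a.shape} and {b.shape}")
    return a @ b


def min_max_normalize(data: np.ndarray) -> np.ndarray:
    """Scenario 2: rescale each column to [0, 1] via (x - min) / (max - min)."""
    data = np.asarray(data, dtype=np.float64)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range = np.where(col_range == 0, 1.0, col_range)  # constant column: avoid 0 / 0
    return (data - col_min) / col_range


def squared_distances_loop(points, center):
    """Scenario 3, before: nested Python loops over plain lists."""
    return [sum((p - c) ** 2 for p, c in zip(point, center)) for point in points]


def squared_distances_vectorized(points: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Scenario 3, after: broadcasting (n, d) - (d,) -> (n, d), then a row-wise sum."""
    return ((points - center) ** 2).sum(axis=1)


def memory_footprint(n: int) -> tuple[int, int]:
    """Scenario 4: rough bytes for a list of n ints vs an int64 array of n values."""
    values = list(range(n))
    list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
    array_bytes = np.arange(n, dtype=np.int64).nbytes  # n * 8 bytes of contiguous data
    return list_bytes, array_bytes


if __name__ == "__main__":
    print(memory_footprint(1_000_000))
```

The vectorized version moves the per-element loop out of the interpreter and into NumPy's compiled routines, which is where the speedup for scenario 3 comes from.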
Matplotlib Scenario-Based Questions
1. Customized Plot: “You are given a dataset of monthly sales figures for the past year. How
would you visualize the trend over the months, including a moving average line to highlight
the trend?”
2. Interactive Visualization: “Describe how you would create an interactive chart in
Matplotlib that allows the user to highlight different sections of the data for closer
examination.”
3. Comparing Datasets: “Given two datasets representing sales in two different regions,
how would you create a side-by-side bar chart to compare the sales performance across
months?”
4. Advanced Visualization: “You need to create a plot that combines a histogram and a line
chart (overlaying the average line on top of the histogram). How would you approach this
using Matplotlib?”
5. Customizing Axes: “How would you customize the tick marks and labels of a plot to
display dates in a specific format, assuming your data spans several years?”
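A headless sketch (Agg backend) for scenarios 1, 3, 4, and 5 above. The function names, the 3-month moving-average window, and reading the "average line" in scenario 4 as a vertical line at the mean are assumptions. For scenario 2, `matplotlib.widgets.SpanSelector` is one way to let a user drag-select an x-range in an interactive backend; it is not shown here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np


def plot_sales_trend(monthly_sales, window: int = 3):
    """Scenario 1: monthly values plus a trailing moving average."""
    sales = np.asarray(monthly_sales, dtype=float)
    months = np.arange(1, len(sales) + 1)
    # "valid" mode yields len(sales) - window + 1 averages, each ending at a month
    moving_avg = np.convolve(sales, np.ones(window) / window, mode="valid")
    fig, ax = plt.subplots()
    ax.plot(months, sales, marker="o", label="Monthly sales")
    ax.plot(months[window - 1:], moving_avg, linestyle="--", label=f"{window}-month moving average")
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales")
    ax.legend()
    return fig, ax


def plot_side_by_side(region_a, region_b, labels):
    """Scenario 3: grouped bars offset by half a bar width around each month."""
    x = np.arange(len(labels))
    width = 0.4
    fig, ax = plt.subplots()
    ax.bar(x - width / 2, region_a, width, label="Region A")
    ax.bar(x + width / 2, region_b, width, label="Region B")
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.legend()
    return fig, ax


def plot_hist_with_mean(values, bins: int = 20):
    """Scenario 4: histogram with a dashed vertical line at the mean."""
    values = np.asarray(values, dtype=float)
    fig, ax = plt.subplots()
    ax.hist(values, bins=bins, alpha=0.7, label="Distribution")
    ax.axvline(values.mean(), color="red", linestyle="--", label=f"Mean = {values.mean():.2f}")
    ax.legend()
    return fig, ax


def format_date_axis(ax, fmt: str = "%b %Y"):
    """Scenario 5: one labelled major tick per year, minor ticks each quarter."""
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt))
    ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
    ax.figure.autofmt_xdate()  # rotate labels so multi-year ranges stay readable
```

Each function returns `(fig, ax)` so the caller can keep customizing the plot or save it with `fig.savefig(...)`.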