Answer: Data science uses statistical, mathematical, and computational methods to analyze data, extract insights, and aid decision-making.
Answer: AI builds systems that simulate human intelligence; data science applies computational and statistical techniques to analyze and interpret complex data; and statistics provides the mathematical principles for analyzing data and drawing inferences from it.
Answer: Common types include bar charts, line graphs, histograms, scatter plots, pie charts, heat maps, and box plots. Each effectively conveys different insights.
Answer: The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. A low p-value (commonly below a 0.05 significance level) indicates strong evidence against the null hypothesis and often leads to its rejection.
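For instance, a minimal sketch with SciPy's one-sample t-test; the data and the 0.05 threshold below are made up purely for illustration:

```python
# Illustrative only: a one-sample t-test with SciPy on synthetic measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.3, scale=1.0, size=30)  # hypothetical measurements

# H0: the population mean is 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# A common convention: reject H0 if p falls below a chosen significance level.
alpha = 0.05
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```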
Answer: Feature scaling normalizes or standardizes data, crucial because many algorithms perform better when features are on a similar scale.
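A minimal sketch of the two common approaches with scikit-learn; the toy matrix is illustrative only:

```python
# Illustrative sketch: standardization vs. min-max normalization with scikit-learn.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # features on very different scales

X_std = StandardScaler().fit_transform(X)    # mean 0, unit variance per column
X_norm = MinMaxScaler().fit_transform(X)     # rescaled to the [0, 1] range per column

print(X_std)
print(X_norm)
```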
Answer: A data analyst collects, processes, and analyzes data to provide insights that guide business decisions, using statistical and visualization tools.
Answer: EDA helps understand data distributions, relationships, and patterns, guiding data cleaning and making informed modeling choices.
Answer: Clustering is an unsupervised method to group similar data points. Applications include customer segmentation, image segmentation, and anomaly detection.
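A brief sketch using k-means on synthetic data, with the number of clusters chosen arbitrarily for illustration:

```python
# Sketch: k-means clustering (e.g., for segmentation) on synthetic blob data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])        # cluster assignment for the first few points
print(kmeans.cluster_centers_)    # learned cluster centroids
```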
Answer: Parametric methods assume a specific distribution for the data, while non-parametric methods make few or no distributional assumptions, which makes them more flexible.
Answer: Time series analysis studies data points collected over time, used to identify trends, patterns, or seasonal effects.
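As a small illustration, a rolling mean in pandas can expose a trend in a synthetic daily series:

```python
# Sketch: smoothing a (made-up) daily series with a 7-day moving average in pandas.
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=120, freq="D")
rng = np.random.default_rng(1)
series = pd.Series(np.linspace(10, 20, 120) + rng.normal(0, 1, 120), index=idx)

trend = series.rolling(window=7).mean()   # 7-day moving average smooths daily noise
print(trend.tail())
```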
Answer: Data cleaning corrects errors, inconsistencies, and missing values, enhancing data quality and the accuracy of analysis or model results.
Answer: Regularization adds a penalty term to the loss function that discourages overly complex models, controlling overfitting in predictive models.
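A short sketch contrasting L2 (ridge) and L1 (lasso) penalties in scikit-learn; the alpha values are arbitrary:

```python
# Sketch: ridge (L2) and lasso (L1) regularization; alpha controls penalty strength.
from sklearn.linear_model import Ridge, Lasso
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks coefficients toward zero
lasso = Lasso(alpha=1.0).fit(X, y)   # can drive some coefficients exactly to zero

print(sum(abs(c) > 1e-6 for c in lasso.coef_), "non-zero lasso coefficients out of", len(lasso.coef_))
```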
Answer: The bias-variance tradeoff balances model simplicity against flexibility. High bias (an overly simple model) leads to underfitting, while high variance (an overly complex model) leads to overfitting.
Answer: Dimensionality reduction decreases the number of features in data, making analysis and visualization easier, especially in large datasets.
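For example, a minimal PCA sketch that projects the 4-feature iris data onto 2 principal components:

```python
# Sketch: PCA reduces the iris data from 4 features to 2 components for easier inspection.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                       # 150 samples x 4 features
X_2d = PCA(n_components=2).fit_transform(X)

print(X_2d.shape)                          # (150, 2), easier to plot and explore
```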
Answer: Cross-validation assesses a model’s ability to generalize by training and testing on different data subsets repeatedly.
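A minimal 5-fold cross-validation sketch with scikit-learn; the model choice is illustrative:

```python
# Sketch: 5-fold cross-validation of a logistic regression model.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(scores)          # accuracy on each held-out fold
print(scores.mean())   # average estimate of generalization
```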
Answer: Gradient descent is an optimization algorithm that minimizes the loss function by iteratively adjusting model parameters in the direction of the negative gradient.
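A small NumPy sketch of gradient descent on a least-squares problem; the learning rate and iteration count are picked by hand for illustration:

```python
# Sketch: plain gradient descent fitting y = w*x + b by minimizing mean squared error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, 100)   # data generated from y = 3x + 2 plus noise

w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    y_pred = w * x + b
    grad_w = 2 * np.mean((y_pred - y) * x)    # dL/dw for mean squared error
    grad_b = 2 * np.mean(y_pred - y)          # dL/db
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))               # should approach 3.0 and 2.0
```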
Answer: Text mining extracts meaningful information from unstructured text data, used in sentiment analysis, topic modeling, and document clustering.
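As a sketch, TF-IDF vectorization is a common first step; the documents below are invented examples:

```python
# Sketch: turning raw text into a TF-IDF document-term matrix.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the service was great and the staff friendly",
    "terrible service, very slow delivery",
    "delivery was fast and the product works great",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)            # sparse document-term matrix

print(X.shape)
print(vectorizer.get_feature_names_out()[:5])
```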
Answer: A random forest is an ensemble learning method that builds multiple decision trees and combines their outputs for improved accuracy and reduced overfitting.
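A minimal scikit-learn sketch on the iris data, with hyperparameters left near their defaults:

```python
# Sketch: a random forest classifier evaluated on a held-out split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the combined trees on unseen data
```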
Answer: Classification categorizes data into classes, while regression predicts continuous values, such as prices or temperatures.
Answer: Techniques include handling missing values, encoding categorical data, scaling features, and normalizing distributions to prepare data for analysis.
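A short pandas/scikit-learn sketch of those steps on a tiny made-up table:

```python
# Sketch: imputing missing values, one-hot encoding, and scaling a numeric column.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [25, None, 40, 33],
    "city": ["Paris", "Oslo", "Paris", "Rome"],
})

df["age"] = df["age"].fillna(df["age"].median())            # handle missing values
df = pd.get_dummies(df, columns=["city"])                   # one-hot encode the categorical column
df[["age"]] = StandardScaler().fit_transform(df[["age"]])   # scale the numeric feature

print(df)
```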
Answer: A decision tree splits data into branches based on feature values, with each leaf representing a classification or decision.
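A brief sketch that fits a shallow tree and prints its learned splits; depth is limited to 2 for readability:

```python
# Sketch: fitting a shallow decision tree and printing its branches and leaves.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))  # splits by feature value, leaves give the class
```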
Answer: SQL databases are relational and store data in structured tables with fixed schemas, while NoSQL databases handle semi-structured or unstructured data with flexible schemas, which suits large-scale applications.
Answer: SVM is a supervised algorithm that finds the hyperplane separating classes with the maximum margin, often used for classification tasks.
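A minimal sketch with scikit-learn's SVC and a linear kernel; C = 1.0 is chosen only for illustration:

```python
# Sketch: a linear-kernel SVM classifier evaluated on a held-out split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)   # C trades margin width against misclassification
print(svm.score(X_test, y_test))
```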
Answer: A neural network is a computational model inspired by the human brain, consisting of interconnected nodes, used for complex pattern recognition.
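As a sketch, scikit-learn's MLPClassifier gives a small feed-forward network; the hidden-layer size here is arbitrary:

```python
# Sketch: a one-hidden-layer neural network classifying handwritten digits.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0).fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```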
Answer: NLP enables machines to understand and interpret human language, with applications in machine translation, sentiment analysis, and chatbots.
Answer: Hyperparameters are model settings not learned from data (e.g., learning rate), significantly impacting model performance.
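A brief sketch of tuning two SVM hyperparameters with a cross-validated grid search; the grid values are illustrative:

```python
# Sketch: grid search over C and gamma, two hyperparameters set before training.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)   # best setting and its cross-validated score
```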
Answer: Metrics include accuracy, precision, recall, F1 score, and AUC-ROC, each offering insights into model performance.
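A minimal sketch computing those metrics for a hypothetical set of labels and predictions:

```python
# Sketch: common classification metrics on made-up labels, predictions, and scores.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true   = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred   = [0, 1, 0, 0, 1, 1, 1, 1]
y_scores = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]   # predicted probabilities for the positive class

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_scores))
```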
Answer: Reinforcement learning is a learning approach where an agent learns by maximizing rewards through trial and error in an environment.
Answer: Feature scaling normalizes or standardizes features, helping algorithms perform better by aligning feature scales.
Answer: Overfitting happens when a model learns noise. It can be prevented with regularization, cross-validation, and simpler models.