Wednesday, July 27, 2016

76 Data Science Interview Questions


https://www.dezyre.com/article/100-data-science-interview-questions-and-answers-general-for-2016/184



1) How would you create a taxonomy to identify key customer trends in unstructured data?

2) Python or R – Which one would you prefer for text analytics?

3) Which technique is used to predict categorical responses?

4) What is logistic regression? Alternatively, give an example of when you have recently used logistic regression.

5) What are Recommender Systems?

6) Why does data cleaning play a vital role in analysis?

7) Differentiate between univariate, bivariate and multivariate analysis.

8) What do you understand by the term Normal Distribution?

9) What is Linear Regression?

10) What is Interpolation and Extrapolation?

11) What is power analysis?

12) What is K-means? How can you select K for K-means?

13) What is Collaborative filtering?

14) What is the difference between Cluster and Systematic Sampling?

15) Are expected value and mean value different?

16) What does P-value signify about the statistical data?

17) Do gradient descent methods always converge to the same point?

18) What are categorical variables?

19) A test has a true positive rate of 100% and a false positive rate of 5%. In a population, 1 in 1000 people have the condition the test identifies. Given a positive test result, what is the probability of having that condition?
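
The arithmetic behind question 19 follows from Bayes' theorem. The sketch below is one way to work it through, assuming the stated rates apply uniformly across the population; the function name `posterior_given_positive` is our own and not part of the original question.

```python
def posterior_given_positive(prevalence, true_positive_rate, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # P(positive) = P(pos | cond) * P(cond) + P(pos | no cond) * P(no cond)
    p_positive = (true_positive_rate * prevalence
                  + false_positive_rate * (1.0 - prevalence))
    return true_positive_rate * prevalence / p_positive


p = posterior_given_positive(prevalence=1 / 1000,
                             true_positive_rate=1.0,
                             false_positive_rate=0.05)
print(f"{p:.4f}")  # roughly 0.02, i.e. about a 2% chance
```

The low result comes from the rarity of the condition: false positives among the 999 unaffected people far outnumber the single true positive.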

20) How can you make data normal using the Box-Cox transformation?

21) What is the difference between Supervised Learning and Unsupervised Learning?

22) Explain the use of Combinatorics in data science.

23) Why is vectorization considered a powerful method for optimizing numerical code?

24) What is the goal of A/B Testing?

25) What is an Eigenvalue and Eigenvector?

26) What is Gradient Descent?

27) How can outlier values be treated?

1) Change the value to bring it within a range.

2) Simply remove the value.
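
The two options for question 27 can be sketched in plain Python. The bounds used here (0 and 10) and the helper names `clip_outliers` and `remove_outliers` are hypothetical; in practice the range would typically come from percentiles or an interquartile-range rule.

```python
def clip_outliers(values, lower, upper):
    """Option 1: pull each value back into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def remove_outliers(values, lower, upper):
    """Option 2: drop every value that falls outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


data = [3, 5, 4, 120, 6, -40]
print(clip_outliers(data, 0, 10))    # [3, 5, 4, 10, 6, 0]
print(remove_outliers(data, 0, 10))  # [3, 5, 4, 6]
```

Clipping keeps the sample size unchanged, while removal shrinks it; which is preferable depends on whether the extreme values are errors or genuine observations.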

28) How can you assess a good logistic model?

29) What are various steps involved in an analytics project?

30) How can you iterate over a list and also retrieve element indices at the same time?
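
For question 30, Python's built-in `enumerate` is the usual idiom. A short sketch, with a made-up list of fruit names as sample data:

```python
fruits = ["apple", "banana", "cherry"]  # sample data, not from the original

# enumerate yields (index, element) pairs, starting at 0 by default.
for index, fruit in enumerate(fruits):
    print(index, fruit)

# The optional start argument shifts the first index.
indexed = list(enumerate(fruits, start=1))
print(indexed)  # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]
```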

31) During analysis, how do you treat missing values?

33) Can you use machine learning for time series analysis?


34) Write a function that takes in two sorted lists and outputs a sorted list that is their union.
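
One possible answer to question 34 is a two-pointer merge. This sketch assumes both inputs are sorted in ascending order and interprets "union" in the set sense, so duplicates are dropped; the name `sorted_union` is our own.

```python
def sorted_union(a, b):
    """Merge two ascending lists into one ascending list without duplicates."""
    result = []
    i = j = 0
    while i < len(a) or j < len(b):
        # Take from a when b is exhausted or a's current item is not larger.
        if j >= len(b) or (i < len(a) and a[i] <= b[j]):
            candidate = a[i]
            i += 1
        else:
            candidate = b[j]
            j += 1
        # Skip values already emitted so the output is a true union.
        if not result or result[-1] != candidate:
            result.append(candidate)
    return result


print(sorted_union([1, 2, 4, 4], [2, 3, 5]))  # [1, 2, 3, 4, 5]
```

Each element is visited once, so the running time is linear in the combined length of the two lists. If duplicates should be kept, the final `if` check can be removed.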

35) What is the difference between Bayesian Inference and Maximum Likelihood Estimation (MLE)?

36) What is Regularization and what kind of problems does regularization solve?

37) What is multicollinearity and how can you overcome it?

38) What is the curse of dimensionality?

39) How do you decide whether your linear regression model fits the data?

40) What is the difference between squared error and absolute error?

41) What is Machine Learning?

42) How are confidence intervals constructed and how will you interpret them?

43) How will you explain logistic regression to an economist, a physician scientist and a biologist?

44) How can you overcome Overfitting?

45) Differentiate between wide and tall data formats.

46) Is Naïve Bayes bad? If yes, in what respects?

47) How would you develop a model to identify plagiarism?

48) How will you define the number of clusters in a clustering algorithm?

49) Is it better to have too many false negatives or too many false positives?

50) Is it possible to perform logistic regression with Microsoft Excel?

51) What do you understand by fuzzy merging? Which language will you use to handle it?

52) What is the difference between skewed and uniform distribution?

53) You created a predictive model of a quantitative outcome variable using multiple regression. What steps would you follow to validate the model?

54) What do you understand by Hypothesis in the context of Machine Learning?

55) What do you understand by Recall and Precision?

56) How will you find the right K for K-means?

57) Why does L1 regularization cause parameter sparsity whereas L2 regularization does not?

58) How can you deal with different types of seasonality in time series modelling?

59) In experimental design, is it necessary to do randomization? If yes, why?

60) What do you understand by conjugate-prior with respect to Naïve Bayes?

61) Can you cite some examples where a false positive is more important than a false negative?

62) Can you cite some examples where a false negative is more important than a false positive?

63) Can you cite some examples where false positives and false negatives are equally important?

64) Can you explain the difference between a Test Set and a Validation Set?


65) What makes a dataset gold standard?

66) What do you understand by statistical power of sensitivity and how do you calculate it?

67) What is the importance of having a selection bias?

68) Give some situations where you would use an SVM over a Random Forest machine learning algorithm and vice versa.

SVM and Random Forest are both used in classification problems.


69) What do you understand by feature vectors?

70) How do data management procedures like missing data handling make selection bias worse?

71) What are the advantages and disadvantages of using regularization methods like Ridge Regression?

72) What do you understand by long and wide data formats?

73) What do you understand by outliers and inliers? What would you do if you find them in your dataset?

74) Write a program in Python that takes the diameter and weight of a coin as input and outputs the monetary value of the coin.
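
Question 74 does not specify a currency or a method, so the following is only one possible approach: a nearest-match lookup against a small reference table. The table values are illustrative approximations of US coin dimensions and are an assumption, as are the names `COIN_SPECS` and `coin_value`; an interviewer might equally expect a trained classifier.

```python
# Illustrative reference table (assumption, approximate US coin specs):
# name -> (diameter in mm, weight in g, value in dollars)
COIN_SPECS = {
    "penny": (19.05, 2.50, 0.01),
    "nickel": (21.21, 5.00, 0.05),
    "dime": (17.91, 2.27, 0.10),
    "quarter": (24.26, 5.67, 0.25),
}


def coin_value(diameter_mm, weight_g):
    """Return the value of the reference coin closest to the measurements."""
    def distance(spec):
        ref_diameter, ref_weight, _ = spec
        # Relative differences keep the two units on a comparable scale.
        return (((diameter_mm - ref_diameter) / ref_diameter) ** 2
                + ((weight_g - ref_weight) / ref_weight) ** 2)

    closest = min(COIN_SPECS.values(), key=distance)
    return closest[2]


print(coin_value(24.3, 5.7))  # 0.25
```

A real solution would also need to reject measurements that match no known coin, for example by capping the allowed distance.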

75) What are the basic assumptions to be made for linear regression?

76) Can you write the formula to calculate R-squared?
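
For question 76, the standard definition is R-squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A minimal sketch in plain Python, with the function name `r_squared` chosen for illustration and assuming the observed values are not all identical:

```python
def r_squared(y_true, y_pred):
    """Compute R-squared as 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals between observations and predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the observations.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

A perfect fit gives 1, and a model that always predicts the mean gives 0.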

