91. Differentiate between library() and require()
library()- If the requested package cannot be loaded, this function stops with an error. If the package is already attached, it does not load it again.
require()- If the package is not found, it gives a warning instead of an error and returns FALSE (TRUE on success), so the code keeps running. Because of this logical return value, require() is designed for use inside functions and conditional checks. Like library(), it loads the package only if it is not already loaded.
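The difference shows up when a package is missing. Below is a minimal sketch that assumes a hypothetical package name (`notARealPackage123`) which is not installed on your machine. library() stops with an error, caught here with tryCatch(), while require() simply returns FALSE. The helper `has_pkg()` is also hypothetical and only illustrates using require() inside a function.

```r
# Hypothetical package name, assumed NOT to be installed
pkg <- "notARealPackage123"

# library() signals an error; tryCatch() turns it into a value we can inspect
lib_result <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "error"
)

# require() does not stop the script: it returns FALSE.
# suppressWarnings()/suppressMessages() hide its warning and startup message.
req_result <- suppressWarnings(suppressMessages(
  require(pkg, character.only = TRUE)
))

print(lib_result)  # [1] "error"
print(req_result)  # [1] FALSE

# Hypothetical helper: require()'s logical result used inside a function
has_pkg <- function(name) {
  suppressWarnings(require(name, character.only = TRUE, quietly = TRUE))
}
```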
92. Why is clustering required in data analysis?
Clustering is the process of partitioning a set of objects into groups (clusters) so that objects in the same group are more similar to one another than to objects in other groups. It is required in data analysis to discover natural groupings in unlabeled data. To be useful in practice, a clustering algorithm should meet the following requirements-
- Scalability– it should be able to deal with large databases.
- Interpretability– its results should be comprehensible and usable.
- Dimensionality– it should be able to handle high-dimensional data.
- Ability to deal with noisy data– databases often contain erroneous data, and algorithms that are sensitive to such data may deliver poor results.
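As an illustration, here is a minimal sketch using base R's kmeans() on the built-in iris measurements. The choice of k-means, three clusters, and standardizing the columns are assumptions made for the example, not the only option. Scaling first keeps a variable with large units from dominating the distance computation.

```r
set.seed(42)  # k-means starts from random centers; fix the seed for reproducibility

# Use only the four numeric measurement columns and standardize them
features <- scale(iris[, 1:4])

# Partition the rows into 3 clusters; nstart = 25 tries 25 random starts
fit <- kmeans(features, centers = 3, nstart = 25)

# Cluster sizes and a cross-tab against the known species labels
print(fit$size)
print(table(cluster = fit$cluster, species = iris$Species))
```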
93. What is the rattle package in R?
Rattle is a popular GUI for data mining in R that gives statistical and visual summaries of data. It transforms data so that it can be easily modeled, builds supervised and unsupervised machine learning models from the data, and presents the models graphically. Rattle is also used as a teaching tool for learning R. Its features include clustering, modeling, evaluation, statistical tests, etc.
94. What is the Random Walk Model in R?
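The random walk model is a basic example of a non-stationary time series model. A random walk has no constant variance (its variance grows with time), and a random walk with drift has no constant mean either. It also has strong dependence over time, because each value equals the previous value plus a random shock. There are two types of random walks: random walk without drift and random walk with drift.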
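A random walk can be written as Y_t = c + Y_(t-1) + e_t, where e_t is white noise and c is the drift (c = 0 for a walk without drift). Below is a minimal base R sketch that assumes standard normal shocks and an illustrative drift of 0.5. cumsum() builds the walk directly, and arima.sim() produces the same kind of series as an ARIMA(0,1,0) model.

```r
set.seed(1)
n <- 200
shocks <- rnorm(n)                 # white-noise shocks e_t

# Without drift: Y_t = Y_(t-1) + e_t, starting from Y_0 = 0
rw_no_drift <- cumsum(shocks)

# With drift: Y_t = c + Y_(t-1) + e_t (c = 0.5 is an illustrative value)
drift <- 0.5
rw_drift <- cumsum(drift + shocks)

# The same model simulated as ARIMA(0,1,0); the result has n + 1 values
# because the integrated series starts at 0
rw_sim <- arima.sim(model = list(order = c(0, 1, 0)), n = n)

# Differencing the drifting walk recovers drift + shocks
print(head(diff(rw_drift) - shocks[-1]))
```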
95. Explain the concept of Principal Component Analysis.
In Principal Component Analysis (PCA), the data is transformed into a new coordinate system. The first principal component captures the maximum possible variance of the original data. The second principal component captures the maximum of the remaining variance, subject to being orthogonal to the first. Each subsequent component follows the same rule, so all components are uncorrelated with one another. In R, PCA can be done using the function prcomp().
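A minimal sketch with prcomp() on the built-in USArrests data follows. The dataset and the choice to standardize with scale. = TRUE are assumptions for illustration. The component standard deviations come out in decreasing order, and the component scores are uncorrelated.

```r
# Standardize the four variables so no single unit dominates the variance
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
print(summary(pca))

# Loadings (rotation) and the data projected into the new space (scores)
print(pca$rotation)
print(head(pca$x))

# Scores of different components are uncorrelated (off-diagonals ~ 0)
print(round(cor(pca$x), 10))
```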
96. What is the use of the lattice package?
The lattice package improves on base R graphics by providing better defaults and the ability to easily display multivariate relationships, for example by splitting one plot into panels conditioned on another variable (trellis graphics).
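A short sketch of that multivariate strength uses the built-in mtcars data for illustration. One formula produces a scatter plot of mpg against weight, split into one panel per number of cylinders. lattice is normally included with R as a recommended package.

```r
library(lattice)

# mpg vs. weight, conditioned on cylinder count: "| factor(cyl)" creates panels
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),            # 3 panels side by side
            type = c("p", "r"),          # points plus a regression line
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# lattice functions return a "trellis" object; print() draws it
# (explicit printing is needed inside scripts and functions)
print(p)
```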
97. List the main functions available in the dplyr package.
- filter()- selects a subset of rows in a data frame. The first argument is the data frame (or tibble); the following arguments are logical expressions that refer to columns of that data frame. It keeps the rows where all expressions are TRUE.
- arrange()- reorders the rows of a data frame by one or more columns. Wrapping a column in desc() sorts it in descending order.
- mutate()- adds new columns that are functions of existing columns. Columns created earlier in the same mutate() call can be referred to immediately.
- select()- zooms in on a useful subset of columns, chosen by name or position. Inside select(), you can use helper functions like starts_with(), ends_with(), matches(), etc.
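These four verbs are typically chained with the pipe operator. Below is a minimal sketch that assumes the dplyr package is installed and uses the built-in mtcars data. The kpl (kilometres per litre) columns and the conversion factor are illustrative additions.

```r
suppressPackageStartupMessages(library(dplyr))

result <- mtcars %>%
  filter(cyl == 4, mpg > 25) %>%          # rows: 4-cylinder cars AND mpg above 25
  mutate(kpl = mpg * 0.425144,             # new column from an existing one
         kpl_rounded = round(kpl, 1)) %>%  # refers to kpl, created just above
  arrange(desc(mpg)) %>%                   # highest mpg first
  select(starts_with("m"), kpl_rounded)    # keeps mpg and kpl_rounded

print(result)
```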
98. What is the white noise model in R?
In R, the white noise model is a basic time series model which is also the basis for more elaborate models. To simulate data from a variety of time series models, the arima.sim() function is used. The white noise model has a fixed constant mean, a fixed constant variance, and no correlation over time.
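Below is a minimal sketch of simulating white noise with arima.sim(). An ARIMA(0,0,0) model has no autoregressive, differencing, or moving-average terms, so it is pure white noise. Extra arguments such as mean and sd are passed on to the random generator (rnorm() by default). The values used here are illustrative.

```r
set.seed(123)

# White noise with mean 0 and standard deviation 1
wn <- arima.sim(model = list(order = c(0, 0, 0)), n = 100)

# White noise with a different constant mean and variance
wn_shifted <- arima.sim(model = list(order = c(0, 0, 0)), n = 100,
                        mean = 4, sd = 2)

# Sample mean/sd should be close to 4 and 2; the ACF should show no
# significant correlation at nonzero lags
print(c(mean = mean(wn_shifted), sd = sd(wn_shifted)))
print(acf(wn, plot = FALSE))
```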
99. What is R Markdown, and what is it used for?
R Markdown is a reporting tool for R, provided by the rmarkdown package. It lets you combine narrative text, R code, and the output of that code in one document, so you can create high-quality, reproducible reports.
The output format of R Markdown can be:
- HTML
- Word
- PDF
100. What packages are used for data mining in R?
Some packages used for data mining in R:
- data.table- provides fast reading of large files and fast data manipulation.
- rpart and caret- for machine learning models.
- arules- for association rule learning.
- ggplot2- provides a wide range of data visualization plots.
- tm- for text mining.
- forecast- provides functions for time series analysis and forecasting.