{"id":33228,"date":"2022-02-08T16:44:02","date_gmt":"2022-02-08T15:44:02","guid":{"rendered":"http:\/\/54.194.80.134.nip.io\/?p=33228"},"modified":"2022-02-09T15:49:38","modified_gmt":"2022-02-09T14:49:38","slug":"advanced-analytics-with-r","status":"publish","type":"post","link":"https:\/\/www.cubeserv.com\/en\/advanced-analytics-with-r\/","title":{"rendered":"Advanced Analytics with R: An Overview"},"content":{"rendered":"\t\t
While the R programming language has been around since the early 1990s, it has gained a lot of attention over the past decade, mainly due to its vast range of functionality for statistical analysis and data science. Another significant reason is that people can start using it without a solid programming background.<\/p>
Continuing our series on analytics with R, today we’re going to explore advanced analytics with R, including topics such as regression analysis and time series forecasting. If you want to check out the previous article on beginner-level analytics, feel free to click\u00a0here<\/a>.<\/p> So, let\u2019s start without any further ado.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t Starting with the basics, let’s look at the different kinds of plots we can make in R. While plotting graphs is a relatively simple job and one might argue that it doesn’t qualify as advanced analytics, it’s essential to know the different kinds of plots available and when to use each one. The insights a plot can provide in a few lines of code are sometimes more meaningful than the advanced analytics themselves.<\/p> No matter what kind of plot you\u2019re looking to make in R, ggplot2 is a strong first choice. It\u2019s one of the most widely used plotting packages among R programmers.<\/p> Let’s look at the different plots provided by the ggplot2 package and see which applications they are suitable for. For demonstration purposes, we will be using the famous Iris dataset<\/a>.<\/p> So, let\u2019s fire up RStudio and start plotting!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t Bar graphs are among the most common graphs used in analysis. They\u2019re used whenever you want to compare the values of different categories, with each category drawn as a vertical bar whose height represents its value. These bars of varying height make the comparison very convenient. A typical example is comparing the sepal length of the different iris species.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Histograms look similar to bar plots, but they show the distribution of a continuous variable by grouping its values into bins. 
Each bin is drawn as a bar whose height is the number of observations in that bin, and filling the bars with a different color per category makes it easy to see the frequency of each individual category. They can be built with ggplot2<\/em>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t A box plot visualizes the overall data distribution in a very compact manner. With a single box, you can view the median, the upper and lower quartiles, any outliers present, and the overall spread of the data.<\/p> Interested in how to read a box plot? Click\u00a0here<\/a>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Last but not least, scatter plots are also a very common and useful way of viewing data. They\u2019re widely used by data scientists to spot any correlation between two variables. They plot each observation as a point, with one variable on each axis, so if there\u2019s any correlation between the two, it becomes evident.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Regression analysis refers to statistical methods that identify the relationship between the variables in a dataset. Mostly we model the relationship between independent and dependent variables, but it doesn’t always have to be the case. It is another important tool for Advanced Analytics with R.<\/p> The idea of regression analysis is to tell us how one variable will change if we change another. This is precisely what regression models are built for. There are different types of regression techniques we can use based on the shape of the regression line and the types of variables involved: linear regression, logistic regression, multinomial logistic regression, and ordinal logistic regression.<\/p> Let\u2019s have a closer look at what the different regression types are used for.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t Linear regression is the most basic type of regression and can be used where two variables have a linear relationship. 
Based on the values of the two variables, a straight line is modeled with the following equation:<\/p> y = ax + b<\/em><\/p> Linear regression is used to predict continuous values: you supply the value of the independent variable (x) and get the value of the dependent variable (y) as a result.<\/p> Logistic regression is the next regression technique. It predicts a probability between 0 and 1 and can be used when the target variable is categorical, for example, predicting the winner or loser using some data. It passes the linear term through the logistic function, p = 1 / (1 + e^(-(ax + b))), which always yields a value between 0 and 1.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t As the name suggests, multinomial logistic regression is an extension of logistic regression. The difference between this and simple logistic regression is that the target variable can have more than two categories. Other than that, it uses the same mechanism as logistic regression.<\/p>\n Ordinal logistic regression is also an extension of simple logistic regression, and it’s used to predict values that fall into ordered categories, for example, predicting ranks. An example application would be predicting how customers rate their experience at a restaurant.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Now, let’s see how we can do regression analysis in R. For demonstration, I will create a logistic regression model in R since it covers the concepts nicely.<\/p> Use Case: We will be predicting students\u2019 success in an exam using their IQ levels.<\/p> Let\u2019s generate some random IQ numbers to come up with our dataset.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Using rnorm()<\/em>, we create a vector of 40 IQ values that have a mean of 30 and a standard deviation of 2.<\/p> Now, we randomly create pass\/fail values, coded as 0\/1, for the 40 students and put them in a data frame. 
We also associate each pass\/fail value with an IQ so our data frame is complete.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Now, let\u2019s create a regression model based on our dataset and draw a curve to see how the regression model performs on it. We can use the glm()<\/em> function to create and train a regression model and the curve()<\/em> function to plot the curve of predicted probabilities.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Moreover, if you want to check the logistic regression model’s statistics further, you can do so by calling R\u2019s summary()<\/em> function on the fitted model (summary(g)<\/em>).\u00a0<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t Time series forecasting is among the strongest suits of R. While Python is also quite popular for time series analysis, many practitioners still argue that R provides an overall better experience. The forecast<\/em> package is very comprehensive and a natural choice for Advanced Analytics with R.<\/p> In this article, we will be covering the following methods of Time Series Forecasting:<\/p> We’ll use the AirPassengers dataset that ships with R to fit models on a training set, forecast over the period covered by the validation set, and finally compute the Mean Absolute Percentage Error<\/a> to complete the segment.<\/p> So, let\u2019s initialize the data along with the training and validation window to get started.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t As the name suggests, the na\u00efve method is the simplest of all forecasting methods. 
It is based on the simple principle of “what we observe today will be the forecast for tomorrow.” The seasonal na\u00efve method is a slightly more complex variant: each forecast equals the last observed value from the same season, e.g., the same day of the previous week or the same month of the previous year.<\/p> Let\u2019s move forward with a seasonal na\u00efve forecast.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
Graph Plotting in R<\/h2>
ggplot2 \u2013 Your Best Friend!<\/h3>
\n\t\t\t\t
\n\t\t\t\t\t
1. Bar Graphs<\/h3>
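The bar graph of sepal length by species could be sketched roughly as follows. This is a minimal illustration rather than the article's original code: it assumes the ggplot2 package is installed, and the choice to plot the mean sepal length per species (and the names mean_sepal and p_bar) is an assumption made for the example.

```r
library(ggplot2)

# Assumption: compare species by their MEAN sepal length.
# aggregate() returns a data frame with one row per species.
mean_sepal <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# geom_col() draws bars whose height is the value in the data
p_bar <- ggplot(mean_sepal, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_col() +
  labs(title = 'Mean sepal length by species',
       x = 'Species',
       y = 'Mean sepal length (cm)')

print(p_bar)
```

geom_col() is used here because the heights are already computed; geom_bar() would instead count rows per category.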
\n\t\t\t\t
\n\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t
2. Histograms<\/h3>
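A histogram of the kind described in the text might look like the sketch below. It assumes ggplot2 is installed; the bin width of 0.25 and the object name p_hist are illustrative choices, not values taken from the article.

```r
library(ggplot2)

# Sepal length is continuous, so it is grouped into bins.
# fill = Species stacks one colored segment per species inside each bin.
p_hist <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.25, color = 'white') +  # binwidth is an assumption
  labs(title = 'Distribution of sepal length',
       x = 'Sepal length (cm)',
       y = 'Number of flowers')

print(p_hist)
```

Changing binwidth is usually the first thing to experiment with, since bins that are too wide or too narrow can hide the shape of the distribution.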
\n\t\t\t\t
\n\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t
3. Box Plots<\/h3>
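As a rough sketch of the box plot described above, the following draws one box per iris species. It assumes ggplot2 is installed, and the variable plotted (sepal length) and the name p_box are choices made for illustration.

```r
library(ggplot2)

# One box per species: the box spans the lower to upper quartile,
# the line inside is the median, and points beyond the whiskers are outliers.
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  labs(title = 'Sepal length by species',
       x = 'Species',
       y = 'Sepal length (cm)')

print(p_box)
```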
\n\t\t\t\t
\n\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t
4. Scatter Plots<\/h3>
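To illustrate how a scatter plot exposes correlation, the sketch below plots petal length against petal width. It assumes ggplot2 is installed; the pair of variables and the names p_scatter and r are assumptions chosen for the example.

```r
library(ggplot2)

# Each flower becomes one point: petal length on x, petal width on y
p_scatter <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point() +
  labs(title = 'Petal width vs. petal length',
       x = 'Petal length (cm)',
       y = 'Petal width (cm)')

print(p_scatter)

# The visual impression can be backed up with the Pearson correlation coefficient
r <- cor(iris$Petal.Length, iris$Petal.Width)
r
```

A value of r close to 1 confirms the strong positive relationship that the upward-sloping cloud of points suggests.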
\n\t\t\t\t
\n\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
Regression Analysis in R<\/h2>
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
1. Linear Regression<\/h3>
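A minimal sketch of fitting the straight line y = ax + b in R is shown below. It uses the base function lm() on the iris data; the choice of petal length as x and petal width as y, and the hypothetical new flower with a petal length of 4.5, are assumptions for illustration only.

```r
# Fit y = a*x + b with x = Petal.Length and y = Petal.Width
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# coef() returns the intercept (b) and the slope (a)
coef(fit)

# Supply a value of the independent variable, get the predicted dependent value
new_flower <- data.frame(Petal.Length = 4.5)  # hypothetical input
pred <- predict(fit, newdata = new_flower)
pred
```

The prediction is simply a * 4.5 + b using the fitted coefficients, which is why only the independent variable has to be supplied.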
2. Logistic Regression<\/h3>
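The equation from the original page is not preserved here, so the sketch below uses the standard logistic (sigmoid) function as an assumption, reusing the a and b notation from the linear case. The helper name logistic is hypothetical; plogis() is R's built-in equivalent and is used only as a cross-check.

```r
# Standard logistic function: p = 1 / (1 + exp(-(a*x + b)))
# It maps any real number to a probability between 0 and 1.
logistic <- function(x, a, b) {
  1 / (1 + exp(-(a * x + b)))
}

x <- seq(-5, 5, by = 1)
p <- logistic(x, a = 1, b = 0)  # a and b are illustrative values

# Every output lies strictly between 0 and 1
range(p)

# Cross-check against R's built-in logistic distribution function
all.equal(p, plogis(x))
```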
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t
3. Multinomial Logistic Regression<\/h3>\n
4. Ordinal Logistic Regression<\/h3>\n
Using Regression in R<\/h2>
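The steps of the IQ use case can be sketched end to end as follows. This is a reconstruction under assumptions, not the article's original code: the seed, the coding 1 = pass and 0 = fail, the use of sample() for the random outcomes, and the column names iq and result are all illustrative. Only rnorm(), glm(), curve(), summary() and the model name g come from the text.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# 40 simulated IQ values with mean 30 and standard deviation 2, as in the text
iq <- sort(rnorm(40, mean = 30, sd = 2))

# Random pass/fail outcomes, one per student (assumed coding: 1 = pass, 0 = fail)
result <- sample(c(0, 1), size = 40, replace = TRUE)

# Pair each outcome with an IQ value in a data frame
df <- data.frame(iq = iq, result = result)

# Logistic regression: binomial family with the default logit link
g <- glm(result ~ iq, family = binomial, data = df)

# Plot the observations and overlay the predicted probability curve
plot(df$iq, df$result, xlab = 'IQ', ylab = 'Probability of passing')
curve(predict(g, data.frame(iq = x), type = 'response'), add = TRUE)

# Coefficients, standard errors and deviance of the fitted model
summary(g)
```

Because the outcomes here are purely random, the fitted curve should be close to flat; with real exam data one would expect it to rise with IQ.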
\n\t\t\t\t
\n\t\t\t\t\t
\n\t\t\t\t
\n\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t
\n\t\t\t\t
\n\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t
Time Series Forecast in R<\/h2>
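Initializing the data with a training and a validation window might look like the sketch below. AirPassengers ships with base R as a monthly series from 1949 to 1960; the split point (training through December 1959, validation on 1960) is an assumption, since the article's original window is not preserved.

```r
# Monthly airline passenger totals, 1949-1960, included with R
data('AirPassengers')

# Assumed split: everything up to the end of 1959 is used for training,
# the final year is held back for validation
training   <- window(AirPassengers, end = c(1959, 12))
validation <- window(AirPassengers, start = c(1960, 1))

length(training)
length(validation)
frequency(AirPassengers)  # observations per year
```

window() keeps the time series attributes, so forecasting functions still know the series is monthly.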
\n\t\t\t\t
\n\t\t\t\t\t
1. <\/strong>Na\u00efve Methods<\/strong><\/h3>
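The seasonal na\u00efve idea and the MAPE calculation can be sketched in base R as below, so the example runs without extra packages. The split point and the names snaive_fc and mape are assumptions for illustration. With the forecast package installed, snaive(training, h = h) is the packaged counterpart and should give the same point forecasts.

```r
# Assumed split: train on 1949-1959, validate on 1960
training   <- window(AirPassengers, end = c(1959, 12))
validation <- window(AirPassengers, start = c(1960, 1))

h <- length(validation)   # forecast horizon in months
m <- frequency(training)  # season length (12 for monthly data)

# Seasonal naive: the forecast for each month is the last observed
# value of the same month, repeated for as long as the horizon requires
last_season <- tail(as.numeric(training), m)
snaive_fc   <- rep(last_season, length.out = h)

# Mean Absolute Percentage Error on the validation window
actual <- as.numeric(validation)
mape   <- mean(abs((actual - snaive_fc) / actual)) * 100
mape
```

Because the method just copies last year's values, its MAPE is a useful baseline that more elaborate models are expected to beat.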
\n\t\t\t\t
\n\t\t\t\t\t