Data Science: Regression: Part 1
Regression is one of the most widely used supervised learning techniques in data science. It is applied to sales forecasting, satisfaction analysis, price estimation, employment income prediction and similar problems where the value to predict is a continuous number.
The types of regression are as follows:
1. Simple Regression
a. Simple Linear Regression
b. Simple Non-linear Regression
2. Multiple Regression
a. Multiple Linear Regression
b. Multiple Non-linear Regression
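As a rough sketch of what separates these categories, the two linear forms can be written out as below. The symbols are assumptions introduced here for illustration and do not come from any particular library: \hat{y} is the predicted value, x (or x_1 ... x_n) are the input features, and \theta_0 ... \theta_n are the coefficients the model learns.

```latex
% Simple linear regression: a single input feature x
\[ \hat{y} = \theta_0 + \theta_1 x \]

% Multiple linear regression: n input features x_1, ..., x_n
\[ \hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n \]
```

"Simple" versus "multiple" refers to the number of input features, while the non-linear variants model a curved relationship between the inputs and the predicted value instead of a straight line.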
Pros of using a linear regression model:
1. Very fast
2. No parameter tuning required
3. Easy to understand and highly interpretable
To implement regression we need data. Suppose our dataset resides in sample.csv.
We need to load this dataset into a dataframe before we can fit a regression model. We will use the Python libraries "pandas", "scikit-learn" and "numpy" for this purpose.
import pandas as pd
import numpy as np
from sklearn import linear_model
# creating data frame from sample.csv
df = pd.read_csv("sample.csv")
# Next, copy only the fields that are useful for prediction into another data frame. Suppose the column names are "POWERCONSUMPTION" and "MOTORSIZE". We are building the model to predict power consumption based on motor size
cdf = df[['POWERCONSUMPTION', 'MOTORSIZE']]
# Split the data set into a train data set (roughly 80%) and a test data set (roughly 20%) using a random boolean mask
msk = np.random.rand(len(df))<0.8
train = cdf[msk]
test = cdf[~msk]
regr = linear_model.LinearRegression()
# Motor size is the input feature (x); power consumption is the target (y)
train_x = np.asanyarray(train[['MOTORSIZE']])
train_y = np.asanyarray(train[['POWERCONSUMPTION']])
regr.fit(train_x, train_y)
print("Coefficients:", regr.coef_)
print("Intercept:", regr.intercept_)
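The walkthrough creates a test split but never uses it. A minimal sketch of how the held-out rows could be used to check the fitted model is shown here. Because sample.csv is not shown, the data is synthetic: the slope of 30, the intercept of 120 and the noise level are made-up values for illustration only, and the choice of mean absolute error and R2 as metrics is likewise an assumption rather than part of the original walkthrough.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic stand-in for sample.csv (assumption: the real file is not available here).
rng = np.random.default_rng(0)
motor_size = rng.uniform(1.0, 8.0, size=200)
power = 30.0 * motor_size + 120.0 + rng.normal(0.0, 5.0, size=200)
cdf = pd.DataFrame({"MOTORSIZE": motor_size, "POWERCONSUMPTION": power})

# Same idea as the walkthrough: a random mask keeps roughly 80% of rows for training.
msk = rng.random(len(cdf)) < 0.8
train, test = cdf[msk], cdf[~msk]

# Fit on the training rows only: motor size is the input, power consumption the target.
regr = linear_model.LinearRegression()
regr.fit(train[["MOTORSIZE"]], train["POWERCONSUMPTION"])

# Predict on rows the model has not seen and compare with the true values.
test_pred = regr.predict(test[["MOTORSIZE"]])
mae = mean_absolute_error(test["POWERCONSUMPTION"], test_pred)
r2 = r2_score(test["POWERCONSUMPTION"], test_pred)
print("Mean absolute error:", mae)
print("R2 score:", r2)
```

A low mean absolute error and an R2 score close to 1 on the test rows suggest the fitted line generalises beyond the rows it was trained on. With a real dataset, the same steps apply after replacing the synthetic frame with the one read from the CSV file.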