Data Science: Regression: Part1

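Regression is one of the most widely used supervised learning techniques in data science. It predicts a continuous numeric value (the dependent variable) from one or more input variables (the independent variables). Typical uses include sales forecasting, satisfaction analysis, price estimation and estimating employment income.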

The types of regression are as follows:

1. Simple Regression (one independent variable)
    a. Simple Linear Regression
    b. Simple Non-linear Regression

2. Multiple Regression (two or more independent variables)
    a. Multiple Linear Regression
    b. Multiple Non-linear Regression
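
A minimal sketch of three of these cases on synthetic, noise-free data (all numbers are made up for illustration, so the recovered coefficients are easy to check). The linear cases use scikit-learn's LinearRegression. For the non-linear case it assumes an exponential model and fits it with scipy.optimize.curve_fit, which is one common choice among several. Multiple non-linear regression follows the same curve_fit pattern with a model function that takes several inputs.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free input values
x = np.linspace(0, 5, 50)

# 1a. Simple linear regression: one input, y = 3x + 2
y_lin = 3.0 * x + 2.0
simple_linear = LinearRegression().fit(x.reshape(-1, 1), y_lin)

# 1b. Simple non-linear regression: one input, assumed model y = a * exp(b * x)
def exp_model(x, a, b):
    return a * np.exp(b * x)

y_exp = exp_model(x, 2.0, 0.3)
(a_hat, b_hat), _ = curve_fit(exp_model, x, y_exp, p0=(1.0, 0.1))

# 2a. Multiple linear regression: two inputs, y = 1.5*x1 - 2*x2 + 4
X = np.column_stack([x, np.cos(x)])
y_multi = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 4.0
multiple_linear = LinearRegression().fit(X, y_multi)

print(simple_linear.coef_, simple_linear.intercept_)      # close to [3.] and 2.0
print(a_hat, b_hat)                                        # close to 2.0 and 0.3
print(multiple_linear.coef_, multiple_linear.intercept_)  # close to [1.5 -2.] and 4.0
```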


Pros of using a linear regression model:

1. Very fast to train
2. No hyperparameter tuning required
3. Easy to understand and highly interpretable
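
The first two points follow from the fact that ordinary least squares has a direct solution: there is no learning rate or number of iterations to choose. The sketch below, on hypothetical random data, solves the least-squares problem in one call with numpy and checks that scikit-learn's LinearRegression, which fits the same ordinary least squares objective, gives the same coefficients. It also shows the interpretability point: each coefficient is the expected change in y per one-unit change in that feature, with the other features held fixed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 200 rows, 3 features, known true coefficients plus small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 3.0 + rng.normal(scale=0.1, size=200)

# Ordinary least squares in a single solve: a column of ones models the intercept
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = w[0], w[1:]

# scikit-learn fits the same objective, with no hyperparameters to tune
model = LinearRegression().fit(X, y)

# Each coefficient reads directly as "change in y per unit change in this feature"
for i, c in enumerate(coef):
    print(f"feature {i}: {c:+.3f} per unit")
print("intercept:", round(intercept, 3))
```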

To implement regression we need data. Suppose our dataset is stored in sample.csv.
We first load it into a pandas DataFrame and then fit the model on it. We will use the Python libraries pandas, scikit-learn and numpy for this purpose.
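
The contents of sample.csv are not shown, so to follow along you can generate a stand-in file. This sketch is entirely hypothetical: it only reuses the two column names from the code below (MOTORSIZE and POWERCONSUMPTION) and invents a roughly linear relationship between them, in arbitrary units.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sample.csv: invented values, arbitrary units
rng = np.random.default_rng(0)
motor_size = rng.uniform(0.5, 5.0, size=100)
power = 120.0 * motor_size + 50.0 + rng.normal(0, 10, size=100)

# Write the two columns used later in the post
pd.DataFrame({
    "MOTORSIZE": motor_size.round(2),
    "POWERCONSUMPTION": power.round(1),
}).to_csv("sample.csv", index=False)
```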

import pandas as pd
import numpy as np
from sklearn import linear_model

# Create a DataFrame from sample.csv

df = pd.read_csv("sample.csv")

# Next, keep only the fields used for the regression in a separate DataFrame. Suppose the field names are POWERCONSUMPTION and MOTORSIZE: we build the model to predict power consumption from motor size

cdf = df[['POWERCONSUMPTION', 'MOTORSIZE']]

# Randomly split the data into a training set (about 80% of rows) and a test set (about 20%)

msk = np.random.rand(len(cdf)) < 0.8
train = cdf[msk]
test = cdf[~msk]

# Fit a simple linear regression: MOTORSIZE is the input (x),
# POWERCONSUMPTION is the value to predict (y)
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['MOTORSIZE']])
train_y = np.asanyarray(train[['POWERCONSUMPTION']])
regr.fit(train_x, train_y)

# Slope and intercept of the fitted line
print("Coefficients:", regr.coef_)
print("Intercept:", regr.intercept_)
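
The code above creates a test set but never uses it. A minimal sketch of that last step: predict on the held-out rows and score the predictions with scikit-learn's mean_absolute_error, mean_squared_error and r2_score. So that it runs on its own, it builds a synthetic cdf with invented values in place of sample.csv; with the real file you would reuse the post's train, test and regr instead. The motor size of 2.5 in the final prediction is likewise hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic stand-in for cdf (invented values; the real sample.csv is not shown)
rng = np.random.default_rng(1)
motor = rng.uniform(0.5, 5.0, size=200)
cdf = pd.DataFrame({
    "MOTORSIZE": motor,
    "POWERCONSUMPTION": 120.0 * motor + 50.0 + rng.normal(0, 10, size=200),
})

# Same kind of random ~80/20 split as in the post
msk = rng.random(len(cdf)) < 0.8
train, test = cdf[msk], cdf[~msk]

regr = linear_model.LinearRegression()
regr.fit(train[["MOTORSIZE"]].to_numpy(), train[["POWERCONSUMPTION"]].to_numpy())

# Score the model on rows it never saw during fitting
test_x = test[["MOTORSIZE"]].to_numpy()
test_y = test[["POWERCONSUMPTION"]].to_numpy()
pred_y = regr.predict(test_x)

mae = mean_absolute_error(test_y, pred_y)
mse = mean_squared_error(test_y, pred_y)
r2 = r2_score(test_y, pred_y)
print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  R^2: {r2:.3f}")

# Predicted power consumption for a hypothetical motor size of 2.5
print(regr.predict(np.array([[2.5]])))
```

If you prefer an exact 20% test set instead of a random mask, sklearn.model_selection.train_test_split(cdf, test_size=0.2) is a common alternative.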
