Hello everyone! This jupyter notebook is a very simple python guide for loading and analysing the data for the hackathon.¶

If you are already experienced in Python or if you are planning to use a different language, you can go ahead and skip this one. No hard feelings!

If not, don't worry. This looks more complicated than it is. By the end of this notebook you'll see that loading and manipulating data can be as easy as snapping your fingers.

Well... almost as easy.

First things first!¶

The first thing you should do is to import the libraries that you'll need for your project.

The ones we present below are just a suggestion. Feel free to use more, use less or don't use any at all!

FYI, Numpy is great for numerical analysis and pandas was designed to help you deal with data structures. Matplotlib is the standard plotting library.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Loading the Data¶

If you want to load the files into a manageable tabular format (pandas dataframe) you should use something along these lines.

#Simple load - the sep parameter defines the character that separates (makes sense right?) the different columns in the table.

df = pd.read_csv('C:\\Users\\E348493\\Desktop\\Hack_the_electron\\load_pwr.csv',sep=';')

#imagine that your data has some weird characters like Ç,´,^,. To load something like this you you need to define a 
#different encoding

df_2 = pd.read_csv('C:\\Users\\E348493\\Desktop\\Hack_the_electron\\dataset_index.csv',sep=';', encoding="latin1")

In this case, I chose to place the files in a folder on my desktop called "Hack_the_electron". You can place the files wherever you want, just remember to adapt the path in the commands.

Looking at the data¶

To show a small sample of the data in the dataframe, the head() command is always useful (here we drop one of the columns with the command drop() to ease the analysis)

df=df.drop(columns="Unnamed: 0")

df.head()

Let's take this opportunity and look into the other dataset as well.

df_2.head()

Data Snapshot¶

To get the overall look of the data, the describe() and info() commands are great tools

df.info()

df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 70072 entries, 0 to 70071
Columns: 217 entries, Time to meter_215
dtypes: float64(216), object(1)
memory usage: 116.0+ MB

Time to get our hands dirty!¶

Let's select some of the columns and perform a couple of simple operations

First, we'll calculate a simple sum

#Select first five elements in the meter_0 column and place them in a list
sample=df[["meter_0"]][0:5].values

print(sample)

#Loop through the elements of the list and sum their values if they are larger than 70
total_sum=sum(i for i in sample if i>700)
    

print("Sum=",total_sum)

[[1076.]
 [1064.]
 [1036.]
 [1504.]
 [3236.]]
Sum= [7916.]

In the document that specifies the data for the challenge (Hack the Electron - Stream A - All Files and Variables.pdf), we mention that we only have the meters' power readings (in W). If we want to obtain the corresponding consumed energy in kWh we need to divide the power by 4 (because the measurements are quarter hour averages). Let's see how we can create a new column with the energy consumptinon in kWh

df_meter_0=df[['Time','meter_0']].copy()

df_meter_0["energy_consumption"]=df_meter_0["meter_0"]/4

df_meter_0.head()

Convert Datatypes¶

If you want to create a datetime column from a column with a different type you can use the following:

df["time_datetime"]=pd.to_datetime(df.Time)

print("Type of Time column: ",df[["Time"]].dtypes)

print("Type of time_date column: ",df[["time_datetime"]].dtypes)

Type of Time column:  Time    object
dtype: object
Type of time_date column:  time_datetime    datetime64[ns]
dtype: object

Sorting the data¶

If you would like to sort the data by a particular column, you can use this simple command below

df=df.sort_values("time_datetime")

df.head()

Time to Plot!¶

What can we see if we plot the data from one of the meters? For readability, we will only plot the first 48 values. (Note that in the plot below, the time axis has the following format Day-Month-Hour)

x=df[["time_datetime"]][:48].values

y=df[["meter_0"]][:48].values

plt.plot(x,y)

[<matplotlib.lines.Line2D at 0x124b6a58>]

Congratulations! You have reached the end of this very small guide.¶

Now is the time to move fast and break things (as they used to say in Facebook). Just remember to put things back together in the end!

If you would like to know more about Python check out these references:

https://www.w3schools.com/python/

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/

	Time	meter_0	meter_1	meter_2	meter_3	meter_4	meter_5	meter_6	meter_7	meter_8	...	meter_207	meter_208	meter_209	meter_210	meter_211	meter_212	meter_213	meter_214	meter_215
0	2016-10-01 00:15:00	1076.0	404.0	76.0	148.0	284.0	220.0	332.0	144.0	1888.0	...	264.0	604.0	316.0	112.0	208.0	1044.0	76.0	1320.0	124.0
1	2016-10-01 00:30:00	1064.0	440.0	56.0	116.0	236.0	164.0	328.0	216.0	1828.0	...	272.0	536.0	248.0	64.0	188.0	1068.0	40.0	1316.0	184.0
2	2016-10-01 00:45:00	1036.0	484.0	136.0	20.0	236.0	168.0	356.0	100.0	1696.0	...	268.0	660.0	192.0	136.0	144.0	1164.0	44.0	1284.0	212.0
3	2016-10-01 01:00:00	1504.0	580.0	56.0	84.0	196.0	164.0	364.0	36.0	1692.0	...	272.0	664.0	176.0	68.0	144.0	1016.0	40.0	1380.0	156.0
4	2016-10-01 01:15:00	3236.0	488.0	68.0	252.0	164.0	84.0	440.0	52.0	1832.0	...	276.0	648.0	152.0	104.0	212.0	1016.0	40.0	1500.0	124.0

	meter_id	tariff	n_phases	ctrct_pw
0	meter_0	Time-Of-Use Two Periods	T	20.70
1	meter_1	Time-Of-Use Two Periods	T	20.70
2	meter_2	Simple Tariff	M	4.60
3	meter_3	Simple Tariff	M	3.45
4	meter_4	Simple Tariff	T	6.90

	meter_0	meter_1	meter_2	meter_3	meter_4	meter_5	meter_6	meter_7	meter_8	meter_9	...	meter_206	meter_207	meter_208	meter_209	meter_210	meter_211	meter_212	meter_213	meter_214	meter_215
count	70064.000000	70062.000000	70061.000000	70046.000000	70052.000000	70066.000000	70070.000000	70065.000000	70066.000000	70027.000000	...	70009.000000	70060.000000	70057.000000	70053.000000	70061.000000	70044.000000	70066.000000	70069.000000	70050.000000	70052.000000
mean	1688.141342	848.662984	283.509913	338.758359	294.952321	238.771730	317.813387	252.258189	1092.979705	966.943036	...	158.428488	254.022723	978.715146	276.105178	205.012832	339.919194	1071.249964	239.781815	1124.807252	271.566779
std	1537.332938	638.520281	506.551761	397.158476	384.147312	457.623944	310.270384	616.990333	532.496515	563.358573	...	306.613934	113.423875	1479.184981	198.540920	392.445709	454.174532	781.267779	505.232360	302.610938	292.046241
min	0.000000	0.000000	16.000000	12.000000	0.000000	0.000000	0.000000	0.000000	500.000000	144.000000	...	0.000000	32.000000	0.000000	36.000000	36.000000	0.000000	112.000000	0.000000	40.000000	0.000000
25%	340.000000	492.000000	64.000000	112.000000	116.000000	28.000000	156.000000	48.000000	716.000000	688.000000	...	0.000000	204.000000	408.000000	148.000000	88.000000	112.000000	524.000000	60.000000	988.000000	108.000000
50%	1168.000000	640.000000	120.000000	184.000000	192.000000	88.000000	224.000000	104.000000	972.000000	800.000000	...	0.000000	212.000000	480.000000	252.000000	116.000000	168.000000	744.000000	100.000000	1176.000000	176.000000
75%	2628.000000	876.000000	212.000000	336.000000	304.000000	148.000000	372.000000	180.000000	1324.000000	968.000000	...	192.000000	308.000000	836.000000	348.000000	148.000000	336.000000	1576.000000	148.000000	1320.000000	304.000000
max	12284.000000	7576.000000	5436.000000	4168.000000	4744.000000	2348.000000	3964.000000	8780.000000	6076.000000	8460.000000	...	3724.000000	1560.000000	17084.000000	3232.000000	5368.000000	3980.000000	7768.000000	4784.000000	2512.000000	3248.000000

	Time	meter_0	energy_consumption
0	2016-10-01 00:15:00	1076.0	269.0
1	2016-10-01 00:30:00	1064.0	266.0
2	2016-10-01 00:45:00	1036.0	259.0
3	2016-10-01 01:00:00	1504.0	376.0
4	2016-10-01 01:15:00	3236.0	809.0

	Time	meter_0	meter_1	meter_2	meter_3	meter_4	meter_5	meter_6	meter_7	meter_8	...	meter_207	meter_208	meter_209	meter_210	meter_211	meter_212	meter_213	meter_214	meter_215	time_datetime
0	2016-10-01 00:15:00	1076.0	404.0	76.0	148.0	284.0	220.0	332.0	144.0	1888.0	...	264.0	604.0	316.0	112.0	208.0	1044.0	76.0	1320.0	124.0	2016-10-01 00:15:00
1	2016-10-01 00:30:00	1064.0	440.0	56.0	116.0	236.0	164.0	328.0	216.0	1828.0	...	272.0	536.0	248.0	64.0	188.0	1068.0	40.0	1316.0	184.0	2016-10-01 00:30:00
2	2016-10-01 00:45:00	1036.0	484.0	136.0	20.0	236.0	168.0	356.0	100.0	1696.0	...	268.0	660.0	192.0	136.0	144.0	1164.0	44.0	1284.0	212.0	2016-10-01 00:45:00
3	2016-10-01 01:00:00	1504.0	580.0	56.0	84.0	196.0	164.0	364.0	36.0	1692.0	...	272.0	664.0	176.0	68.0	144.0	1016.0	40.0	1380.0	156.0	2016-10-01 01:00:00
4	2016-10-01 01:15:00	3236.0	488.0	68.0	252.0	164.0	84.0	440.0	52.0	1832.0	...	276.0	648.0	152.0	104.0	212.0	1016.0	40.0	1500.0	124.0	2016-10-01 01:15:00