Getting Started with Plexflo's datastream

An open-source Python wrapper around Plexflo's apps, APIs, and algorithms that help researchers, engineers, and energy companies build intelligence, apps, and dashboards to accelerate clean-tech adoption across all communities.

datastream

datastream lets researchers and engineers try Plexflo's Deep Learning models for detecting Electric Vehicle (EV) charging events in smart home meter data.

First things first: why would you want to detect EV charging events?

  • Helps grid utilities estimate and manage grid load in real time during surges in EV charging
  • Enables forecasting models and analytical insights such as
    • Average duration of EV charging events
    • Most probable time for EVs to be charged in a geographical area
  • Gives consumers a basis for building analytical systems for energy profiling, smart alerts, and much more

datastream enters the chat 😜

Generating predictions from smart home meter data with datastream is very simple. But before we get started, let's see what smart home meter data looks like.


The data file has two columns: timestamp and grid. Our library accepts either CSV or Excel files.

Installing the datastream library

Install the library with the pip package installer:

pip install plexflo

📘

datastream on macOS

We are currently working on an installable package for macOS. As an alternative to pip installation on a Mac, you can use our Docker image and follow a few simple steps to get started.

datastream on Docker Hub

We also have a Docker image for those who want to try out our library straight away with some test files.

docker pull sayonsync/plexflo
docker run --rm -it sayonsync/plexflo bash

Once inside the container, just run "python run.py" to work with a sample csv file. To work with your own files, mount your local directory as shown below, using an absolute path on the host side of -v.

docker run --rm -it -v /absolute/path/to/local/dataset/folder:/usr/src/app/folder sayonsync/plexflo bash

📘

Using the Docker image effectively

The best way to use our Docker image is to mount a local directory with the -v flag, which gives you easy access to the prediction files and retrained model files from your host machine.

Time to initialize the Deep Learning model!

# Import statement
from plexflo.datastream.model import Model

# Initialize the model: this loads a TensorFlow model that generates
# predictions every 15 minutes (900 s)
model = Model()

Loading data into a pandas dataframe

Now, let's load a sample dataset using pandas.

import pandas as pd

# Loading the test data
data = pd.read_csv("sample.csv")
data.head()
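Since the library accepts either CSV or Excel files, you may want to load both formats the same way before calling the model. The helper below is hypothetical (it is not part of plexflo); it simply picks the matching pandas reader based on the file extension.

```python
from pathlib import Path

import pandas as pd


def load_meter_data(path: str) -> pd.DataFrame:
    """Hypothetical helper (not part of plexflo): load a CSV or Excel meter file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        # Excel needs an extra engine: openpyxl for .xlsx, xlrd for .xls
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

For example, data = load_meter_data("sample.csv") returns the same dataframe as the pd.read_csv call above.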

📘

Sample csv file

Here is a sample file to get you started with generating predictions.

🚧

Columns in test data

The data file must contain a grid column; otherwise, an exception will be thrown 😬. The grid column should contain consumption values in kWh, preferably sampled at 1/60 Hz (one reading per minute).
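Because a missing grid column raises an exception, a quick check before calling the model can save a round trip. The function below is a hypothetical sketch (not part of plexflo); it assumes a timestamp column that pandas can parse and reports the median gap between readings, which should be 60 seconds for data sampled at 1/60 Hz.

```python
import pandas as pd


def check_meter_data(df: pd.DataFrame) -> float:
    """Hypothetical pre-flight check: require a grid column, return median sampling interval (s)."""
    if "grid" not in df.columns:
        raise ValueError("input data must contain a 'grid' column")
    # Assumes a parseable 'timestamp' column, as in the sample data
    timestamps = pd.to_datetime(df["timestamp"])
    # The median gap is robust to a few missing or duplicated readings
    return timestamps.diff().median().total_seconds()
```

If check_meter_data(data) returns something other than 60.0, your data is sampled at a different rate than the one recommended above.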

Generating predictions

Now that we have the data, let's send it to the model and generate predictions. The "predict" function takes up to three parameters, two of which are mandatory: the Model object and the dataframe.

The predict function generates a csv file containing the columns of the input dataframe plus an additional column named EV. This column holds binary categorical values (0 or 1) that denote whether an EV was charging at that particular timestamp.
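The EV column makes it possible to derive insights such as the average duration of charging events. The helper below is a hypothetical sketch (not part of plexflo): it groups consecutive EV == 1 rows into events, assumes the rows are evenly spaced (one minute apart by default), and treats rows without a prediction as not charging.

```python
import pandas as pd


def summarize_ev_events(pred_df: pd.DataFrame, minutes_per_row: float = 1) -> pd.Series:
    """Hypothetical helper: duration in minutes of each run of consecutive EV == 1 rows."""
    # Assumption: rows without a prediction (NaN) count as "not charging"
    ev = pred_df["EV"].fillna(0).astype(int)
    # A new run starts whenever EV differs from the previous row
    run_id = (ev != ev.shift()).cumsum()
    charging = ev == 1
    event_lengths = ev[charging].groupby(run_id[charging]).size()
    # Assumption: evenly spaced rows, minutes_per_row apart
    return event_lengths * minutes_per_row
```

Under these assumptions, summarize_ev_events(pred_data).mean() gives the average charging event duration in minutes.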

The third (optional) parameter, out_fname, takes the name of the output file to be generated. Provide the full file name, including the extension.

❗️

Data for prediction

Make sure the data has at least 900 data points, because the Deep Learning model is configured with a sequence length of 900. If you see blank values in the prediction column of the generated file, those are rows for which no prediction could be generated because fewer than 900 data points were available (sequence < 900).
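The two checks below are a hypothetical sketch (not part of plexflo) of how you might fail fast on inputs that are too short and count the rows left without a prediction. They assume the sequence length of 900 stated above and that the prediction column is named EV.

```python
import pandas as pd

SEQ_LEN = 900  # sequence length stated in this guide


def check_length(df: pd.DataFrame, seq_len: int = SEQ_LEN) -> None:
    """Hypothetical check: raise early if there are too few rows to predict anything."""
    if len(df) < seq_len:
        raise ValueError(f"need at least {seq_len} rows, got {len(df)}")


def count_blank_predictions(pred_df: pd.DataFrame, column: str = "EV") -> int:
    """Hypothetical helper: count rows whose prediction is blank (NaN, None, or "")."""
    # errors="coerce" turns anything non-numeric, such as "", into NaN
    values = pd.to_numeric(pred_df[column], errors="coerce")
    return int(values.isna().sum())
```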

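from plexflo.datastream.inference import predict

# Generate predictions; the csv is exported with the default file name
pred_data = predict(model,
                    data)

# Same, but with a custom output file name (include the extension)
pred_data = predict(model,
                    data,
                    out_fname="new_predictions.csv")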

Alternatively, you can pass just the file path to the predict_from_file function, which automatically loads the data into a dataframe.

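from plexflo.datastream.inference import predict_from_file

# Load test.csv and generate predictions (default output file name)
pred_data = predict_from_file(model,
                              "test.csv")

# Same, but with a custom output file name (include the extension)
pred_data = predict_from_file(model,
                              "test.csv",
                              out_fname="new_predictions.csv")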

📘

Predictions

The predict and predict_from_file functions return the dataframe with predictions and, by default, also export a csv file to current_dir/output/files. The default file name for the generated csv is prediction_15_min.csv.
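To reload an exported prediction file later, you can read it from the default location with pandas. This hypothetical helper (not part of plexflo) assumes that current_dir means the working directory from which you called predict.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def load_exported_predictions(fname: str = "prediction_15_min.csv",
                              base_dir: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical helper: read a predictions csv from <base_dir>/output/files."""
    # Assumption: current_dir is the working directory, unless base_dir is given
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return pd.read_csv(base / "output" / "files" / fname)
```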

Fine-tuning/Transfer learning the model on custom data

You can load our model and fine-tune it (transfer learning) on your own data to suit your needs. A few basic hyper-parameters can be tuned for your training experiments. We will be adding more flexibility and ease-of-use improvements for fine-tuning our model on your custom data.

from plexflo.datastream.finetune import finetune_15min_model

# Fine-tune the loaded model; data must have grid and ground_truth columns
new_model = finetune_15min_model(model,
                                 data,
                                 shift=5,
                                 batchsize=64,
                                 epochs=5,
                                 val_split=0.25,
                                 model_name="new_model.h5")

The data used for retraining must have two columns: grid and ground_truth. The ground_truth column should contain a binary value (0 or 1) for each timestamp. The image below shows the contents of a sample training file.
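Validating the retraining data up front can catch problems before a long training run. The function below is a hypothetical sketch (not part of plexflo) that enforces only the stated requirements: both columns present and a 0/1 ground_truth value for every timestamp.

```python
import pandas as pd


def check_training_data(df: pd.DataFrame) -> None:
    """Hypothetical check: grid and ground_truth columns, with 0/1 labels on every row."""
    missing = {"grid", "ground_truth"} - set(df.columns)
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    if df["ground_truth"].isna().any():
        raise ValueError("ground_truth needs a value for every timestamp")
    labels = set(df["ground_truth"].unique())
    if not labels <= {0, 1}:
        raise ValueError(f"ground_truth must be 0 or 1, found {sorted(labels)}")
```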

[Image: contents of a sample training file with grid and ground_truth columns]

The finetune_15min_model function takes up to seven parameters, of which the first five are mandatory: the Model object, the input dataframe, shift, batchsize, and epochs.

The shift parameter sets the integer number of steps by which the sequence window is shifted. A smaller shift generally gives better performance, but at the cost of longer training times.
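To see why a smaller shift means longer training, consider how a series is cut into overlapping windows. The sketch below assumes that shift acts as the stride between consecutive windows and uses the sequence length of 900 mentioned earlier; it illustrates the idea and is not plexflo's implementation.

```python
import numpy as np


def make_windows(values, seq_len: int = 900, shift: int = 5) -> np.ndarray:
    """Illustrative sketch: overlapping windows of seq_len, each starting shift steps later."""
    values = np.asarray(values)
    if len(values) < seq_len:
        raise ValueError(f"need at least {seq_len} values, got {len(values)}")
    # Assumption: shift is the stride between window start positions
    starts = range(0, len(values) - seq_len + 1, shift)
    return np.stack([values[s:s + seq_len] for s in starts])
```

With 1,000 readings, a shift of 5 yields 21 windows while a shift of 1 yields 101, so the model sees roughly five times as many training sequences per epoch.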

The sixth (optional) parameter, val_split, takes a float giving the fraction of the data to be used for validation (e.g. 0.25 for 25%).
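For time-series data, the validation set is usually taken from the end of the series rather than sampled at random. The sketch below holds out the last val_split fraction of rows, which is also how Keras' validation_split selects its validation samples; whether finetune_15min_model splits the data this way is an assumption.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, val_split: float = 0.25):
    """Illustrative sketch: return (train, validation), holding out the last val_split of rows."""
    if not 0.0 < val_split < 1.0:
        raise ValueError("val_split must be strictly between 0 and 1")
    # Assumption: like Keras' validation_split, the tail of the data is held out
    split_at = int(len(df) * (1.0 - val_split))
    return df.iloc[:split_at], df.iloc[split_at:]
```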

The seventh (optional) parameter, model_name, takes the file name under which the model is saved during or after training. Provide the full file name, including the extension.

📘

Fine-tuning/Transfer learning

The finetune_15min_model function returns the fine-tuned model and, by default, also saves it to current_dir/output/models.

The function also exports accuracy and loss graphs to current_dir/output/graphs.

The default file name for the saved model is model_15_min.h5.

How to use your fine-tuned model?

The Model class can also load a fine-tuned model: just pass the model's file name as a constructor parameter when initializing it. Easy and hassle-free.

from plexflo.datastream.model import Model

new_model = Model("new_model.h5")

🚧

Loading fine-tuned model

Make sure the fine-tuned model you want to load is present in current_dir/output/models.
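Before constructing Model with a custom file name, you can confirm that the file is where the guide expects it. This hypothetical helper (not part of plexflo) assumes current_dir is the working directory.

```python
from pathlib import Path
from typing import Optional


def fine_tuned_model_path(name: str, base_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: return <base_dir>/output/models/<name>, or raise if it is missing."""
    # Assumption: current_dir is the working directory, unless base_dir is given
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = base / "output" / "models" / name
    if not path.is_file():
        raise FileNotFoundError(f"fine-tuned model not found: {path}")
    return path
```

For example, calling fine_tuned_model_path("new_model.h5") before Model("new_model.h5") gives a clear error message if the file is in the wrong place.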

Stream data to a server

You can stream values from a dataframe using our stream function. It takes four mandatory parameters: data (the dataframe), host, port, and interval. The function attempts to connect to the specified host and port and then streams data from the "kW" column at the defined interval (in seconds).

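import pandas as pd
from plexflo.datastream.stream import stream

# Placeholders: replace with your server's host name and port number
HOST_NAME = "HOST_NAME"
PORT = 5000

df = pd.read_csv("Test.csv")

# Stream values from the "kW" column, one every second
stream(data=df,
       host=HOST_NAME,
       port=PORT,
       interval=1)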
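If you don't yet have a server to receive the stream, a minimal receiver is sketched below using Python's standard socketserver module. It assumes that stream opens a plain TCP connection to host and port; the message format is not documented here, so the handler simply collects and prints the raw bytes it receives. The class names and port 5000 are hypothetical.

```python
import socketserver


class CollectingHandler(socketserver.BaseRequestHandler):
    """Reads raw bytes from one client until it closes the connection."""

    def handle(self):
        while True:
            chunk = self.request.recv(1024)
            if not chunk:  # b"" means the client closed the connection
                break
            self.server.received.append(chunk)
            print("received:", chunk)


class CollectingServer(socketserver.TCPServer):
    """Hypothetical TCP server that stores every received chunk in self.received."""

    allow_reuse_address = True

    def __init__(self, address):
        self.received = []  # raw byte chunks, in arrival order
        super().__init__(address, CollectingHandler)


# To run it (blocks until interrupted), listen on the port you pass to stream():
# with CollectingServer(("0.0.0.0", 5000)) as server:
#     server.serve_forever()
```

Because the handler does not assume any delimiter, it works whatever framing stream uses; once you know the format, you can parse the collected bytes accordingly.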
