Adapters
This notebook covers the following topics:
- Converting datasets.Dataset into other popular time series data formats.
Dataset adapters
Unfortunately, different time series forecasting libraries expect very different data formats.
Luckily, fev comes with adapters that convert the data associated with each Task into the format expected by each of these libraries.
```python
import fev

# Define a task with a mix of static & dynamic features
task = fev.Task(
    dataset_path="autogluon/chronos_datasets",
    dataset_config="monash_rideshare",
    horizon=30,
    target="price_mean",
    past_dynamic_columns=["distance_mean", "surge_mean"],
    known_dynamic_columns=["api_calls", "temp", "rain", "humidity", "clouds", "wind"],
    static_columns=["source_location", "provider_name", "provider_service"],
)
```
By default, window.get_input_data() returns two datasets.Dataset objects, each with one row per time series:
- past_data contains all past data, including the target, timestamps, and all covariates.
- future_data contains the timestamps and known covariates for the forecast horizon, as well as the static columns.
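The split into past and future parts can be illustrated with plain Python. The sketch below uses a hypothetical helper, split_record, which is not fev's implementation: it assumes each series is a dict with list-valued dynamic columns and that the forecast horizon covers the last `horizon` steps of the record. The record uses toy values.

```python
def split_record(record, horizon, target, past_dynamic, known_dynamic, static):
    """Hypothetical helper: split one series into past and future parts.

    Assumes dynamic columns are lists of equal length and that the forecast
    horizon is the last `horizon` steps of the record.
    """
    cutoff = len(record["timestamp"]) - horizon
    past = {"id": record["id"]}
    future = {"id": record["id"]}
    # Past data: timestamps, target and all dynamic covariates up to the cutoff
    for col in ["timestamp", target, *past_dynamic, *known_dynamic]:
        past[col] = record[col][:cutoff]
    # Future data: only timestamps and known covariates are revealed
    for col in ["timestamp", *known_dynamic]:
        future[col] = record[col][cutoff:]
    # Static columns are constant per series and go into both parts
    for col in static:
        past[col] = record[col]
        future[col] = record[col]
    return past, future


# Toy record with 5 hourly steps
record = {
    "id": "T000000",
    "timestamp": ["06:00", "07:00", "08:00", "09:00", "10:00"],
    "price_mean": [16.6, 17.3, 13.5, 18.0, 18.6],
    "surge_mean": [1.06, 1.10, 1.00, 1.11, 1.08],
    "temp": [40.6, 41.1, 40.9, 40.9, 40.7],
    "provider_name": "Lyft",
}
past, future = split_record(
    record,
    horizon=2,
    target="price_mean",
    past_dynamic=["surge_mean"],
    known_dynamic=["temp"],
    static=["provider_name"],
)
print(sorted(future))       # ['id', 'provider_name', 'temp', 'timestamp']
print(future["timestamp"])  # ['09:00', '10:00']
```

Note that the target and the past-only covariate (surge_mean here) never appear in the future part, which matches the column lists printed below.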
```python
window = task.get_window(0)
past_data, future_data = window.get_input_data()
print(past_data)
print(future_data)
```
Dataset({
features: ['id', 'timestamp', 'price_mean', 'api_calls', 'clouds', 'humidity', 'rain', 'temp', 'wind', 'distance_mean', 'surge_mean', 'provider_name', 'provider_service', 'source_location'],
num_rows: 156
})
Dataset({
features: ['id', 'timestamp', 'api_calls', 'clouds', 'humidity', 'rain', 'temp', 'wind', 'provider_name', 'provider_service', 'source_location'],
num_rows: 156
})
Use the fev.convert_input_data() function to convert the past and future data into the formats expected by other frameworks. The adapter argument selects the target framework.
Pandas
```python
from IPython.display import display

train_df, future_df, static_df = fev.convert_input_data(window, adapter="pandas")
print("train_df")
display(train_df.head())
print("future_df")
display(future_df.head())
print("static_df")
display(static_df.head())
```
train_df
| | id | timestamp | price_mean | api_calls | clouds | humidity | rain | temp | wind | distance_mean | surge_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | T000000 | 2018-11-26 06:00:00 | 16.555555 | 9.0 | 0.990667 | 0.913333 | 0.0 | 40.627335 | 1.350667 | 1.726667 | 1.055556 |
| 1 | T000000 | 2018-11-26 07:00:00 | 17.299999 | 10.0 | 0.970000 | 0.920000 | 0.0 | 41.137501 | 1.735000 | 1.690000 | 1.100000 |
| 2 | T000000 | 2018-11-26 08:00:00 | 13.500000 | 1.0 | 0.980000 | 0.923333 | 0.0 | 40.919998 | 1.330000 | 1.380000 | 1.000000 |
| 3 | T000000 | 2018-11-26 09:00:00 | 17.954546 | 11.0 | 1.000000 | 0.927500 | 0.0 | 40.937500 | 1.365000 | 1.920909 | 1.113636 |
| 4 | T000000 | 2018-11-26 10:00:00 | 18.625000 | 12.0 | 0.995000 | 0.940000 | 0.0 | 40.695000 | 1.895000 | 2.122500 | 1.083333 |
future_df
| | id | timestamp | api_calls | clouds | humidity | rain | temp | wind |
|---|---|---|---|---|---|---|---|---|
| 0 | T000000 | 2018-12-17 13:00:00 | 10.0 | 0.97 | 0.90 | 0.0 | 35.169998 | 7.22 |
| 1 | T000000 | 2018-12-17 14:00:00 | 7.0 | 0.92 | 0.90 | 0.0 | 36.299999 | 6.87 |
| 2 | T000000 | 2018-12-17 15:00:00 | 13.0 | 0.88 | 0.87 | 0.0 | 37.250000 | 7.58 |
| 3 | T000000 | 2018-12-17 16:00:00 | 12.0 | 1.00 | 0.84 | 0.0 | 39.000000 | 6.28 |
| 4 | T000000 | 2018-12-17 17:00:00 | 9.0 | 0.95 | 0.81 | 0.0 | 40.009998 | 6.46 |
static_df
| | id | provider_name | provider_service | source_location |
|---|---|---|---|---|
| 0 | T000000 | Lyft | Lux | Back Bay |
| 1 | T000001 | Lyft | Lux Black | Back Bay |
| 2 | T000002 | Lyft | Lux Black XL | Back Bay |
| 3 | T000003 | Lyft | Lyft | Back Bay |
| 4 | T000004 | Lyft | Lyft XL | Back Bay |
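The pandas output is in long format: one row per (series, timestamp) pair in train_df, and one row per series in static_df. The sketch below shows that reshaping on plain Python records; the helper to_long is hypothetical and not fev's actual code, and it assumes each series is a dict with list-valued dynamic columns. The record values are illustrative only.

```python
def to_long(records, dynamic_cols, static_cols):
    """Hypothetical helper: flatten per-series records into long-format rows.

    Returns (long_rows, static_rows), analogous to train_df and static_df.
    """
    long_rows, static_rows = [], []
    for rec in records:
        # One static row per series
        static_rows.append({"id": rec["id"], **{c: rec[c] for c in static_cols}})
        # One long row per (series, timestamp) pair
        for t, ts in enumerate(rec["timestamp"]):
            row = {"id": rec["id"], "timestamp": ts}
            row.update({c: rec[c][t] for c in dynamic_cols})
            long_rows.append(row)
    return long_rows, static_rows


# Toy records (values are illustrative only)
records = [
    {"id": "T000000", "timestamp": ["06:00", "07:00"], "price_mean": [16.6, 17.3],
     "provider_name": "Lyft", "provider_service": "Lux"},
    {"id": "T000001", "timestamp": ["06:00", "07:00"], "price_mean": [30.0, 31.5],
     "provider_name": "Lyft", "provider_service": "Lux Black"},
]
train_rows, static_rows = to_long(records, ["price_mean"], ["provider_name", "provider_service"])
print(len(train_rows), len(static_rows))  # 4 2
print(train_rows[2])  # {'id': 'T000001', 'timestamp': '06:00', 'price_mean': 30.0}
```

Keeping static features in a separate table avoids repeating them on every row of the long table.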
GluonTS
Both outputs are GluonTS PandasDataset objects.
train_dataset contains only the historical data, while prediction_dataset additionally contains the future values of the known dynamic features.
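In GluonTS, covariates that are known in advance (feat_dynamic_real) must extend over the forecast horizon at prediction time, while past-only covariates (past_feat_dynamic_real) only cover the observed range. The sketch below checks these length conventions on a toy data entry written as a plain dict. It does not call gluonts, and the helper check_prediction_entry is hypothetical.

```python
def check_prediction_entry(entry, horizon):
    """Hypothetical helper: verify GluonTS-style length conventions.

    Known covariates must cover the observed range plus the forecast horizon;
    past-only covariates cover the observed range only.
    """
    num_observed = len(entry["target"])
    known_ok = all(len(row) == num_observed + horizon for row in entry["feat_dynamic_real"])
    past_ok = all(len(row) == num_observed for row in entry["past_feat_dynamic_real"])
    return known_ok and past_ok


# Toy entry: 3 observed steps, horizon of 2
entry = {
    "target": [16.6, 17.3, 13.5],
    "feat_dynamic_real": [[40.6, 41.1, 40.9, 40.9, 40.7]],  # e.g. temp: 3 + 2 values
    "past_feat_dynamic_real": [[1.06, 1.10, 1.00]],        # e.g. surge_mean: 3 values
}
print(check_prediction_entry(entry, horizon=2))  # True
print(check_prediction_entry(entry, horizon=3))  # False
```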
```python
train_dataset, prediction_dataset = fev.convert_input_data(window, adapter="gluonts")
print("train_dataset")
print(train_dataset)
print("prediction_dataset")
print(prediction_dataset)
```
train_dataset
PandasDataset<size=156, freq=h, num_feat_dynamic_real=6, num_past_feat_dynamic_real=2, num_feat_static_real=0, num_feat_static_cat=3, static_cardinalities=[ 2. 13. 12.]>
prediction_dataset
PandasDataset<size=156, freq=h, num_feat_dynamic_real=6, num_past_feat_dynamic_real=2, num_feat_static_real=0, num_feat_static_cat=3, static_cardinalities=[ 2. 13. 12.]>
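The counts in this repr line up with the task definition: num_feat_dynamic_real=6 matches the six known_dynamic_columns, num_past_feat_dynamic_real=2 the two past_dynamic_columns, and num_feat_static_cat=3 the three static_columns. static_cardinalities gives the number of distinct categories in each categorical static feature. The sketch below shows how such cardinalities and integer codes can be derived from a few static rows copied from this notebook. The helper encode_static_cats is hypothetical and uses alphabetical encoding, which may differ from what GluonTS does internally.

```python
def encode_static_cats(static_rows, cat_cols):
    """Hypothetical helper: integer-encode categorical static features.

    Returns the per-series codes and the number of distinct categories per
    column (the quantity reported as static_cardinalities).
    """
    # Alphabetically sorted vocabulary per column (an assumption of this sketch)
    vocab = {c: sorted({row[c] for row in static_rows}) for c in cat_cols}
    codes = [[vocab[c].index(row[c]) for c in cat_cols] for row in static_rows]
    cardinalities = [len(vocab[c]) for c in cat_cols]
    return codes, cardinalities


static_rows = [
    {"provider_name": "Lyft", "provider_service": "Lux", "source_location": "Back Bay"},
    {"provider_name": "Lyft", "provider_service": "Lux Black", "source_location": "Back Bay"},
    {"provider_name": "Uber", "provider_service": "Taxi", "source_location": "West End"},
    {"provider_name": "Uber", "provider_service": "WAV", "source_location": "West End"},
]
codes, cardinalities = encode_static_cats(
    static_rows, ["provider_name", "provider_service", "source_location"]
)
print(codes)          # [[0, 0, 0], [0, 1, 0], [1, 2, 1], [1, 3, 1]]
print(cardinalities)  # [2, 4, 2]
```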
AutoGluon
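The AutoGluon adapter converts the historical and future values to TimeSeriesDataFrame objects indexed by item_id and timestamp. The target column is renamed to target, and the static features are attached to train_df as train_df.static_features.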
```python
train_df, known_covariates = fev.convert_input_data(window, adapter="autogluon")
print("train_df")
display(train_df)
print("train_df.static_features")
display(train_df.static_features)
print("known_covariates")
display(known_covariates)
```
train_df
| item_id | timestamp | target | api_calls | clouds | humidity | rain | temp | wind | distance_mean | surge_mean |
|---|---|---|---|---|---|---|---|---|---|---|
| T000000 | 2018-11-26 06:00:00 | 16.555555 | 9.0 | 0.990667 | 0.913333 | 0.000 | 40.627335 | 1.350667 | 1.726667 | 1.055556 |
| T000000 | 2018-11-26 07:00:00 | 17.299999 | 10.0 | 0.970000 | 0.920000 | 0.000 | 41.137501 | 1.735000 | 1.690000 | 1.100000 |
| T000000 | 2018-11-26 08:00:00 | 13.500000 | 1.0 | 0.980000 | 0.923333 | 0.000 | 40.919998 | 1.330000 | 1.380000 | 1.000000 |
| T000000 | 2018-11-26 09:00:00 | 17.954546 | 11.0 | 1.000000 | 0.927500 | 0.000 | 40.937500 | 1.365000 | 1.920909 | 1.113636 |
| T000000 | 2018-11-26 10:00:00 | 18.625000 | 12.0 | 0.995000 | 0.940000 | 0.000 | 40.695000 | 1.895000 | 2.122500 | 1.083333 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| T000155 | 2018-12-17 08:00:00 | 9.454545 | 11.0 | 1.000000 | 0.920000 | 0.000 | 37.279999 | 10.670000 | 2.230909 | 1.000000 |
| T000155 | 2018-12-17 09:00:00 | 9.700000 | 15.0 | 1.000000 | 0.930000 | 0.000 | 36.189999 | 9.760000 | 2.447333 | 1.000000 |
| T000155 | 2018-12-17 10:00:00 | 9.300000 | 10.0 | 1.000000 | 0.930000 | 0.003 | 34.750000 | 9.950000 | 2.203000 | 1.000000 |
| T000155 | 2018-12-17 11:00:00 | 9.400000 | 15.0 | 1.000000 | 0.930000 | 0.009 | 34.180000 | 9.240000 | 2.139333 | 1.000000 |
| T000155 | 2018-12-17 12:00:00 | 9.593750 | 16.0 | 0.990000 | 0.930000 | 0.000 | 34.209999 | 8.380000 | 1.958750 | 1.000000 |
79716 rows × 9 columns
train_df.static_features
| item_id | provider_name | provider_service | source_location |
|---|---|---|---|
| T000000 | Lyft | Lux | Back Bay |
| T000001 | Lyft | Lux Black | Back Bay |
| T000002 | Lyft | Lux Black XL | Back Bay |
| T000003 | Lyft | Lyft | Back Bay |
| T000004 | Lyft | Lyft XL | Back Bay |
| ... | ... | ... | ... |
| T000151 | Uber | Taxi | West End |
| T000152 | Uber | UberPool | West End |
| T000153 | Uber | UberX | West End |
| T000154 | Uber | UberXL | West End |
| T000155 | Uber | WAV | West End |
156 rows × 3 columns
known_covariates
| item_id | timestamp | api_calls | clouds | humidity | rain | temp | wind |
|---|---|---|---|---|---|---|---|
| T000000 | 2018-12-17 13:00:00 | 10.0 | 0.97 | 0.90 | 0.0 | 35.169998 | 7.22 |
| T000000 | 2018-12-17 14:00:00 | 7.0 | 0.92 | 0.90 | 0.0 | 36.299999 | 6.87 |
| T000000 | 2018-12-17 15:00:00 | 13.0 | 0.88 | 0.87 | 0.0 | 37.250000 | 7.58 |
| T000000 | 2018-12-17 16:00:00 | 12.0 | 1.00 | 0.84 | 0.0 | 39.000000 | 6.28 |
| T000000 | 2018-12-17 17:00:00 | 9.0 | 0.95 | 0.81 | 0.0 | 40.009998 | 6.46 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| T000155 | 2018-12-18 14:00:00 | 17.0 | 0.48 | 0.47 | 0.0 | 26.190001 | 13.89 |
| T000155 | 2018-12-18 15:00:00 | 15.0 | 0.34 | 0.46 | 0.0 | 27.219999 | 15.03 |
| T000155 | 2018-12-18 16:00:00 | 15.0 | 0.31 | 0.47 | 0.0 | 28.700001 | 14.60 |
| T000155 | 2018-12-18 17:00:00 | 9.0 | 0.15 | 0.46 | 0.0 | 30.049999 | 13.55 |
| T000155 | 2018-12-18 18:00:00 | 12.0 | 0.00 | 0.46 | 0.0 | 30.790001 | 13.09 |
4680 rows × 6 columns
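The shapes reported above follow from the task definition. known_covariates holds one row per series and forecast step, and train_df holds the target plus both kinds of dynamic covariates, with the static features stored separately in train_df.static_features. The arithmetic below reproduces the 4680 rows and 9 columns. The 511 it prints for train_df is an average length per series; it equals each series' length only if all series are equally long.

```python
# Numbers copied from the outputs and the task definition above
num_series = 156             # number of item_ids / rows in past_data
horizon = 30                 # Task(horizon=30)
num_known = 6                # len(known_dynamic_columns)
num_past_only = 2            # len(past_dynamic_columns)
train_rows_reported = 79716  # "79716 rows" reported for train_df

# known_covariates: one row per series and forecast step
known_cov_rows = num_series * horizon
# train_df columns: target + known covariates + past-only covariates
train_cols = 1 + num_known + num_past_only
# Average number of past observations per series
avg_past_len = train_rows_reported / num_series

print(known_cov_rows, train_cols, avg_past_len)  # 4680 9 511.0
```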
Nixtla
Similar to the pandas adapter, but the ID, timestamp, and target columns are renamed to unique_id, ds, and y, respectively.
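The renaming amounts to a simple column mapping. The sketch below applies it to a single row represented as a dict, using values from the first row of train_df. The helper to_nixtla_columns is hypothetical; with pandas, the same mapping could be passed to DataFrame.rename(columns=...).

```python
def to_nixtla_columns(row, target):
    """Hypothetical helper: rename id/timestamp/target to Nixtla's names."""
    mapping = {"id": "unique_id", "timestamp": "ds", target: "y"}
    # Columns not in the mapping (covariates) keep their names
    return {mapping.get(col, col): value for col, value in row.items()}


row = {"id": "T000000", "timestamp": "2018-11-26 06:00:00", "price_mean": 16.555555, "api_calls": 9.0}
print(to_nixtla_columns(row, target="price_mean"))
# {'unique_id': 'T000000', 'ds': '2018-11-26 06:00:00', 'y': 16.555555, 'api_calls': 9.0}
```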
```python
train_df, future_df, static_df = fev.convert_input_data(window, adapter="nixtla")
print("train_df")
display(train_df.head())
print("future_df")
display(future_df.head())
print("static_df")
display(static_df.head())
```
train_df
| | unique_id | ds | y | api_calls | clouds | humidity | rain | temp | wind | distance_mean | surge_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | T000000 | 2018-11-26 06:00:00 | 16.555555 | 9.0 | 0.990667 | 0.913333 | 0.0 | 40.627335 | 1.350667 | 1.726667 | 1.055556 |
| 1 | T000000 | 2018-11-26 07:00:00 | 17.299999 | 10.0 | 0.970000 | 0.920000 | 0.0 | 41.137501 | 1.735000 | 1.690000 | 1.100000 |
| 2 | T000000 | 2018-11-26 08:00:00 | 13.500000 | 1.0 | 0.980000 | 0.923333 | 0.0 | 40.919998 | 1.330000 | 1.380000 | 1.000000 |
| 3 | T000000 | 2018-11-26 09:00:00 | 17.954546 | 11.0 | 1.000000 | 0.927500 | 0.0 | 40.937500 | 1.365000 | 1.920909 | 1.113636 |
| 4 | T000000 | 2018-11-26 10:00:00 | 18.625000 | 12.0 | 0.995000 | 0.940000 | 0.0 | 40.695000 | 1.895000 | 2.122500 | 1.083333 |
future_df
| | unique_id | ds | api_calls | clouds | humidity | rain | temp | wind |
|---|---|---|---|---|---|---|---|---|
| 0 | T000000 | 2018-12-17 13:00:00 | 10.0 | 0.97 | 0.90 | 0.0 | 35.169998 | 7.22 |
| 1 | T000000 | 2018-12-17 14:00:00 | 7.0 | 0.92 | 0.90 | 0.0 | 36.299999 | 6.87 |
| 2 | T000000 | 2018-12-17 15:00:00 | 13.0 | 0.88 | 0.87 | 0.0 | 37.250000 | 7.58 |
| 3 | T000000 | 2018-12-17 16:00:00 | 12.0 | 1.00 | 0.84 | 0.0 | 39.000000 | 6.28 |
| 4 | T000000 | 2018-12-17 17:00:00 | 9.0 | 0.95 | 0.81 | 0.0 | 40.009998 | 6.46 |
static_df
| | unique_id | provider_name | provider_service | source_location |
|---|---|---|---|---|
| 0 | T000000 | Lyft | Lux | Back Bay |
| 1 | T000001 | Lyft | Lux Black | Back Bay |
| 2 | T000002 | Lyft | Lux Black XL | Back Bay |
| 3 | T000003 | Lyft | Lyft | Back Bay |
| 4 | T000004 | Lyft | Lyft XL | Back Bay |