Time Series Classification Synthetic vs Real Financial Time Series

Distinguishing between real financial time series and synthetic time series using XGBoost

Excerpt

I was given a “Data Science” challenge as part of an interview in which I had to distinguish between real financial time series and synthetic time series. I document the results here. The data was anonymous, and I have no idea which assets were which or what time series they came from.

All I knew was that I had 12,000 real time series and 12,000 synthetically created time series, 24,000 observations in total. (Apologies for not sharing the raw data, but it belonged to the company and not to me. I have uploaded the train and test data sets discussed later here, so you should be able to run the final XGBoost model.) I show the code here for methodological purposes and in case you are interested in visualising time series in R with ggplot2. The time series features used here are taken from the following papers:

Large Scale Unusual Time Series Detection by R. Hyndman, E. Wang and N. Laptev
Visualising forecasting algorithm performance using time series instance spaces by Y. Kang, R. Hyndman and K. Smith-Miles
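
Features of this kind reduce each series to a handful of summary numbers that a classifier can work with. The sketch below is only an illustration in base R: it uses three simple summaries (mean, standard deviation and lag-1 autocorrelation) rather than the feature sets from the papers above, and the function name series_features and the simulated toy series are made up for the example.

```r
# Hypothetical helper: summarise one numeric series as three features.
series_features <- function(x) {
  c(
    mean = mean(x),
    sd   = sd(x),
    # acf() returns lag 0 first, so element 2 is the lag-1 autocorrelation
    acf1 = acf(x, lag.max = 1, plot = FALSE)$acf[2]
  )
}

set.seed(1)
# Two simulated series standing in for real data: white noise and an AR(1)
toy_series <- list(
  noise = rnorm(200),
  ar1   = as.numeric(arima.sim(list(ar = 0.8), n = 200))
)

# One row per series, one column per feature
feature_matrix <- t(vapply(toy_series, series_features, numeric(3)))
print(round(feature_matrix, 3))
```

Each row of feature_matrix describes one series, which is the shape a tabular model such as XGBoost expects.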

You can check out my Jupyter Notebook version here.

I added a lot of notes to the code throughout the document which might be of additional interest.

Let's get started…

I often remove all other data in my environment beforehand and turn scientific notation off, which is what the first two lines do. The shhh alias is useful in Jupyter Notebooks, which print all the start-up messages when packages are loaded; wrapping a library() call in shhh suppresses them. (In R Markdown I can set warning = FALSE, but as far as I know there is no equivalent option in Notebooks.)

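# Clear the workspace and turn off scientific notation
rm(list = ls())
options(scipen = 999)
setwd('C:/Users/Matt/Desktop/Data Science Challenge')
# Short alias used to silence package start-up messages
shhh <- suppressPackageStartupMessages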

shhh(library(dplyr))
library(readr)
library(TSrepr)
library(ggplot2)
library(data.table)
library(cluster)
library(clusterCrit)
library(fractalrock)
library(cowplot)
library(tidyr)
library(tidyquant)
library(lmtest)
library(aTSA)
library(tsoutliers)
library(tsfeatures)
library(xgboost)
library(caret)
library(purrr)

train_val <- read_csv("train.csv")
test <- read_csv("test.csv")
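
The shhh alias can also wrap a loop over package names, so every package is loaded quietly. This is a minimal sketch and not the code from the original post: the three package names are placeholders that ship with every R installation, chosen only so the example runs anywhere.

```r
# Same alias as in the setup code above
shhh <- suppressPackageStartupMessages

# Placeholder package names; swap in the packages you actually need
pkgs <- c("stats", "utils", "tools")

# require() returns TRUE/FALSE instead of stopping, so failures can be listed
loaded <- vapply(
  pkgs,
  function(p) shhh(require(p, character.only = TRUE, quietly = TRUE)),
  logical(1)
)

missing_pkgs <- names(loaded)[!loaded]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```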

NOTE:

I have two data sets: train_val.csv, which holds the training and validation data, and test.csv. I do not touch test.csv until the very end, in Part 3. All the analysis and optimisation is performed only on train_val.csv. Each of the two files contains 12,000 observations.
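
To illustrate the idea of holding data back, here is a minimal base R sketch of a stratified 80/20 split into training and validation rows. It is not the split used in the original post: the toy data frame, the class column name and the 80/20 ratio are all assumptions made for the example.

```r
set.seed(42)

# Toy stand-in for train_val: 100 rows with a hypothetical binary `class` label
toy <- data.frame(id = 1:100, class = rep(c(0, 1), each = 50))

# Sample 20% of the row indices within each class so both splits stay balanced
val_idx <- unlist(lapply(
  split(seq_len(nrow(toy)), toy$class),
  function(idx) sample(idx, size = round(0.2 * length(idx)))
))

validation <- toy[val_idx, ]
train      <- toy[-val_idx, ]

# Class counts in the validation split
print(table(validation$class))
```

The held-out test file plays no part in this step; it is only scored once the model has been chosen.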

Part 1

The data was given to me in this format:

head(train_val[, 1:5], 1)

## # A tibble: 1 x 5
##   feature1 feature2 feature3 feature4 feature5
##      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
## 1  0.00629  0.00441  -0.0381   0.0253 -0.00658

The names of the columns are as follows:

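# kable() comes from knitr and kable_styling() from kableExtra
library(knitr)
library(kableExtra)

# List the column names in blocks of 20 and render them as a styled table
colnames(train_val) %>%
  data.frame() %>%
  setNames(c("features")) %>%
  split(as.integer(gl(nrow(.), 20, nrow(.)))) %>%
  kable(caption = "Time series variables") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 12)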
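
The split(as.integer(gl(...))) step in the code above is what breaks the column names into blocks of 20. A small base R sketch with 45 made-up names shows what that grouping produces; the count of 45 and the feature names are invented for the example.

```r
# 45 made-up column names standing in for colnames(train_val)
n <- 45
toy_names <- data.frame(features = paste0("feature", seq_len(n)))

# gl(n, 20, n) repeats each level 20 times and truncates to length n,
# giving group ids 1 (x20), 2 (x20), 3 (x5)
grp <- as.integer(gl(n, 20, n))

# One data frame per block of up to 20 names
chunks <- split(toy_names, grp)
print(vapply(chunks, nrow, integer(1)))
```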

The goal was to classify which financial time series were real and which were synthetically created (by some algorithm; I have no knowledge of how it generated the synthetic time series).

I re-arranged the data using the melt function in R; however, I suggest that anybody reading this use the pivot_longer function from the tidyr package (part of the tidyverse) instead. pivot_longer was released a few weeks after I wrote the code for this problem.
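
As a rough sketch of that suggestion, the example below reshapes a small wide table into long format with pivot_longer. The toy values and the series_id column are invented for the example; the real data may not carry an identifier column, in which case one would need to be added first.

```r
library(tidyr)

# Toy stand-in for train_val: one row per series, one column per time step.
# `series_id` is a hypothetical identifier added for illustration.
toy <- data.frame(
  series_id = 1:2,
  feature1  = c(0.01, -0.02),
  feature2  = c(0.00, 0.03),
  feature3  = c(-0.01, 0.01)
)

# Wide to long: one row per (series, time step) pair
toy_long <- pivot_longer(
  toy,
  cols            = starts_with("feature"),
  names_to        = "time",
  names_prefix    = "feature",                 # strip the prefix, keep the index
  names_transform = list(time = as.integer),   # store the index as an integer
  values_to       = "value"
)

print(toy_long)
```

The long format is what ggplot2 expects when drawing one line per series, for example with aes(x = time, y = value, group = series_id).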

Visit Matthew Smith R Blog to read the full article and download the R code:
https://lf0.com/post/synth-real-time-series/financial-time-series/

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from Matthew Smith - R Blog and is being posted with its permission. The views expressed in this material are solely those of the author and/or Matthew Smith - R Blog and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

Posted February 26, 2020 at 2:18 pm