Dplython…dplyr for Python!

If you’re an avid R user, you probably use the famous dplyr package. Python has a package meant to be similar to dplyr, called dplython. This article will give an introduction for how to use dplython.

For the examples below, we’ll use a sample dataset that comes with R giving attributes about the US states, including population, area, and income levels. You can see the dataset by clicking here.

Initial setup

dplython can be installed using pip:.

pip install dplython

Once the package is installed, let’s load a few methods from it, and read in our dataset.

# load packages
from dplython import select, DplyFrame, X, arrange, count, sift, head, summarize, group_by, tail, mutate
import pandas as pd
 
# read in data
state_df = pd.read_csv("state_info.txt")

After we’ve read in our data, we need to convert it to an object called a DplyFrame, which we do using a method from dplython. This DplyFrame object will allow us to perform “dplyr-like” operations.

state_info = DplyFrame(state_df)

The Basics – How to select columns and get top rows

With dplython, selecting columns is done like below:

state_info >> select(X.State_Name, X.State_Region, X.Population)

Notice how we append X to each field name, which is a little different from the select command in dplyr. We also use >> instead of %>% as a pipe operator. If you’re not as familiar with dplyr from R, the pipe operator basically allows to you apply a function to the output of a function or object to the left the operator (the >> in dplython). In the case above, the select function is just being applied to state_info.

To select only the top 10 or bottom 10 rows, we can chain the head or tail methods, respectively, to our line of code above. Here the head and tail functions are being applied to the result to the left of each respective pipe operator.

# get top ten rows with selected columns
state_info >> select(X.State_Name, X.State_Region, X.Population) >> head(10)
 
# get bottom ten rows with selected columns
state_info >> select(X.State_Name, X.State_Region, X.Population) >> tail(10)

How to filter rows

Filtering can be done using the sift method. For example, if we want to get only the records where a state has an area less than 100,000 we would do this:

# filter initial data frame to states with areas less than 100,000 square miles
state_info >> sift(X.Area < 100000)
 
# filter rows and columns in one step
state_info >> sift(X.Area < 100000) >> 
              select(X.State_Name, X.State_Region, X.Population, X.Area)

Sorting datasets with dplython

To sort our DplyFrame by a column, we can use the arrange method, like in dplyr:

# sort DplyFrame in ascending order by Area
state_info >> arrange(X.Area)
 
# sort first by region, then by area
state_info >> arrange(X.State_Region, X.Area)

How to create new columns with dplython

Also similar to dplyr, we can use the mutate method to add fields to state_info:

with_extra_field = state_info >> mutate(Name_Region = X.State_Name + '-' + X.State_Region)
 
# create multiple fields in one step
more_fields = state_info >> mutate(Name_Region = X.State_Name + '-' + X.State_Region,
                                   Area_Per_Thousand_Sq_Mile = X.Area / 1000)

with_extra_field is now a DplyFrame containing the same fields as state_info, plus a field called “Name_Region”. more_fields contains two extra fields — one concatenating the state name and region, and one dividing the state area field by 1000.

Summarizing data

Aggregating data can be done using the summarize and group_by methods:

# get average area by region
state_info >> group_by(X.State_Region) >> summarize(average_region_area = X.Area.mean())
 
# or do multiple aggregations in one step
state_info >> group_by(X.State_Region) >> summarize(average_region_area = X.Area.mean(), \
                                                 >> max_region_area = X.Area.max), \
                                                 >> max_region_income = X.Income.max())

The first line of code above will show the average (mean) state area by region. The second line calculates multiple aggregations in one step, including mean and max state areas by region, as well as max state income by region.

Transposing a dataset

dplython can also transpose datasets using X.__T. (period followed by double underscore and second period).

# transpose state_info
state_info >> X._.T()

How to join tables together

dplython, like dplyr, has several “SQL-like” functions for joining datasets together:

from dplython import inner_join, left_join
 
 
right_table = more_fields >> select(X.State_Name, X.Name_Region)
 
# inner join
joined_result = inner_join(state_info, right_table, on = "State_Name")
 
# left_join
left_join_result = left_join(state_info, right_table, on = "State_Name")

In each type of join above, the first and second parameters are the left and right tables being joined. The “on” parameter specifies what column or list of columns should be used as the key to join the tables together.

Counting values in a column

To get the count of the values in a specific column, use the count function:

# get a count of each state region
state_info >> count(X.State_Region)

How to randomly sample rows with dplython

The sample_n function can be used to randomly sample rows from our dataset. You just need to pass whatever number of rows you want to randomly select (e.g. 10 in the below example).

from dplython import sample_n
 
# randomly select ten rows from state_info
state_info >> sample_n(10)

dplython can also sample a random proportion of a dataset. For example, if we want to randomly sample 70% of the rows, we can use the sample_frac function:

from dplython import sample_frac
 
# randomly sample 70% of the rows from state_info
state_info >> sample_frac(0.7)

The first line of code above will generate a DplyFrame with two columns: one showing the State Region, and the other showing the average area across each region. The second line contains this information, plus the max state area for each region, and the max state income for each region.

That’s it for this post! Please check out other Python posts of mine by clicking here.

Originally posted on TheAutomatic.net blog.

Join The Conversation

For specific platform feedback and suggestions, please submit it directly to our team using these instructions.

If you have an account-specific question or concern, please reach out to Client Services.

We encourage you to look through our FAQs before posting. Your question may already be covered!

Visit IBKR.com Open an IBKR Account

Leave a Reply Cancel reply

Disclosure: Interactive Brokers Third Party

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from TheAutomatic.net and is being posted with its permission. The views expressed in this material are solely those of the author and/or TheAutomatic.net and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

How much could you save on your margin loan by switching to Interactive Brokers?

Fill out the information below to see your estimated savings.

Current Interest Rate

Balance

USD

Margin Amount Borrowed

USD

Time Margin is Borrowed

IBKR will assess a surcharge of 1% on large loan balances unless otherwise prearranged with IBKR. The 1% surcharge would apply to all balances in the highest tier.

The interest calculator is based on information that we believe to be accurate and correct, but neither Interactive Brokers LLC nor its affiliates warrant its accuracy or adequacy and it should not be relied upon as such. Neither IBKR nor its affiliates are responsible for any errors or omissions or for results obtained from the use of this calculator.

Restrictions apply. Annual Percentage Rate (APR) on USD margin loan balances for IBKR Pro as of October 3, 2024. Interactive Brokers calculates the interest charged on margin loans using the applicable rates for each interest rate tier listed on its website. Learn more about margin loan rates.

The projections or other information generated by the Interest Calculator tool are hypothetical in nature, do not reflect actual results and are not guarantees of future results. Please note that results may vary with use of the tool over time.

Trading on margin is only for experienced investors with high risk tolerance. You may lose more than your initial investment. For additional information about rates on margin loans, please see Margin Loan Rates.

Dplython…dplyr for Python!

Initial setup

The Basics – How to select columns and get top rows

How to filter rows

Sorting datasets with dplython

How to create new columns with dplython

Summarizing data

Transposing a dataset

How to join tables together

Counting values in a column

How to randomly sample rows with dplython

Join The Conversation

Leave a Reply Cancel reply

Disclosure: Interactive Brokers Third Party

Information on Other Interactive Brokers Affiliates

Interactive Brokers Canada Inc.

Interactive Brokers Australia Pty. Ltd.

Interactive Brokers Hong Kong Limited

Interactive Brokers India Pvt. Ltd.

Interactive Brokers Securities Japan Inc.

Interactive Brokers Singapore Pte. Ltd.

Initial setup

The Basics – How to select columns and get top rows

How to filter rows

Sorting datasets with dplython

How to create new columns with dplython

Summarizing data

Transposing a dataset

How to join tables together

Counting values in a column

How to randomly sample rows with dplython

Join The Conversation

Leave a Reply Cancel reply

Disclosure: Interactive Brokers Third Party

Bi-Weekly Newsletter

Daily Newsletter

Weekly Newsletter

Weekly Newsletter

Monthly Newsletter