Overview of Series

Let us quickly go through one of the core Pandas data structures: Series.

  • Pandas Series is a one-dimensional labeled array capable of holding any data type.

  • It is similar to a single column in an Excel spreadsheet or a database table.

  • We can create a Series from a dict. The keys become the index labels and the values become the data.

d = {"JAN": 10, "FEB": 15, "MAR": 12, "APR": 16}
type(d)
dict
d
{'JAN': 10, 'FEB': 15, 'MAR': 12, 'APR': 16}
import pandas as pd
s = pd.Series(d)
s
JAN    10
FEB    15
MAR    12
APR    16
dtype: int64
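The `dtype` line in the output above reflects the type of the values held by the Series. As a minimal sketch of how that changes with the data, the snippet below builds three small Series; the sample values and variable names are hypothetical and only meant for illustration.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate dtype inference.
ints = pd.Series({"JAN": 10, "FEB": 15})          # all integers
floats = pd.Series({"JAN": 10.5, "FEB": 15.25})   # all floats
mixed = pd.Series({"JAN": 10, "FEB": "fifteen"})  # integer and string together

# Each Series reports the dtype Pandas inferred for its values.
print(ints.dtype)    # an integer dtype
print(floats.dtype)  # a floating-point dtype
print(mixed.dtype)   # object, because the values do not share one type
```

When the values do not share a single type, Pandas falls back to the generic `object` dtype, which is how a Series can hold any data type.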
s = pd.Series(d, name='val')
s
JAN    10
FEB    15
MAR    12
APR    16
Name: val, dtype: int64
s['FEB']
15
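s.iloc[0]  # positional access; plain s[0] is deprecated on a label-indexed Series in recent Pandas versions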
10
s[1:3]
FEB    15
MAR    12
Name: val, dtype: int64
type(s)
pandas.core.series.Series
s.sum()
53
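The lookups shown earlier mix label-based and position-based access. As a small sketch, the same lookups can be written explicitly with `.loc` (labels) and `.iloc` (positions), which avoids ambiguity. The variable names `by_label`, `by_position`, `pos_slice`, and `label_slice` are introduced here only for illustration.

```python
import pandas as pd

d = {"JAN": 10, "FEB": 15, "MAR": 12, "APR": 16}
s = pd.Series(d, name="val")

# Label-based lookup: uses the index labels.
by_label = s.loc["FEB"]

# Position-based lookup: uses 0-based positions.
by_position = s.iloc[0]

# Positional slices exclude the end position (positions 1 and 2 here).
pos_slice = s.iloc[1:3]

# Label slices include the end label (FEB through MAR here).
label_slice = s.loc["FEB":"MAR"]

print(by_label, by_position)
print(pos_slice.equals(label_slice))
```

Note the difference in slicing: `s.iloc[1:3]` stops before position 3, while `s.loc["FEB":"MAR"]` includes `MAR`, so both return the same two rows.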
l = [10, 15, 12, 16]
l_s = pd.Series(l)
l_s
0    10
1    15
2    12
3    16
dtype: int64
l_s[0]
10
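A Series created from a list gets a default integer index (0, 1, 2, ...). As a sketch, we can also pass our own labels through the `index` argument of `pd.Series`. The `months` list and the `d_s` variable below are hypothetical names, chosen to mirror the dict example above.

```python
import pandas as pd

l = [10, 15, 12, 16]
months = ["JAN", "FEB", "MAR", "APR"]  # hypothetical labels, mirroring the dict keys used earlier

# Attach explicit labels to the list values instead of the default 0..n-1 index.
l_s = pd.Series(l, index=months, name="val")

# Build the equivalent Series from a dict for comparison.
d_s = pd.Series(dict(zip(months, l)), name="val")

print(l_s["MAR"])
print(l_s.equals(d_s))  # compares index labels and values
```

With matching labels and values, the list-based and dict-based Series are interchangeable.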
  • When we fetch a single column from a Pandas DataFrame, it is returned as a Series.

Note

Don’t worry too much about creating DataFrames yet. For now, we are only trying to understand how DataFrame and Series are related.

orders_path = "/data/retail_db/orders/part-00000"
orders_schema = [
  "order_id",
  "order_date",
  "order_customer_id",
  "order_status"
]
orders = pd.read_csv(orders_path,
  header=None,
  names=orders_schema
)
orders
       order_id             order_date  order_customer_id     order_status
0             1  2013-07-25 00:00:00.0              11599           CLOSED
1             2  2013-07-25 00:00:00.0                256  PENDING_PAYMENT
2             3  2013-07-25 00:00:00.0              12111         COMPLETE
3             4  2013-07-25 00:00:00.0               8827           CLOSED
4             5  2013-07-25 00:00:00.0              11318         COMPLETE
...         ...                    ...                ...              ...
68878     68879  2014-07-09 00:00:00.0                778         COMPLETE
68879     68880  2014-07-13 00:00:00.0               1117         COMPLETE
68880     68881  2014-07-19 00:00:00.0               2518  PENDING_PAYMENT
68881     68882  2014-07-22 00:00:00.0              10000          ON_HOLD
68882     68883  2014-07-23 00:00:00.0               5533         COMPLETE

68883 rows × 4 columns

type(orders)
pandas.core.frame.DataFrame
orders.order_date
0        2013-07-25 00:00:00.0
1        2013-07-25 00:00:00.0
2        2013-07-25 00:00:00.0
3        2013-07-25 00:00:00.0
4        2013-07-25 00:00:00.0
                 ...          
68878    2014-07-09 00:00:00.0
68879    2014-07-13 00:00:00.0
68880    2014-07-19 00:00:00.0
68881    2014-07-22 00:00:00.0
68882    2014-07-23 00:00:00.0
Name: order_date, Length: 68883, dtype: object
type(orders.order_date)
pandas.core.series.Series
order_dates = orders.order_date
order_dates
0        2013-07-25 00:00:00.0
1        2013-07-25 00:00:00.0
2        2013-07-25 00:00:00.0
3        2013-07-25 00:00:00.0
4        2013-07-25 00:00:00.0
                 ...          
68878    2014-07-09 00:00:00.0
68879    2014-07-13 00:00:00.0
68880    2014-07-19 00:00:00.0
68881    2014-07-22 00:00:00.0
68882    2014-07-23 00:00:00.0
Name: order_date, Length: 68883, dtype: object
type(order_dates)
pandas.core.series.Series
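The orders file used above may not be available everywhere, so here is a minimal self-contained sketch of the same relationship using a small in-memory DataFrame. The `sample_orders` name is hypothetical; its three rows are copied from the first rows of the output shown earlier.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the orders data (first three rows of the output above).
sample_orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_date": ["2013-07-25 00:00:00.0"] * 3,
    "order_customer_id": [11599, 256, 12111],
    "order_status": ["CLOSED", "PENDING_PAYMENT", "COMPLETE"],
})

# Attribute access and single-bracket access both return a Series.
as_attr = sample_orders.order_date
as_key = sample_orders["order_date"]

# Double brackets (a list of column names) return a DataFrame, even for one column.
as_frame = sample_orders[["order_date"]]

print(type(as_attr).__name__)
print(type(as_key).__name__)
print(type(as_frame).__name__)
```

The bracket form `sample_orders["order_date"]` is generally the safer choice, since attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute.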