## Pandas Data Structures – Overview

Let us go over the basics of Pandas.
* Pandas is not part of the Python standard library, so we need to install it using pip - `pip install pandas`.
* It has 2 main data structures - `Series` and `DataFrame`.
* `Series` is a one-dimensional labeled array while `DataFrame` is a two-dimensional labeled table.
* `Series` contains an index label for each row and a single attribute or column.
* `DataFrame` contains an index label for each row and multiple columns.
* Each attribute (column) in a `DataFrame` is a `Series`.
* We can perform standard transformations such as filtering, aggregation, sorting and joins using Pandas APIs.
* There are also SQL-based wrappers on top of Pandas that let us write queries.
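The relationship between a `DataFrame` and its columns can be checked directly. The following is a minimal sketch; the sample data and the names `emp_df`, `sal_col` and `high_sals` are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
emp_df = pd.DataFrame({'id': [1, 2, 3], 'sal': [1500.0, 2000.0, 2200.0]})

# Each column of the DataFrame is a Series that shares the row index
sal_col = emp_df['sal']
print(isinstance(sal_col, pd.Series))        # True
print(sal_col.index.equals(emp_df.index))    # True

# Two simple transformations: an aggregation and a filter
print(sal_col.sum())                         # 5700.0
high_sals = emp_df[emp_df['sal'] >= 2000.0]
print(len(high_sals))                        # 2
```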

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/BQM2W5HFIG8?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

Here are the steps to get started with Pandas Data Structures:
* Make sure the Pandas library is installed using `pip`.
* Import the Pandas library - `import pandas as pd`
* We need a collection, or data in a file, to create Pandas Data Structures.
* Use the appropriate APIs on the data to create Pandas Data Structures.
  * `Series` for a one-dimensional array.
  * `DataFrame` for a two-dimensional array.

```{note}
Typically we use `Series` for a list of regular objects or a dict and `DataFrame` for a list of tuples or a list of dicts. Here we use a list for `Series` and a list of tuples, with explicit column names, for `DataFrame`.
```
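The other input types mentioned in the note work in much the same way. As a small sketch, with made-up data and the hypothetical names `sals_by_id`, `emp_dicts` and `emp_df`, a dict can feed a `Series` and a list of dicts can feed a `DataFrame`:

```python
import pandas as pd

# Series from a dict: the keys become the index labels (hypothetical data)
sals_by_id = pd.Series({1: 1500.0, 2: 2000.0, 3: 2200.0}, name='sal')

# DataFrame from a list of dicts: the keys become the column names
emp_dicts = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0},
    {'id': 3, 'sal': 2200.0},
]
emp_df = pd.DataFrame(emp_dicts)

print(list(sals_by_id.index))   # [1, 2, 3]
print(list(emp_df.columns))     # ['id', 'sal']
```

With a list of dicts the column names come from the keys, so the `columns` argument is not needed.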

In [1]:
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable


In [2]:
import pandas as pd

In [3]:
sals_l = [1500.0, 2000.0, 2200.00]

In [4]:
pd.Series?

Init signature:
pd.Series(
    data=None,
    index=None,
    dtype=None,
    name=None,
    copy=False,
    fastpath=False,
)
Docstring:
One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currentl

In [5]:
sals_s = pd.Series(sals_l, name='sal')

In [6]:
sals_s

0    1500.0
1    2000.0
2    2200.0
Name: sal, dtype: float64

In [8]:
sals_s[:2]

0    1500.0
1    2000.0
Name: sal, dtype: float64
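The docstring mentions both integer- and label-based indexing. The sketch below contrasts the two using the `.iloc` and `.loc` accessors; the string labels `'a'`, `'b'`, `'c'` and the variable names are assumptions chosen so the two styles are easy to tell apart.

```python
import pandas as pd

# Hypothetical string labels instead of the default 0, 1, 2 index
sals = pd.Series([1500.0, 2000.0, 2200.0], index=['a', 'b', 'c'], name='sal')

first_two = sals.iloc[:2]      # position-based: the end position is excluded
a_to_b = sals.loc['a':'b']     # label-based: the end label is included

print(first_two.tolist())      # [1500.0, 2000.0]
print(a_to_b.tolist())         # [1500.0, 2000.0]
print(sals.loc['c'])           # 2200.0
```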

In [13]:
sals_ld = [(1, 1500.0), (2, 2000.0), (3, 2200.00)]

In [14]:
pd.DataFrame?

Init signature:
pd.DataFrame(
    data=None,
    index:Union[Collection, NoneType]=None,
    columns:Union[Collection, NoneType]=None,
    dtype:Union[_ForwardRef('ExtensionDtype'), str, numpy.dtype, Type[Union[str, float, int, complex]], NoneType]=None,

In [15]:
sals_df = pd.DataFrame(sals_ld, columns=['id', 'sal'])

In [16]:
sals_df

   id     sal
0   1  1500.0
1   2  2000.0
2   3  2200.0


In [20]:
sals_df['id']

0    1
1    2
2    3
Name: id, dtype: int64
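Selecting with a single column name returns a `Series`, as seen above. As a small sketch, passing a list of column names instead returns a `DataFrame`; the names `id_series` and `id_frame` are hypothetical, and the data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Same data as above, rebuilt so the snippet is self-contained
sals_df = pd.DataFrame([(1, 1500.0), (2, 2000.0), (3, 2200.0)], columns=['id', 'sal'])

id_series = sals_df['id']      # single label -> Series
id_frame = sals_df[['id']]     # list of labels -> DataFrame with one column

print(type(id_series).__name__)   # Series
print(type(id_frame).__name__)    # DataFrame
print(id_frame.shape)             # (3, 1)
```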