Creating Data Frames from listsΒΆ
Let us go through the details of creating Data Frames using collections.
Pandas Data Frame is a two-dimensional labeled array capable of holding attributes of any data type.
It is similar to multi column excel spreadsheet or a database table.
We can create Data Frame using list of tuples or list of dicts.
We can also create Data Frames using data from files. We will have a look at it later.
import pandas as pd
Note
Creating Pandas Data Frame using list of tuples.
sals_ld = [(1, 1500.0), (2, 2000.0, 10.0), (3, 2200.00)]
sals_df = pd.DataFrame(sals_ld)
sals_df
0 | 1 | 2 | |
---|---|---|---|
0 | 1 | 1500.0 | NaN |
1 | 2 | 2000.0 | 10.0 |
2 | 3 | 2200.0 | NaN |
sals_df = pd.DataFrame(sals_ld, columns=['id', 'sal', 'comm'])
sals_df
id | sal | comm | |
---|---|---|---|
0 | 1 | 1500.0 | NaN |
1 | 2 | 2000.0 | 10.0 |
2 | 3 | 2200.0 | NaN |
sals_df['id']
0 1
1 2
2 3
Name: id, dtype: int64
sals_df[['id', 'sal']]
id | sal | |
---|---|---|
0 | 1 | 1500.0 |
1 | 2 | 2000.0 |
2 | 3 | 2200.0 |
Note
Creating Pandas Data Frame using list of dicts.
sals_ld = [
{'id': 1, 'sal': 1500.0},
{'id': 2, 'sal': 2000.0},
{'id': 3, 'sal': 2200.0}
]
Note
Column names will be inherited automatically using keys from the dict.
sals_df = pd.DataFrame(sals_ld)
sals_df
id | sal | |
---|---|---|
0 | 1 | 1500.0 |
1 | 2 | 2000.0 |
2 | 3 | 2200.0 |
sals_df['id']
0 1
1 2
2 3
Name: id, dtype: int64
sals_ld = [
{'id': 1, 'sal': 1500.0},
{'id': 2, 'sal': 2000.0, 'comm': 10},
{'id': 3, 'sal': 2200.0}
]
pd.DataFrame?
Init signature:
pd.DataFrame(
data=None,
index:Union[Collection, NoneType]=None,
columns:Union[Collection, NoneType]=None,
dtype:Union[_ForwardRef('ExtensionDtype'), str, numpy.dtype, Type[Union[str, float, int, complex]], NoneType]=None,
copy:bool=False,
)
Docstring:
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.
Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects.
.. versionchanged:: 0.23.0
If data is a dict, column order follows insertion-order for
Python 3.6 and later.
.. versionchanged:: 0.25.0
If data is a list of dicts, column order follows insertion-order
for Python 3.6 and later.
index : Index or array-like
Index to use for resulting frame. Will default to RangeIndex if
no indexing information part of input data and no index provided.
columns : Index or array-like
Column labels to use for resulting frame. Will default to
RangeIndex (0, 1, 2, ..., n) if no column labels are provided.
dtype : dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool, default False
Copy data from inputs. Only affects DataFrame / 2d ndarray input.
See Also
--------
DataFrame.from_records : Constructor from tuples, also record arrays.
DataFrame.from_dict : From dicts of Series, arrays, or dicts.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_table : Read general delimited file into DataFrame.
read_clipboard : Read text from clipboard into DataFrame.
Examples
--------
Constructing DataFrame from a dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2
0 1 3
1 2 4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1 int64
col2 int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1 int8
col2 int8
dtype: object
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
... columns=['a', 'b', 'c'])
>>> df2
a b c
0 1 2 3
1 4 5 6
2 7 8 9
File: /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages/pandas/core/frame.py
Type: type
Subclasses: SubclassedDataFrame
sals_ld
[{'id': 1, 'sal': 1500.0},
{'id': 2, 'sal': 2000.0, 'comm': 10},
{'id': 3, 'sal': 2200.0}]
sals_df = pd.DataFrame(sals_ld)
sals_df
id | sal | comm | |
---|---|---|---|
0 | 1 | 1500.0 | NaN |
1 | 2 | 2000.0 | 10.0 |
2 | 3 | 2200.0 | NaN |