Writing Data Frames to Files¶
Pandas also provides simple APIs to write the data back to files.
Let us write the revenue per order along with order_id to a file.
Here are the steps which you need to follow before writing Data Frame to a file.
Make sure you have the Data Frame that is supposed to be written to file.
You need to ensure that you have write permissions on the folder under which files are supposed to be written.
Make sure to use appropriate key word arguments to write the Data Frame into file as per the requirements.
%run 06_csv_to_pandas_data_frame.ipynb
order_items.to_csv?
Signature:
order_items.to_csv(
path_or_buf:Union[str, pathlib.Path, IO[~AnyStr], NoneType]=None,
sep:str=',',
na_rep:str='',
float_format:Union[str, NoneType]=None,
columns:Union[Sequence[collections.abc.Hashable], NoneType]=None,
header:Union[bool, List[str]]=True,
index:bool=True,
index_label:Union[bool, str, Sequence[collections.abc.Hashable], NoneType]=None,
mode:str='w',
encoding:Union[str, NoneType]=None,
compression:Union[str, Mapping[str, str], NoneType]='infer',
quoting:Union[int, NoneType]=None,
quotechar:str='"',
line_terminator:Union[str, NoneType]=None,
chunksize:Union[int, NoneType]=None,
date_format:Union[str, NoneType]=None,
doublequote:bool=True,
escapechar:Union[str, NoneType]=None,
decimal:Union[str, NoneType]='.',
errors:str='strict',
) -> Union[str, NoneType]
Docstring:
Write object to a comma-separated values (csv) file.
.. versionchanged:: 0.24.0
The order of arguments for Series was changed.
Parameters
----------
path_or_buf : str or file handle, default None
File path or object, if None is provided the result is returned as
a string. If a file object is passed it should be opened with
`newline=''`, disabling universal newlines.
.. versionchanged:: 0.24.0
Was previously named "path" for Series.
sep : str, default ','
String of length 1. Field delimiter for the output file.
na_rep : str, default ''
Missing data representation.
float_format : str, default None
Format string for floating point numbers.
columns : sequence, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
.. versionchanged:: 0.24.0
Previously defaulted to False for Series.
index : bool, default True
Write row names (index).
index_label : str or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and
`header` and `index` are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
mode : str
Python write mode, default 'w'.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'.
compression : str or dict, default 'infer'
If str, represents compression mode. If dict, value at 'method' is
the compression mode. Compression mode may be any of the following
possible values: {'infer', 'gzip', 'bz2', 'zip', 'xz', None}. If
compression mode is 'infer' and `path_or_buf` is path-like, then
detect compression mode from the following extensions: '.gz',
'.bz2', '.zip' or '.xz'. (otherwise no compression). If dict given
and mode is one of {'zip', 'gzip', 'bz2'}, or inferred as
one of the above, other entries passed as
additional compression options.
.. versionchanged:: 1.0.0
May now be a dict with key 'method' as compression mode
and other entries as additional compression options if
compression mode is 'zip'.
.. versionchanged:: 1.1.0
Passing compression options as keys in dict is
supported for compression modes 'gzip' and 'bz2'
as well as 'zip'.
quoting : optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a `float_format`
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
quotechar : str, default '\"'
String of length 1. Character used to quote fields.
line_terminator : str, optional
The newline character or character sequence to use in the output
file. Defaults to `os.linesep`, which depends on the OS in which
this method is called ('\n' for linux, '\r\n' for Windows, i.e.).
.. versionchanged:: 0.24.0
chunksize : int or None
Rows to write at a time.
date_format : str, default None
Format string for datetime objects.
doublequote : bool, default True
Control quoting of `quotechar` inside a field.
escapechar : str, default None
String of length 1. Character used to escape `sep` and `quotechar`
when appropriate.
decimal : str, default '.'
Character recognized as decimal separator. E.g. use ',' for
European data.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
.. versionadded:: 1.1.0
Returns
-------
None or str
If path_or_buf is None, returns the resulting csv format as a
string. Otherwise returns None.
See Also
--------
read_csv : Load a CSV file into a DataFrame.
to_excel : Write DataFrame to an Excel file.
Examples
--------
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
... 'mask': ['red', 'purple'],
... 'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
Create 'out.zip' containing 'out.csv'
>>> compression_opts = dict(method='zip',
... archive_name='out.csv') # doctest: +SKIP
>>> df.to_csv('out.zip', index=False,
... compression=compression_opts) # doctest: +SKIP
File: /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages/pandas/core/generic.py
Type: method
import getpass
username = getpass.getuser()
username
'itversity'
base_dir = f"/home/{username}/data/retail_db"
base_dir
'/home/itversity/data/retail_db'
output_dir = f'{base_dir}/revenue_per_order'
output_dir
'/home/itversity/data/retail_db/revenue_per_order'
%%sh
rm -rf /home/`whoami`/data/retail_db/revenue_per_order
%%sh
ls -ltr /home/`whoami`/data/retail_db
total 0
import subprocess
subprocess.call(['rm', '-rf', output_dir])
0
import subprocess
subprocess.call(['mkdir', '-p', output_dir])
0
import subprocess
#ls -ltr /Users/itversity/Research/data/retail_db/revenue_per_order
subprocess.check_output(['ls', '-ltr', output_dir])
b'total 0\n'
%%sh
ls -ltr /data/retail_db
total 20156
drwxr-xr-x 2 root root 4096 Nov 22 16:08 categories
-rw-r--r-- 1 root root 806 Nov 22 16:08 README.md
drwxr-xr-x 2 root root 4096 Nov 22 16:08 customers
-rw-r--r-- 1 root root 1748 Nov 22 16:08 create_db_tables_pg.sql
-rw-r--r-- 1 root root 10303297 Nov 22 16:08 create_db.sql
drwxr-xr-x 2 root root 4096 Nov 22 16:08 departments
drwxr-xr-x 2 root root 4096 Nov 22 16:08 order_items
-rw-r--r-- 1 root root 10297372 Nov 22 16:08 load_db_tables_pg.sql
drwxr-xr-x 2 root root 4096 Nov 22 16:08 orders
drwxr-xr-x 2 root root 4096 Nov 22 16:08 products
orders
order_id | order_date | order_customer_id | order_status | |
---|---|---|---|---|
0 | 1 | 2013-07-25 00:00:00.0 | 11599 | CLOSED |
1 | 2 | 2013-07-25 00:00:00.0 | 256 | PENDING_PAYMENT |
2 | 3 | 2013-07-25 00:00:00.0 | 12111 | COMPLETE |
3 | 4 | 2013-07-25 00:00:00.0 | 8827 | CLOSED |
4 | 5 | 2013-07-25 00:00:00.0 | 11318 | COMPLETE |
... | ... | ... | ... | ... |
68878 | 68879 | 2014-07-09 00:00:00.0 | 778 | COMPLETE |
68879 | 68880 | 2014-07-13 00:00:00.0 | 1117 | COMPLETE |
68880 | 68881 | 2014-07-19 00:00:00.0 | 2518 | PENDING_PAYMENT |
68881 | 68882 | 2014-07-22 00:00:00.0 | 10000 | ON_HOLD |
68882 | 68883 | 2014-07-23 00:00:00.0 | 5533 | COMPLETE |
68883 rows × 4 columns
order_items
order_item_id | order_item_order_id | order_item_product_id | order_item_quantity | order_item_subtotal | order_item_product_price | |
---|---|---|---|---|---|---|
0 | 1 | 1 | 957 | 1 | 299.98 | 299.98 |
1 | 2 | 2 | 1073 | 1 | 199.99 | 199.99 |
2 | 3 | 2 | 502 | 5 | 250.00 | 50.00 |
3 | 4 | 2 | 403 | 1 | 129.99 | 129.99 |
4 | 5 | 4 | 897 | 2 | 49.98 | 24.99 |
... | ... | ... | ... | ... | ... | ... |
172193 | 172194 | 68881 | 403 | 1 | 129.99 | 129.99 |
172194 | 172195 | 68882 | 365 | 1 | 59.99 | 59.99 |
172195 | 172196 | 68882 | 502 | 1 | 50.00 | 50.00 |
172196 | 172197 | 68883 | 208 | 1 | 1999.99 | 1999.99 |
172197 | 172198 | 68883 | 502 | 3 | 150.00 | 50.00 |
172198 rows × 6 columns
order_items. \
groupby('order_item_order_id')['order_item_subtotal']. \
agg(['sum', 'min', 'max', 'count'])
sum | min | max | count | |
---|---|---|---|---|
order_item_order_id | ||||
1 | 299.98 | 299.98 | 299.98 | 1 |
2 | 579.98 | 129.99 | 250.00 | 3 |
4 | 699.85 | 49.98 | 299.95 | 4 |
5 | 1129.86 | 99.96 | 299.98 | 5 |
7 | 579.92 | 79.95 | 299.98 | 3 |
... | ... | ... | ... | ... |
68879 | 1259.97 | 129.99 | 999.99 | 3 |
68880 | 999.77 | 149.94 | 250.00 | 5 |
68881 | 129.99 | 129.99 | 129.99 | 1 |
68882 | 109.99 | 50.00 | 59.99 | 2 |
68883 | 2149.99 | 150.00 | 1999.99 | 2 |
57431 rows × 4 columns
order_items
order_item_id | order_item_order_id | order_item_product_id | order_item_quantity | order_item_subtotal | order_item_product_price | |
---|---|---|---|---|---|---|
0 | 1 | 1 | 957 | 1 | 299.98 | 299.98 |
1 | 2 | 2 | 1073 | 1 | 199.99 | 199.99 |
2 | 3 | 2 | 502 | 5 | 250.00 | 50.00 |
3 | 4 | 2 | 403 | 1 | 129.99 | 129.99 |
4 | 5 | 4 | 897 | 2 | 49.98 | 24.99 |
... | ... | ... | ... | ... | ... | ... |
172193 | 172194 | 68881 | 403 | 1 | 129.99 | 129.99 |
172194 | 172195 | 68882 | 365 | 1 | 59.99 | 59.99 |
172195 | 172196 | 68882 | 502 | 1 | 50.00 | 50.00 |
172196 | 172197 | 68883 | 208 | 1 | 1999.99 | 1999.99 |
172197 | 172198 | 68883 | 502 | 3 | 150.00 | 50.00 |
172198 rows × 6 columns
order_items. \
groupby('order_item_order_id')['order_item_subtotal']. \
agg(['sum', 'min', 'max', 'count']). \
reset_index()
order_item_order_id | sum | min | max | count | |
---|---|---|---|---|---|
0 | 1 | 299.98 | 299.98 | 299.98 | 1 |
1 | 2 | 579.98 | 129.99 | 250.00 | 3 |
2 | 4 | 699.85 | 49.98 | 299.95 | 4 |
3 | 5 | 1129.86 | 99.96 | 299.98 | 5 |
4 | 7 | 579.92 | 79.95 | 299.98 | 3 |
... | ... | ... | ... | ... | ... |
57426 | 68879 | 1259.97 | 129.99 | 999.99 | 3 |
57427 | 68880 | 999.77 | 149.94 | 250.00 | 5 |
57428 | 68881 | 129.99 | 129.99 | 129.99 | 1 |
57429 | 68882 | 109.99 | 50.00 | 59.99 | 2 |
57430 | 68883 | 2149.99 | 150.00 | 1999.99 | 2 |
57431 rows × 5 columns
order_items. \
groupby('order_item_order_id')['order_item_subtotal']. \
agg(['sum', 'min', 'max', 'count']). \
rename(columns={'count': 'item_count', 'sum': 'revenue'}). \
to_json(f'{output_dir}/revenue_per_order.json', orient='table')
%%sh
ls -ltr /home/`whoami`/data/retail_db/revenue_per_order
total 4884
-rw-rw-r-- 1 itversity itversity 4999377 Dec 14 10:54 revenue_per_order.json
%%sh
head -10 /home/`whoami`/data/retail_db/revenue_per_order/revenue_per_order.json
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
order_items. \
groupby('order_item_order_id')['order_item_subtotal']. \
agg(['sum', 'min', 'max', 'count']). \
rename(columns={'count': 'item_count', 'sum': 'revenue'})
revenue | min | max | item_count | |
---|---|---|---|---|
order_item_order_id | ||||
1 | 299.98 | 299.98 | 299.98 | 1 |
2 | 579.98 | 129.99 | 250.00 | 3 |
4 | 699.85 | 49.98 | 299.95 | 4 |
5 | 1129.86 | 99.96 | 299.98 | 5 |
7 | 579.92 | 79.95 | 299.98 | 3 |
... | ... | ... | ... | ... |
68879 | 1259.97 | 129.99 | 999.99 | 3 |
68880 | 999.77 | 149.94 | 250.00 | 5 |
68881 | 129.99 | 129.99 | 129.99 | 1 |
68882 | 109.99 | 50.00 | 59.99 | 2 |
68883 | 2149.99 | 150.00 | 1999.99 | 2 |
57431 rows × 4 columns
order_items. \
groupby('order_item_order_id')['order_item_subtotal']. \
agg(['sum', 'min', 'max', 'count']). \
rename(columns={'count': 'item_count', 'sum': 'revenue'}). \
round(2). \
to_csv(output_dir + '/revenue_per_order.csv')
%%sh
ls -ltr /home/`whoami`/data/retail_db/revenue_per_order
total 6460
-rw-rw-r-- 1 itversity itversity 4999377 Dec 14 10:54 revenue_per_order.json
-rw-rw-r-- 1 itversity itversity 1610716 Dec 14 10:55 revenue_per_order.csv
%%sh
head /home/`whoami`/data/retail_db/revenue_per_order/revenue_per_order.csv
order_item_order_id,revenue,min,max,item_count
1,299.98,299.98,299.98,1
2,579.98,129.99,250.0,3
4,699.85,49.98,299.95,4
5,1129.86,99.96,299.98,5
7,579.92,79.95,299.98,3
8,729.84,50.0,299.95,4
9,599.96,199.98,199.99,3
10,651.92,21.99,199.99,5
11,919.79,49.98,399.96,5