list and set - Usage
Let us look at some real-world uses of list and set when building Python-based applications.
- list is used more often than set.
  - Reading data from a file into a list
  - Reading data from a table into a list
- We can convert a list to a set to perform these operations:
  - Get unique elements from the list
  - Perform set operations between 2 lists, such as union, intersection, and difference
- We can convert a set to a list to perform these operations:
  - Reverse the collection
  - Append multiple collections to create new collections while retaining duplicates
- You will see some of these in action as we get into other related topics down the line.
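The list-to-set conversion described above can be sketched as follows. This is a minimal illustration, not code from the original notebook: the two lists and all variable names are made-up sample data.

```python
# Hypothetical sample data: two lists of order dates, each with duplicates
dates_a = ['2013-07-25', '2013-07-25', '2013-07-26', '2014-01-25']
dates_b = ['2013-08-25', '2013-07-26', '2014-01-25', '2014-01-25']

# Convert each list to a set so that set operations become available
set_a = set(dates_a)
set_b = set(dates_b)

common = set_a & set_b      # intersection: dates present in both lists
combined = set_a | set_b    # union: every distinct date from either list
only_in_a = set_a - set_b   # difference: dates found only in the first list

# Sets are unordered, so sort the results when a predictable order is needed
print(sorted(common))
print(sorted(combined))
print(sorted(only_in_a))
```

The operators `&`, `|`, and `-` are equivalent to the `intersection`, `union`, and `difference` methods when both operands are sets.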
%%sh
ls -ltr /data/retail_db/orders/part-00000
-rw-r--r-- 1 root root 2999944 Nov 22 16:08 /data/retail_db/orders/part-00000
# Reading data from file into a list
path = '/data/retail_db/orders/part-00000'
# On Windows, escape the backslashes in the path, e.g. 'C:\\users\\itversity\\Research'
# Use a with block so that the file is closed once the data is read
with open(path) as orders_file:
    orders_raw = orders_file.read()
orders = orders_raw.splitlines()
orders[:10]
['1,2013-07-25 00:00:00.0,11599,CLOSED',
'2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
'3,2013-07-25 00:00:00.0,12111,COMPLETE',
'4,2013-07-25 00:00:00.0,8827,CLOSED',
'5,2013-07-25 00:00:00.0,11318,COMPLETE',
'6,2013-07-25 00:00:00.0,7130,COMPLETE',
'7,2013-07-25 00:00:00.0,4530,COMPLETE',
'8,2013-07-25 00:00:00.0,2911,PROCESSING',
'9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
'10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
len(orders) # same as number of records in the file
68883
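Reading from a table follows the same idea as reading from a file. The sketch below is an assumption-laden illustration rather than the notebook's own approach: it uses an in-memory SQLite database from the standard library as a stand-in for a real database, and the table name, column names, and variable names are hypothetical. The three rows reuse the first three records shown in the file output.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real data source.
# The table name and column names are made up for this illustration.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE orders (order_id INTEGER, order_date TEXT, '
    'order_customer_id INTEGER, order_status TEXT)'
)
conn.executemany(
    'INSERT INTO orders VALUES (?, ?, ?, ?)',
    [
        (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
        (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
        (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE'),
    ],
)

# fetchall() returns the query result as a list with one tuple per row
table_orders = conn.execute('SELECT * FROM orders ORDER BY order_id').fetchall()
conn.close()

print(type(table_orders))
print(len(table_orders))
print(table_orders[0])
```

Each element of the list is a tuple of column values, whereas each element read from the file is a single comma-separated string.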
# Get unique dates
dates = ['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0']
dates
['2013-07-25 00:00:00.0',
'2013-07-25 00:00:00.0',
'2013-07-26 00:00:00.0',
'2014-01-25 00:00:00.0']
len(dates)
4
set(dates)
{'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}
len(set(dates)) # duplicates are removed, so the count drops from 4 to 3
3
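A set removes duplicates but does not keep the elements in their original order. The sketch below shows two ways to get the unique values back as a list. The statuses are copied from the first ten order records shown earlier. The second approach, `dict.fromkeys`, is an alternative that is not part of the original notebook, and the variable names are hypothetical.

```python
# Order statuses taken from the first ten records shown earlier
statuses = ['CLOSED', 'PENDING_PAYMENT', 'COMPLETE', 'CLOSED', 'COMPLETE',
            'COMPLETE', 'COMPLETE', 'PROCESSING', 'PENDING_PAYMENT',
            'PENDING_PAYMENT']

# Option 1: convert to a set, then sort to get a list in a predictable order
unique_sorted = sorted(set(statuses))

# Option 2 (alternative, not from the original notebook): dict keys are unique
# and keep insertion order, so this preserves the order of first appearance
unique_first_seen = list(dict.fromkeys(statuses))

print(unique_sorted)
print(unique_first_seen)
```

Both results contain the same four statuses. Only the order differs: alphabetical in the first case, order of first appearance in the second.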
# Combining 2 sets: union drops duplicates, while converting both sets to lists and concatenating them retains duplicates
s1 = {'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}
s2 = {'2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0', '2014-01-25 00:00:00.0'}
s1.union(s2)
{'2013-07-25 00:00:00.0',
'2013-07-26 00:00:00.0',
'2013-08-25 00:00:00.0',
'2013-08-26 00:00:00.0',
'2014-01-25 00:00:00.0'}
len(s1.union(s2))
5
s = list(s1) + list(s2)
s
['2013-07-26 00:00:00.0',
'2013-07-25 00:00:00.0',
'2014-01-25 00:00:00.0',
'2014-01-25 00:00:00.0',
'2013-08-26 00:00:00.0',
'2013-08-25 00:00:00.0']
len(s)
6
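Reversing a collection is the other reason given for converting a set to a list. A set has no defined order, so there is nothing to reverse until it becomes a list. A minimal sketch, reusing the values of s1 and with hypothetical variable names:

```python
s1 = {'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}

# sorted() accepts any iterable and always returns a list
dates_asc = sorted(s1)

# Slicing with a step of -1 creates a reversed copy of the list
dates_desc = dates_asc[::-1]

# list.reverse() reverses in place instead of creating a new list
dates_in_place = list(dates_asc)
dates_in_place.reverse()

print(dates_desc)
```

When only descending order is needed, `sorted(s1, reverse=True)` gives the same result in one step.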