Installing Prerequisites

Let us install the prerequisites for web scraping with Python. We will use the Python libraries requests and BeautifulSoup, and optionally pandas, to scrape web pages and then process the data.

  • Get the content of HTML pages using requests

  • Process HTML tags and extract data using the APIs provided by beautifulsoup4

  • Once the data is scraped from HTML pages, we can process it using pandas DataFrame APIs. Alternatively, we can use native Python collections and associated libraries to process the scraped data.

pip install requests
pip install beautifulsoup4
pip install pandas
!pip show beautifulsoup4
Name: beautifulsoup4
Version: 4.9.3
Summary: Screen-scraping library
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
Author-email: leonardr@segfault.org
License: MIT
Location: /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages
Requires: soupsieve
Required-by: sphinx-book-theme
!pip show pandas
Name: pandas
Version: 1.1.5
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/itversity/.local/lib/python3.6/site-packages
Requires: pytz, numpy, python-dateutil
Required-by: beakerx
!pip install beautifulsoup4==4.9.3
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: beautifulsoup4==4.9.3 in /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages (4.9.3)
Requirement already satisfied: soupsieve>1.2; python_version >= "3.0" in /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages (from beautifulsoup4==4.9.3) (2.1)
!pip install pandas==1.1.5
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas==1.1.5 in /home/itversity/.local/lib/python3.6/site-packages (1.1.5)
Requirement already satisfied: pytz>=2017.2 in /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages (from pandas==1.1.5) (2020.4)
Requirement already satisfied: python-dateutil>=2.7.3 in /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages (from pandas==1.1.5) (2.8.1)
Requirement already satisfied: numpy>=1.15.4 in /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages (from pandas==1.1.5) (1.19.4)
Requirement already satisfied: six>=1.5 in /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas==1.1.5) (1.15.0)
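
The `pip show` checks above can also be approximated from within Python. The following is a minimal sketch, not part of the original material: the `PACKAGES` mapping and the `installed_versions` helper are hypothetical names introduced here for illustration. It assumes each library exposes a `__version__` attribute and falls back to "unknown" if one does not. Note that the beautifulsoup4 package is imported under the module name `bs4`.

```python
import importlib

# Map of pip package name -> importable module name.
# beautifulsoup4 is installed under that name but imported as bs4.
PACKAGES = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
}


def installed_versions(packages=PACKAGES):
    """Return {pip_name: version string, or None if the module cannot be imported}."""
    versions = {}
    for pip_name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module is missing from the current environment.
            versions[pip_name] = None
        else:
            # Fall back to "unknown" if the module has no __version__ attribute.
            versions[pip_name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If any library is reported as not installed, running the corresponding `pip install` command from this section in the same environment should resolve it.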