Getting URLs from Website
Let us understand how we can get URLs from a web page’s nav bar or side bar using BeautifulSoup.
Here are some of the key observations about https://python.itversity.com/mastering-python.html.
All the content on the website can be accessed using the nav bar on the left side.
When we click on a particular topic, it expands to show the sub topics.
First level links are defined using the class 'reference internal'.
Second level links are also defined using the class 'reference internal', under the li tag with the class 'toctree-l1 current active'. They are visible only when we click on the main topics in the nav bar on the left.
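The class-based observations above can be tried out on a small inline snippet before touching the live site. This is a minimal sketch: the markup below is hypothetical and only mimics the structure described here, and the hrefs are made up. It illustrates that passing a multi-word string such as 'reference internal' to class_ matches the exact value of the class attribute, while a CSS selector matches any tag that carries both classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the nav bar structure described above
html = """
<nav id="bd-docs-nav">
  <ul>
    <li class="toctree-l1 current active">
      <a class="current reference internal" href="#">Topic A</a>
      <ul>
        <li class="toctree-l2"><a class="reference internal" href="02_details.html">Details</a></li>
      </ul>
    </li>
    <li class="toctree-l1"><a class="reference internal" href="../topic_b/01_topic_b.html">Topic B</a></li>
  </ul>
</nav>
"""
soup = BeautifulSoup(html, 'html.parser')

# Exact string match on the class attribute: skips 'current reference internal'
exact = [a['href'] for a in soup.find_all('a', class_='reference internal')]

# CSS selector: matches every a tag that has both classes, in any combination
css = [a['href'] for a in soup.select('a.reference.internal')]

print(exact)
print(css)
```

The exact-string behaviour is what keeps the '#' link of the expanded topic out of the results when only 'reference internal' is requested.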
import requests
python_base_url = 'https://python.itversity.com'
python_url = f'{python_base_url}/mastering-python.html'
python_page = requests.get(python_url)
from bs4 import BeautifulSoup
soup = BeautifulSoup(python_page.content, 'html.parser')
Let us get the first level URLs using the class 'reference internal'.
Get all the first level URLs.
Here are the observations about the first level URLs from https://python.itversity.com/mastering-python.html.
All the URLs are on the left nav bar under the nav tag.
We need to get the hrefs from the nav tag.
Here are the steps we are going to follow:
Get all the nav tags. We need to use the docs nav (the one with the id bd-docs-nav).
Get the docs nav using its id and then get all the hrefs from it.
# List the id of every nav tag on the page
for nav in soup.find_all('nav'):
    print(nav['id'])
bd-docs-nav
bd-toc-nav
nav = soup.find('nav', {'id': 'bd-docs-nav'})
# Equivalent lookup using a keyword argument instead of a dict
nav = soup.find('nav', id='bd-docs-nav')
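The lookups above can be compared side by side on a small inline page. This is a sketch under assumptions: the markup is hypothetical, and the nav tag without an id is added only to show that nav['id'] would raise a KeyError in that case, whereas nav.get('id') returns None. The select_one call is shown as one more way to reach the same tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two nav tags with ids and one without
html = """
<nav id="bd-docs-nav"><a class="reference internal" href="a.html">A</a></nav>
<nav id="bd-toc-nav"></nav>
<nav></nav>
"""
soup = BeautifulSoup(html, 'html.parser')

# .get returns None instead of raising KeyError when the id is missing
nav_ids = [nav.get('id') for nav in soup.find_all('nav')]
print(nav_ids)

# Three ways to reach the docs nav
by_dict = soup.find('nav', {'id': 'bd-docs-nav'})
by_keyword = soup.find('nav', id='bd-docs-nav')
by_selector = soup.select_one('nav#bd-docs-nav')
print(by_dict == by_keyword == by_selector)
```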
for a in nav.find_all('a', {'class': 'reference internal'}):
    print(f"{python_base_url}/{a['href']}")
https://python.itversity.com/#
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
for a in nav.find_all('a', class_='reference internal'):
    print(f"{python_base_url}/{a['href']}")
https://python.itversity.com/#
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
for a in nav.find_all('a', {'class': 'reference internal'}):
    # Skip the placeholder link that points back to the current page
    if a['href'] != '#':
        print(f"{python_base_url}/{a['href']}")
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
first_level_urls = []
for a in nav.find_all('a', class_='reference internal'):
    if a['href'] != '#':
        first_level_urls.append(a['href'])
for url in first_level_urls:
    print(url)
01_overview_of_windows_os/01_overview_of_windows_os.html
04_postgres_database_operations/01_postgres_database_operations.html
05_getting_started_with_python/01_getting_started_with_python.html
06_basic_programming_constructs/01_basic_programming_constructs.html
07_pre_defined_functions/01_pre_defined_functions.html
08_user_defined_functions/01_user_defined_functions.html
09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
17_database_programming_crud_operations/01_database_programming_crud_operations.html
18_database_programming_batch_operations/01_database_programming_batch_operations.html
19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
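The relative hrefs collected above still need a prefix before they can be requested. As an alternative to building the string by hand, the standard library's urllib.parse.urljoin resolves an href against the URL of the page it was found on. This is a minimal sketch: the hrefs list below is a short hand-copied sample of the values shown above, not the result of a live request.

```python
from urllib.parse import urljoin

python_url = 'https://python.itversity.com/mastering-python.html'

# Hand-copied sample of the hrefs extracted from the nav bar
hrefs = [
    '#',
    '01_overview_of_windows_os/01_overview_of_windows_os.html',
    '04_postgres_database_operations/01_postgres_database_operations.html',
]

# Resolve each href relative to the page URL, skipping the '#' placeholder
absolute_urls = [urljoin(python_url, href) for href in hrefs if href != '#']
for url in absolute_urls:
    print(url)
```

Because urljoin replaces the last path segment of the base URL, the same call also works for hrefs found on pages inside sub directories.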
Let us get the second level URLs using 'reference internal' within the current entry ('current reference internal').
Get all the first level URLs.
Create soup objects for each of the first level URLs and then get the content from the li tag with the class 'toctree-l1 current active' using 'reference internal'.
Make sure the URLs are prefixed properly by replacing the last part of the URL with the href extracted.
for first_level_url in first_level_urls:
    url = f"{python_base_url}/{first_level_url}"
    print(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    current_nav = soup.find('nav', id='bd-docs-nav')
    # The expanded (current) first level entry holds the second level links
    current_href = current_nav.find('li', class_='toctree-l1 current active')
    for second_level_href in current_href.find_all('a', class_='reference internal'):
        # Replace the last part of the page URL with the extracted href
        print(f"{'/'.join(url.split('/')[:-1])}/{second_level_href['href']}")
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/01_overview_of_windows_os/02_getting_system_details.html
https://python.itversity.com/01_overview_of_windows_os/03_managing_windows_system.html
https://python.itversity.com/01_overview_of_windows_os/04_overview_of_microsoft_office.html
https://python.itversity.com/01_overview_of_windows_os/05_overview_of_editors_and_ides.html
https://python.itversity.com/01_overview_of_windows_os/06_power_shell_and_command_prompt.html
https://python.itversity.com/01_overview_of_windows_os/07_connecting_to_linux_servers.html
https://python.itversity.com/01_overview_of_windows_os/08_folders_and_files.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/04_postgres_database_operations/02_overview_of_sql.html
https://python.itversity.com/04_postgres_database_operations/03_create_database_and_users_table.html
https://python.itversity.com/04_postgres_database_operations/04_ddl_data_definition_language.html
https://python.itversity.com/04_postgres_database_operations/05_dml_data_manipulation_language.html
https://python.itversity.com/04_postgres_database_operations/06_dql_data_query_language.html
https://python.itversity.com/04_postgres_database_operations/07_crud_operations_dml_and_dql.html
https://python.itversity.com/04_postgres_database_operations/08_tcl_transaction_control_language.html
https://python.itversity.com/04_postgres_database_operations/09_example_data_engineering.html
https://python.itversity.com/04_postgres_database_operations/10_example_web_application.html
https://python.itversity.com/04_postgres_database_operations/11_exercise_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/05_getting_started_with_python/02_installing_python.html
https://python.itversity.com/05_getting_started_with_python/03_overview_of_anaconda.html
https://python.itversity.com/05_getting_started_with_python/04_python_cli_and_jupyter_notebook.html
https://python.itversity.com/05_getting_started_with_python/05_overview_of_jupyter_lab.html
https://python.itversity.com/05_getting_started_with_python/06_using_ides_pycharm.html
https://python.itversity.com/05_getting_started_with_python/07_overview_of_visual_studio_code.html
https://python.itversity.com/05_getting_started_with_python/08_using_itversity_labs.html
https://python.itversity.com/05_getting_started_with_python/09_leveraging_googles_colab.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/06_basic_programming_constructs/02_getting_help.html
https://python.itversity.com/06_basic_programming_constructs/03_variables_and_objects.html
https://python.itversity.com/06_basic_programming_constructs/04_data_types_commonly_used.html
https://python.itversity.com/06_basic_programming_constructs/05_operators_in_python.html
https://python.itversity.com/06_basic_programming_constructs/06_comments_and_doc_strings.html
https://python.itversity.com/06_basic_programming_constructs/07_conditionals.html
https://python.itversity.com/06_basic_programming_constructs/08_all_about_for_loops.html
https://python.itversity.com/06_basic_programming_constructs/09_running_os_commands.html
https://python.itversity.com/06_basic_programming_constructs/10_exercises.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/07_pre_defined_functions/02_overview_of_pre-defined_functions.html
https://python.itversity.com/07_pre_defined_functions/03_numeric_functions.html
https://python.itversity.com/07_pre_defined_functions/04_overview_of_strings.html
https://python.itversity.com/07_pre_defined_functions/05_string_manipulation_functions.html
https://python.itversity.com/07_pre_defined_functions/06_formatting_strings.html
https://python.itversity.com/07_pre_defined_functions/07_print_and_input_functions.html
https://python.itversity.com/07_pre_defined_functions/08_date_manipulation_functions.html
https://python.itversity.com/07_pre_defined_functions/09_special_functions.html
https://python.itversity.com/07_pre_defined_functions/10_exercises.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/08_user_defined_functions/02_defining_functions.html
https://python.itversity.com/08_user_defined_functions/03_doc_strings.html
https://python.itversity.com/08_user_defined_functions/04_returning_values.html
https://python.itversity.com/08_user_defined_functions/05_function_parameters_and_arguments.html
https://python.itversity.com/08_user_defined_functions/06_varying_arguments.html
https://python.itversity.com/08_user_defined_functions/07_keyword_arguments.html
https://python.itversity.com/08_user_defined_functions/08_recap_of_user_defined_functions.html
https://python.itversity.com/08_user_defined_functions/09_passing_functions_as_arguments.html
https://python.itversity.com/08_user_defined_functions/10_lambda_functions.html
https://python.itversity.com/08_user_defined_functions/11_usage_of_lambda_functions.html
https://python.itversity.com/08_user_defined_functions/12_exercise_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/02_overview_of_list_and_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/03_common_operations.html
https://python.itversity.com/09_overview_of_collections_list_and_set/04_accessing_elements_from_list.html
https://python.itversity.com/09_overview_of_collections_list_and_set/05_adding_elements_to_list.html
https://python.itversity.com/09_overview_of_collections_list_and_set/07_other_list_operations.html
https://python.itversity.com/09_overview_of_collections_list_and_set/08_adding_and_deleting_elements_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/09_typical_set_operations.html
https://python.itversity.com/09_overview_of_collections_list_and_set/10_validating_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/11_list_and_set_usage.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/02_overview_of_dict_and_tuple.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/03_common_operations.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/04_accessing_elements_tuples.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/05_accessing_elements_dict.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/06_manipulating_dict.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/07_common_examples_dict.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/08_list_of_tuples.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/09_list_of_dicts.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/11_manipulating_collections_using_loops/02_reading_files_into_collections.html
https://python.itversity.com/11_manipulating_collections_using_loops/03_overview_of_standard_transformations.html
https://python.itversity.com/11_manipulating_collections_using_loops/04_row_level_transformations.html
https://python.itversity.com/11_manipulating_collections_using_loops/05_getting_unique_elements.html
https://python.itversity.com/11_manipulating_collections_using_loops/06_filtering_data.html
https://python.itversity.com/11_manipulating_collections_using_loops/07_preparing_data_sets.html
https://python.itversity.com/11_manipulating_collections_using_loops/08_quick_recap_of_dict_operations.html
https://python.itversity.com/11_manipulating_collections_using_loops/09_performing_total_aggregations.html
https://python.itversity.com/11_manipulating_collections_using_loops/10_performing_grouped_aggregations.html
https://python.itversity.com/11_manipulating_collections_using_loops/11_joining_data_sets.html
https://python.itversity.com/11_manipulating_collections_using_loops/12_limitations_of_using_loops.html
https://python.itversity.com/11_manipulating_collections_using_loops/13_exercises.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/12_development_of_map_reduce_apis/02_develop_myFilter.html
https://python.itversity.com/12_development_of_map_reduce_apis/03_validate_myFilter.html
https://python.itversity.com/12_development_of_map_reduce_apis/04_develop_myMap.html
https://python.itversity.com/12_development_of_map_reduce_apis/05_validate_myMap.html
https://python.itversity.com/12_development_of_map_reduce_apis/06_develop_myReduce.html
https://python.itversity.com/12_development_of_map_reduce_apis/07_validate_myReduce_function.html
https://python.itversity.com/12_development_of_map_reduce_apis/08_develop_myReduceByKey.html
https://python.itversity.com/12_development_of_map_reduce_apis/09_validate_myReduceKey.html
https://python.itversity.com/12_development_of_map_reduce_apis/10_exercises.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/13_understanding_map_reduce_libraries/02_preparing_data_sets.html
https://python.itversity.com/13_understanding_map_reduce_libraries/03_filtering_data_using_filter.html
https://python.itversity.com/13_understanding_map_reduce_libraries/04_projecting_data_using_map.html
https://python.itversity.com/13_understanding_map_reduce_libraries/05_row_level_transformations_using_map.html
https://python.itversity.com/13_understanding_map_reduce_libraries/06_aggregations_using_reduce.html
https://python.itversity.com/13_understanding_map_reduce_libraries/07_overview_of_itertools.html
https://python.itversity.com/13_understanding_map_reduce_libraries/08_using_groupby.html
https://python.itversity.com/13_understanding_map_reduce_libraries/09_limitations_of_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/14_overview_of_object_oriented_programming/02_classes_and_objects.html
https://python.itversity.com/14_overview_of_object_oriented_programming/03_constructors.html
https://python.itversity.com/14_overview_of_object_oriented_programming/04_methods.html
https://python.itversity.com/14_overview_of_object_oriented_programming/05_inheritance.html
https://python.itversity.com/14_overview_of_object_oriented_programming/06_encapsulation.html
https://python.itversity.com/14_overview_of_object_oriented_programming/07_polymorphism.html
https://python.itversity.com/14_overview_of_object_oriented_programming/08_dynamic_classes.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/15_overview_of_pandas_libraries/02_pandas_data_structures_overview.html
https://python.itversity.com/15_overview_of_pandas_libraries/03_overview_of_series.html
https://python.itversity.com/15_overview_of_pandas_libraries/04_creating_data_frames_from_lists.html
https://python.itversity.com/15_overview_of_pandas_libraries/05_data_frames_basic_operations.html
https://python.itversity.com/15_overview_of_pandas_libraries/06_csv_to_pandas_data_frame.html
https://python.itversity.com/15_overview_of_pandas_libraries/07_projecting_and_filtering.html
https://python.itversity.com/15_overview_of_pandas_libraries/08_performing_total_aggregations.html
https://python.itversity.com/15_overview_of_pandas_libraries/09_performing_grouped_aggregations.html
https://python.itversity.com/15_overview_of_pandas_libraries/10_writing_data_frames_to_files.html
https://python.itversity.com/15_overview_of_pandas_libraries/12_exercises_pandas_data_frames.html
https://python.itversity.com/15_overview_of_pandas_libraries/11_joining_data_frames.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/02_problem_statement.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/03_installing_pre-requisites.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/04_overview_of_beautifulsoup.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/05_getting_html_content.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/06_processing_html_content.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/07_creating_data_frame.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/08_processing_data_using_data_frame_apis.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/17_database_programming_crud_operations/02_overview_of_database_programming.html
https://python.itversity.com/17_database_programming_crud_operations/03_recap_of_rdbms_concepts.html
https://python.itversity.com/17_database_programming_crud_operations/04_setup_database_client_libraries.html
https://python.itversity.com/17_database_programming_crud_operations/05_function_get_database_connection.html
https://python.itversity.com/17_database_programming_crud_operations/06_creating_database_table.html
https://python.itversity.com/17_database_programming_crud_operations/07_inserting_data_into_table.html
https://python.itversity.com/17_database_programming_crud_operations/08_updating_existing_table_data.html
https://python.itversity.com/17_database_programming_crud_operations/09_deleting_data_from_table.html
https://python.itversity.com/17_database_programming_crud_operations/10_querying_data_from_table.html
https://python.itversity.com/17_database_programming_crud_operations/11_recap_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/18_database_programming_batch_operations/02_function_get_database_connection.html
https://python.itversity.com/18_database_programming_batch_operations/03_creating_database_table.html
https://python.itversity.com/18_database_programming_batch_operations/04_recap_of_insert.html
https://python.itversity.com/18_database_programming_batch_operations/05_preparing_database.html
https://python.itversity.com/18_database_programming_batch_operations/06_reading_data_from_file.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
https://python.itversity.com/19_project_web_scraping_into_database/02_define_problem_statement.html
https://python.itversity.com/19_project_web_scraping_into_database/03_setup_project.html
https://python.itversity.com/19_project_web_scraping_into_database/04_overview_of_python_virtual_environments.html
https://python.itversity.com/19_project_web_scraping_into_database/05_installing_required_libraries.html
https://python.itversity.com/19_project_web_scraping_into_database/06_setup_logging.html
https://python.itversity.com/19_project_web_scraping_into_database/07_modularizing_the_project.html
https://python.itversity.com/19_project_web_scraping_into_database/08_setup_database.html
https://python.itversity.com/19_project_web_scraping_into_database/10_create_required_table.html
https://python.itversity.com/19_project_web_scraping_into_database/11_reading_the_data.html
https://python.itversity.com/19_project_web_scraping_into_database/12_validating_data.html
https://python.itversity.com/19_project_web_scraping_into_database/13_apply_required_transformations.html
https://python.itversity.com/19_project_web_scraping_into_database/14_writing_to_database.html
https://python.itversity.com/19_project_web_scraping_into_database/15_run_queries_against_data.html
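The second level steps can also be wrapped in a function that returns a list instead of printing, which makes the logic easy to check without network access. This is a sketch under assumptions: the function name get_second_level_urls, the SAMPLE_HTML markup and the example.com URL are all hypothetical, and urljoin is swapped in for the split and join used above. It also returns an empty list when the nav or the current entry is missing rather than failing on None.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page content that mimics an expanded first level topic
SAMPLE_HTML = """
<nav id="bd-docs-nav">
  <ul>
    <li class="toctree-l1 current active">
      <a class="current reference internal" href="#">Topic A</a>
      <ul>
        <li class="toctree-l2"><a class="reference internal" href="02_first.html">First</a></li>
        <li class="toctree-l2"><a class="reference internal" href="03_second.html">Second</a></li>
      </ul>
    </li>
    <li class="toctree-l1"><a class="reference internal" href="../b/01_b.html">Topic B</a></li>
  </ul>
</nav>
"""


def get_second_level_urls(page_url, html):
    """Return absolute second level URLs found under the current nav entry."""
    soup = BeautifulSoup(html, 'html.parser')
    nav = soup.find('nav', id='bd-docs-nav')
    if nav is None:
        return []
    current = nav.find('li', class_='toctree-l1 current active')
    if current is None:
        return []
    return [
        urljoin(page_url, a['href'])
        for a in current.find_all('a', class_='reference internal')
        if a.get('href') and a['href'] != '#'
    ]


second_level_urls = get_second_level_urls('https://example.com/a/01_a.html', SAMPLE_HTML)
for url in second_level_urls:
    print(url)
```

With the real site, the html argument would be page.content from requests.get(url), as in the loop above.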