Getting URLs from Website
Let us understand how we can get URLs from a web page’s nav bar or side bar using BeautifulSoup.
Here are some of the key observations about https://python.itversity.com/mastering-python.html.
All the content in the website can be accessed using nav bar on the left side.
When we click on a particular topic, it will expand the sub topics.
First level links are defined using the class reference internal.
Second level links are defined using the class reference internal under the li with class toctree-l1 current active. They are visible only when we click on main topics as part of the nav bar on the left.
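To make these observations concrete, here is a minimal sketch that runs against a small hand-written HTML snippet instead of the live site. The snippet, its hrefs, and the names sample_html, sample_soup, and sample_nav are assumptions made for illustration; they only imitate the class layout described above and are not copied from the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that imitates the nav bar structure described above.
sample_html = """
<nav id="bd-docs-nav">
  <ul>
    <li class="toctree-l1">
      <a class="reference internal" href="../01_intro/01_intro.html">Intro</a>
    </li>
    <li class="toctree-l1 current active">
      <a class="current reference internal" href="#">Basics</a>
      <ul>
        <li class="toctree-l2">
          <a class="reference internal" href="02_getting_help.html">Getting Help</a>
        </li>
        <li class="toctree-l2">
          <a class="reference internal" href="03_variables.html">Variables</a>
        </li>
      </ul>
    </li>
  </ul>
</nav>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_nav = sample_soup.find('nav', id='bd-docs-nav')

# class_='reference internal' matches anchors whose class attribute is exactly that string.
all_links = [a['href'] for a in sample_nav.find_all('a', class_='reference internal')]

# Second level links live under the li with class 'toctree-l1 current active'.
current_li = sample_nav.find('li', class_='toctree-l1 current active')
second_level_links = [a['href'] for a in current_li.find_all('a', class_='reference internal')]

print(all_links)
print(second_level_links)
```

Passing the full class string to class_ matches the class attribute exactly as written, so the order of the class names matters.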
import requests
python_base_url = 'https://python.itversity.com'
python_url = f'{python_base_url}/mastering-python.html'
python_page = requests.get(python_url)
from bs4 import BeautifulSoup
soup = BeautifulSoup(python_page.content, 'html.parser')
Let us get first level urls using reference internal.
Get all the first level urls.
Here are the observations about all the first level of urls from https://python.itversity.com/mastering-python.html.
All the URLs are on the left nav bar under the nav tag.
We need to get hrefs from the nav tag.
Here are the steps we are going to follow:
Get all the nav tags. We need to use the docs nav.
Get all the hrefs from the nav using its id.
for nav in soup.find_all('nav'):
    print(nav['id'])
bd-docs-nav
bd-toc-nav
nav = soup.find('nav', {'id': 'bd-docs-nav'})
nav = soup.find('nav', id='bd-docs-nav')
for a in nav.find_all('a', {'class': 'reference internal'}):
    print(f"{python_base_url}/{a['href']}")
https://python.itversity.com/#
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
for a in nav.find_all('a', class_='reference internal'):
    print(f"{python_base_url}/{a['href']}")
https://python.itversity.com/#
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
for a in nav.find_all('a', {'class': 'reference internal'}):
    if a['href'] != '#':
        print(f"{python_base_url}/{a['href']}")
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
first_level_urls = []
for a in nav.find_all('a', class_='reference internal'):
    if a['href'] != '#':
        first_level_urls.append(a['href'])
for url in first_level_urls: print(url)
01_overview_of_windows_os/01_overview_of_windows_os.html
04_postgres_database_operations/01_postgres_database_operations.html
05_getting_started_with_python/01_getting_started_with_python.html
06_basic_programming_constructs/01_basic_programming_constructs.html
07_pre_defined_functions/01_pre_defined_functions.html
08_user_defined_functions/01_user_defined_functions.html
09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
17_database_programming_crud_operations/01_database_programming_crud_operations.html
18_database_programming_batch_operations/01_database_programming_batch_operations.html
19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
Let us get second level urls using reference internal within the current active first level entry.
Get all the first level urls.
Create soup objects for each of the first level urls and then get content from toctree-l1 current active using reference internal.
Make sure the urls are prefixed properly by replacing the last part of the url with the href extracted.
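The prefixing described in the last step can also be done with urljoin from the standard library, which resolves a relative href against the URL of the page it was found on. This is a minimal sketch, not necessarily how this section does it. The page_url and hrefs values are hypothetical stand-ins that mirror the URL shapes used in this section.

```python
from urllib.parse import urljoin

# Hypothetical inputs for illustration: a first level page and two relative hrefs.
page_url = 'https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html'
hrefs = ['02_getting_help.html', '03_variables_and_objects.html']

# urljoin replaces the last path segment of page_url with each relative href.
resolved = [urljoin(page_url, href) for href in hrefs]

# Manual equivalent: drop the last path segment, then append the href.
manual = [f"{'/'.join(page_url.split('/')[:-1])}/{href}" for href in hrefs]

for url in resolved:
    print(url)
# https://python.itversity.com/06_basic_programming_constructs/02_getting_help.html
# https://python.itversity.com/06_basic_programming_constructs/03_variables_and_objects.html
```

Both approaches give the same result for plain file names. urljoin additionally normalizes hrefs that start with ../, which the manual string split leaves as-is.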
for first_level_url in first_level_urls:
    url = f"{python_base_url}/{first_level_url}"
    print(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    current_nav = soup.find('nav', id='bd-docs-nav')
    # The expanded topic is the li with class toctree-l1 current active
    current_href = current_nav.find('li', class_='toctree-l1 current active')
    for second_level_href in current_href.find_all('a', class_='reference internal'):
        # Replace the last part of the page url with the extracted href
        print(f"{'/'.join(url.split('/')[:-1])}/{second_level_href['href']}")
https://python.itversity.com/01_overview_of_windows_os/01_overview_of_windows_os.html
https://python.itversity.com/01_overview_of_windows_os/02_getting_system_details.html
https://python.itversity.com/01_overview_of_windows_os/03_managing_windows_system.html
https://python.itversity.com/01_overview_of_windows_os/04_overview_of_microsoft_office.html
https://python.itversity.com/01_overview_of_windows_os/05_overview_of_editors_and_ides.html
https://python.itversity.com/01_overview_of_windows_os/06_power_shell_and_command_prompt.html
https://python.itversity.com/01_overview_of_windows_os/07_connecting_to_linux_servers.html
https://python.itversity.com/01_overview_of_windows_os/08_folders_and_files.html
https://python.itversity.com/04_postgres_database_operations/01_postgres_database_operations.html
https://python.itversity.com/04_postgres_database_operations/02_overview_of_sql.html
https://python.itversity.com/04_postgres_database_operations/03_create_database_and_users_table.html
https://python.itversity.com/04_postgres_database_operations/04_ddl_data_definition_language.html
https://python.itversity.com/04_postgres_database_operations/05_dml_data_manipulation_language.html
https://python.itversity.com/04_postgres_database_operations/06_dql_data_query_language.html
https://python.itversity.com/04_postgres_database_operations/07_crud_operations_dml_and_dql.html
https://python.itversity.com/04_postgres_database_operations/08_tcl_transaction_control_language.html
https://python.itversity.com/04_postgres_database_operations/09_example_data_engineering.html
https://python.itversity.com/04_postgres_database_operations/10_example_web_application.html
https://python.itversity.com/04_postgres_database_operations/11_exercise_database_operations.html
https://python.itversity.com/05_getting_started_with_python/01_getting_started_with_python.html
https://python.itversity.com/05_getting_started_with_python/02_installing_python.html
https://python.itversity.com/05_getting_started_with_python/03_overview_of_anaconda.html
https://python.itversity.com/05_getting_started_with_python/04_python_cli_and_jupyter_notebook.html
https://python.itversity.com/05_getting_started_with_python/05_overview_of_jupyter_lab.html
https://python.itversity.com/05_getting_started_with_python/06_using_ides_pycharm.html
https://python.itversity.com/05_getting_started_with_python/07_overview_of_visual_studio_code.html
https://python.itversity.com/05_getting_started_with_python/08_using_itversity_labs.html
https://python.itversity.com/05_getting_started_with_python/09_leveraging_googles_colab.html
https://python.itversity.com/06_basic_programming_constructs/01_basic_programming_constructs.html
https://python.itversity.com/06_basic_programming_constructs/02_getting_help.html
https://python.itversity.com/06_basic_programming_constructs/03_variables_and_objects.html
https://python.itversity.com/06_basic_programming_constructs/04_data_types_commonly_used.html
https://python.itversity.com/06_basic_programming_constructs/05_operators_in_python.html
https://python.itversity.com/06_basic_programming_constructs/06_comments_and_doc_strings.html
https://python.itversity.com/06_basic_programming_constructs/07_conditionals.html
https://python.itversity.com/06_basic_programming_constructs/08_all_about_for_loops.html
https://python.itversity.com/06_basic_programming_constructs/09_running_os_commands.html
https://python.itversity.com/06_basic_programming_constructs/10_exercises.html
https://python.itversity.com/07_pre_defined_functions/01_pre_defined_functions.html
https://python.itversity.com/07_pre_defined_functions/02_overview_of_pre-defined_functions.html
https://python.itversity.com/07_pre_defined_functions/03_numeric_functions.html
https://python.itversity.com/07_pre_defined_functions/04_overview_of_strings.html
https://python.itversity.com/07_pre_defined_functions/05_string_manipulation_functions.html
https://python.itversity.com/07_pre_defined_functions/06_formatting_strings.html
https://python.itversity.com/07_pre_defined_functions/07_print_and_input_functions.html
https://python.itversity.com/07_pre_defined_functions/08_date_manipulation_functions.html
https://python.itversity.com/07_pre_defined_functions/09_special_functions.html
https://python.itversity.com/07_pre_defined_functions/10_exercises.html
https://python.itversity.com/08_user_defined_functions/01_user_defined_functions.html
https://python.itversity.com/08_user_defined_functions/02_defining_functions.html
https://python.itversity.com/08_user_defined_functions/03_doc_strings.html
https://python.itversity.com/08_user_defined_functions/04_returning_values.html
https://python.itversity.com/08_user_defined_functions/05_function_parameters_and_arguments.html
https://python.itversity.com/08_user_defined_functions/06_varying_arguments.html
https://python.itversity.com/08_user_defined_functions/07_keyword_arguments.html
https://python.itversity.com/08_user_defined_functions/08_recap_of_user_defined_functions.html
https://python.itversity.com/08_user_defined_functions/09_passing_functions_as_arguments.html
https://python.itversity.com/08_user_defined_functions/10_lambda_functions.html
https://python.itversity.com/08_user_defined_functions/11_usage_of_lambda_functions.html
https://python.itversity.com/08_user_defined_functions/12_exercise_user_defined_functions.html
https://python.itversity.com/09_overview_of_collections_list_and_set/01_overview_of_collections_list_and_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/02_overview_of_list_and_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/03_common_operations.html
https://python.itversity.com/09_overview_of_collections_list_and_set/04_accessing_elements_from_list.html
https://python.itversity.com/09_overview_of_collections_list_and_set/05_adding_elements_to_list.html
https://python.itversity.com/09_overview_of_collections_list_and_set/07_other_list_operations.html
https://python.itversity.com/09_overview_of_collections_list_and_set/08_adding_and_deleting_elements_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/09_typical_set_operations.html
https://python.itversity.com/09_overview_of_collections_list_and_set/10_validating_set.html
https://python.itversity.com/09_overview_of_collections_list_and_set/11_list_and_set_usage.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/01_overview_of_collections_dict_and_tuple.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/02_overview_of_dict_and_tuple.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/03_common_operations.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/04_accessing_elements_tuples.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/05_accessing_elements_dict.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/06_manipulating_dict.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/07_common_examples_dict.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/08_list_of_tuples.html
https://python.itversity.com/10_overview_of_collections_dict_and_tuple/09_list_of_dicts.html
https://python.itversity.com/11_manipulating_collections_using_loops/01_manipulating_collections_using_loops.html
https://python.itversity.com/11_manipulating_collections_using_loops/02_reading_files_into_collections.html
https://python.itversity.com/11_manipulating_collections_using_loops/03_overview_of_standard_transformations.html
https://python.itversity.com/11_manipulating_collections_using_loops/04_row_level_transformations.html
https://python.itversity.com/11_manipulating_collections_using_loops/05_getting_unique_elements.html
https://python.itversity.com/11_manipulating_collections_using_loops/06_filtering_data.html
https://python.itversity.com/11_manipulating_collections_using_loops/07_preparing_data_sets.html
https://python.itversity.com/11_manipulating_collections_using_loops/08_quick_recap_of_dict_operations.html
https://python.itversity.com/11_manipulating_collections_using_loops/09_performing_total_aggregations.html
https://python.itversity.com/11_manipulating_collections_using_loops/10_performing_grouped_aggregations.html
https://python.itversity.com/11_manipulating_collections_using_loops/11_joining_data_sets.html
https://python.itversity.com/11_manipulating_collections_using_loops/12_limitations_of_using_loops.html
https://python.itversity.com/11_manipulating_collections_using_loops/13_exercises.html
https://python.itversity.com/12_development_of_map_reduce_apis/01_development_of_map_reduce_apis.html
https://python.itversity.com/12_development_of_map_reduce_apis/02_develop_myFilter.html
https://python.itversity.com/12_development_of_map_reduce_apis/03_validate_myFilter.html
https://python.itversity.com/12_development_of_map_reduce_apis/04_develop_myMap.html
https://python.itversity.com/12_development_of_map_reduce_apis/05_validate_myMap.html
https://python.itversity.com/12_development_of_map_reduce_apis/06_develop_myReduce.html
https://python.itversity.com/12_development_of_map_reduce_apis/07_validate_myReduce_function.html
https://python.itversity.com/12_development_of_map_reduce_apis/08_develop_myReduceByKey.html
https://python.itversity.com/12_development_of_map_reduce_apis/09_validate_myReduceKey.html
https://python.itversity.com/12_development_of_map_reduce_apis/10_exercises.html
https://python.itversity.com/13_understanding_map_reduce_libraries/01_understanding_map_reduce_libraries.html
https://python.itversity.com/13_understanding_map_reduce_libraries/02_preparing_data_sets.html
https://python.itversity.com/13_understanding_map_reduce_libraries/03_filtering_data_using_filter.html
https://python.itversity.com/13_understanding_map_reduce_libraries/04_projecting_data_using_map.html
https://python.itversity.com/13_understanding_map_reduce_libraries/05_row_level_transformations_using_map.html
https://python.itversity.com/13_understanding_map_reduce_libraries/06_aggregations_using_reduce.html
https://python.itversity.com/13_understanding_map_reduce_libraries/07_overview_of_itertools.html
https://python.itversity.com/13_understanding_map_reduce_libraries/08_using_groupby.html
https://python.itversity.com/13_understanding_map_reduce_libraries/09_limitations_of_map_reduce_libraries.html
https://python.itversity.com/14_overview_of_object_oriented_programming/01_overview_of_object_oriented_programming.html
https://python.itversity.com/14_overview_of_object_oriented_programming/02_classes_and_objects.html
https://python.itversity.com/14_overview_of_object_oriented_programming/03_constructors.html
https://python.itversity.com/14_overview_of_object_oriented_programming/04_methods.html
https://python.itversity.com/14_overview_of_object_oriented_programming/05_inheritance.html
https://python.itversity.com/14_overview_of_object_oriented_programming/06_encapsulation.html
https://python.itversity.com/14_overview_of_object_oriented_programming/07_polymorphism.html
https://python.itversity.com/14_overview_of_object_oriented_programming/08_dynamic_classes.html
https://python.itversity.com/15_overview_of_pandas_libraries/01_overview_of_pandas_libraries.html
https://python.itversity.com/15_overview_of_pandas_libraries/02_pandas_data_structures_overview.html
https://python.itversity.com/15_overview_of_pandas_libraries/03_overview_of_series.html
https://python.itversity.com/15_overview_of_pandas_libraries/04_creating_data_frames_from_lists.html
https://python.itversity.com/15_overview_of_pandas_libraries/05_data_frames_basic_operations.html
https://python.itversity.com/15_overview_of_pandas_libraries/06_csv_to_pandas_data_frame.html
https://python.itversity.com/15_overview_of_pandas_libraries/07_projecting_and_filtering.html
https://python.itversity.com/15_overview_of_pandas_libraries/08_performing_total_aggregations.html
https://python.itversity.com/15_overview_of_pandas_libraries/09_performing_grouped_aggregations.html
https://python.itversity.com/15_overview_of_pandas_libraries/10_writing_data_frames_to_files.html
https://python.itversity.com/15_overview_of_pandas_libraries/12_exercises_pandas_data_frames.html
https://python.itversity.com/15_overview_of_pandas_libraries/11_joining_data_frames.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/01_web_scraping_using_beautifulsoup.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/02_problem_statement.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/03_installing_pre-requisites.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/04_overview_of_beautifulsoup.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/05_getting_html_content.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/06_processing_html_content.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/07_creating_data_frame.html
https://python.itversity.com/16_web_scraping_using_beautifulsoup/08_processing_data_using_data_frame_apis.html
https://python.itversity.com/17_database_programming_crud_operations/01_database_programming_crud_operations.html
https://python.itversity.com/17_database_programming_crud_operations/02_overview_of_database_programming.html
https://python.itversity.com/17_database_programming_crud_operations/03_recap_of_rdbms_concepts.html
https://python.itversity.com/17_database_programming_crud_operations/04_setup_database_client_libraries.html
https://python.itversity.com/17_database_programming_crud_operations/05_function_get_database_connection.html
https://python.itversity.com/17_database_programming_crud_operations/06_creating_database_table.html
https://python.itversity.com/17_database_programming_crud_operations/07_inserting_data_into_table.html
https://python.itversity.com/17_database_programming_crud_operations/08_updating_existing_table_data.html
https://python.itversity.com/17_database_programming_crud_operations/09_deleting_data_from_table.html
https://python.itversity.com/17_database_programming_crud_operations/10_querying_data_from_table.html
https://python.itversity.com/17_database_programming_crud_operations/11_recap_crud_operations.html
https://python.itversity.com/18_database_programming_batch_operations/01_database_programming_batch_operations.html
https://python.itversity.com/18_database_programming_batch_operations/02_function_get_database_connection.html
https://python.itversity.com/18_database_programming_batch_operations/03_creating_database_table.html
https://python.itversity.com/18_database_programming_batch_operations/04_recap_of_insert.html
https://python.itversity.com/18_database_programming_batch_operations/05_preparing_database.html
https://python.itversity.com/18_database_programming_batch_operations/06_reading_data_from_file.html
https://python.itversity.com/19_project_web_scraping_into_database/01_project_web_scraping_into_database.html
https://python.itversity.com/19_project_web_scraping_into_database/02_define_problem_statement.html
https://python.itversity.com/19_project_web_scraping_into_database/03_setup_project.html
https://python.itversity.com/19_project_web_scraping_into_database/04_overview_of_python_virtual_environments.html
https://python.itversity.com/19_project_web_scraping_into_database/05_installing_required_libraries.html
https://python.itversity.com/19_project_web_scraping_into_database/06_setup_logging.html
https://python.itversity.com/19_project_web_scraping_into_database/07_modularizing_the_project.html
https://python.itversity.com/19_project_web_scraping_into_database/08_setup_database.html
https://python.itversity.com/19_project_web_scraping_into_database/10_create_required_table.html
https://python.itversity.com/19_project_web_scraping_into_database/11_reading_the_data.html
https://python.itversity.com/19_project_web_scraping_into_database/12_validating_data.html
https://python.itversity.com/19_project_web_scraping_into_database/13_apply_required_transformations.html
https://python.itversity.com/19_project_web_scraping_into_database/14_writing_to_database.html
https://python.itversity.com/19_project_web_scraping_into_database/15_run_queries_against_data.html