## Manipulating HTML Content

Let us understand how to manipulate parsed HTML content using the APIs provided by BeautifulSoup.

* `decompose` - removes a tag along with its content.
* `unwrap` - removes a tag but retains its content.
* We can also change the attributes of a tag by assigning values to it, because a tag supports dict-style access to its attributes.
* We can also enclose existing content or tags in new tags.

In [2]:
html_str = """
<p>Some Text</p>
<table>
    <tbody>
        <tr>
            <th>Details</th>
            <th>URL</th>
        </tr>
        <tr>
            <td>Video Content</td>
            <td><a href="https://www.youtube.com/itversityin">YouTube Channel</a>
            </td>
        </tr>
        <tr>
            <td>Reference Material</td>
            <td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
            </td>
        </tr>
    </tbody>
</table>
"""

In [3]:
from IPython.display import HTML, display
display(HTML(html_str))

| Details | URL |
| --- | --- |
| Video Content | YouTube Channel |
| Reference Material | GitHub Repository |


In [4]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_str, 'html.parser')
print(soup.prettify())

<p>
 Some Text
</p>
<table>
 <tbody>
  <tr>
   <th>
    Details
   </th>
   <th>
    URL
   </th>
  </tr>
  <tr>
   <td>
    Video Content
   </td>
   <td>
    <a href="https://www.youtube.com/itversityin">
     YouTube Channel
    </a>
   </td>
  </tr>
  <tr>
   <td>
    Reference Material
   </td>
   <td>
    <a href="https://www.github.com/dgadiraju/itversity-books">
     GitHub Repository
    </a>
   </td>
  </tr>
 </tbody>
</table>



### Using `decompose`

In [5]:
p = soup.find('p')

In [6]:
p.decompose()

In [7]:
soup



<table>
<tbody>
<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td>Video Content</td>
<td><a href="https://www.youtube.com/itversityin">YouTube Channel</a>
</td>
</tr>
<tr>
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>
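
The cell above discards the `<p>` tag entirely. As a hedged aside, BeautifulSoup also offers `extract()`, which removes a tag from the tree but returns it so it can be reused. The sketch below contrasts the two on a small hypothetical snippet (the `html` string and variable names are illustrative and not part of the notebook above).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this comparison
html = "<div><p>Some Text</p><span>Keep me</span></div>"

# decompose() destroys the tag and its contents and returns None
soup_a = BeautifulSoup(html, "html.parser")
result = soup_a.find("p").decompose()
after_decompose = str(soup_a)

# extract() removes the tag from the tree but hands it back to the caller
soup_b = BeautifulSoup(html, "html.parser")
removed = soup_b.find("p").extract()
after_extract = str(soup_b)

print(result)
print(after_decompose)
print(removed)
print(after_extract)
```

Both calls leave the same tree behind; the difference is only whether the removed tag is still available afterwards.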

### Using `unwrap`

In [8]:
a = soup.find('a')

In [9]:
a

<a href="https://www.youtube.com/itversityin">YouTube Channel</a>

In [10]:
a.unwrap()

<a href="https://www.youtube.com/itversityin"></a>

In [11]:
soup



<table>
<tbody>
<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td>Video Content</td>
<td>YouTube Channel
</td>
</tr>
<tr>
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>

In [12]:
from IPython.display import display, HTML
display(HTML(str(soup)))

| Details | URL |
| --- | --- |
| Video Content | YouTube Channel |
| Reference Material | GitHub Repository |
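
The cells above unwrap only the first link, because `find` returns a single tag. A minimal sketch of unwrapping every link in a document follows, assuming a small hypothetical snippet with placeholder `example.com` URLs (neither is part of the notebook above).

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two links; the URLs are placeholders
html = (
    '<p>See <a href="https://example.com/a">first</a> '
    'and <a href="https://example.com/b">second</a>.</p>'
)
soup_links = BeautifulSoup(html, "html.parser")

# find_all collects the matching tags up front, so unwrapping them
# inside the loop does not disturb the iteration
for link in soup_links.find_all("a"):
    link.unwrap()

unwrapped = str(soup_links)
print(unwrapped)
```

Only the `<a>` tags are dropped; the link text stays in place inside the paragraph.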


### Updating Tag Attributes

In [13]:
for tag in soup.find_all('tr'):
    print(tag)

<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td>Video Content</td>
<td>YouTube Channel
</td>
</tr>
<tr>
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>


In [14]:
for tag in soup.find_all('tr'):
    print(tag['class'])

KeyError: 'class'
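
The `KeyError` above is raised because none of the rows has a `class` attribute yet. A hedged sketch of reading attributes without raising is shown below, using `Tag.get` and `Tag.has_attr` on a hypothetical two-row table (the markup and variable names are illustrative).

```python
from bs4 import BeautifulSoup

# Hypothetical table: the first row has no class, the second one does
soup_attrs = BeautifulSoup(
    '<table><tr><td>A</td></tr><tr class="special"><td>B</td></tr></table>',
    "html.parser",
)
rows = soup_attrs.find_all("tr")

# get() returns None when the attribute is missing instead of raising KeyError
classes = [row.get("class") for row in rows]

# has_attr() reports whether the attribute is present at all
flags = [row.has_attr("class") for row in rows]

print(classes)
print(flags)
```

Note that a `class` value read from parsed HTML comes back as a list, since BeautifulSoup treats `class` as a multi-valued attribute.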

In [15]:
for tag in soup.find_all('tr'):
    tag['class'] = 'special'

In [16]:
for tag in soup.find_all('tr'):
    print(tag['class'])

special
special
special


In [17]:
soup



<table>
<tbody>
<tr class="special">
<th>Details</th>
<th>URL</th>
</tr>
<tr class="special">
<td>Video Content</td>
<td>YouTube Channel
</td>
</tr>
<tr class="special">
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>
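
Assigning a plain string works, as shown above. As a complementary sketch, the example below adds a second class by assigning a list, sets another attribute, and then deletes it with `del`. The markup, the `id` value, and the variable names are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical single-row table with an existing class
soup_cls = BeautifulSoup(
    '<table><tr class="odd"><td>A</td></tr></table>', "html.parser"
)
row = soup_cls.find("tr")

# A parsed class attribute is a list of class names
parsed = list(row["class"])

# Assigning a list writes the names back separated by spaces
row["class"] = parsed + ["special"]
row["id"] = "first-row"
with_attrs = str(row)

# del removes an attribute, just like deleting a key from a dict
del row["id"]
without_id = str(row)

print(with_attrs)
print(without_id)
```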

### Wrapping Text

In [18]:
strong = soup.new_tag('strong')

In [19]:
strong

<strong></strong>

In [21]:
type(strong)

bs4.element.Tag

In [22]:
td = soup.find('td')

In [23]:
td.text

'Video Content'

In [24]:
strong.insert(0, td.text)

In [25]:
strong

<strong>Video Content</strong>

In [26]:
td.string = ''

In [27]:
td

<td></td>

In [28]:
td.insert(0, strong)

In [29]:
soup



<table>
<tbody>
<tr class="special">
<th>Details</th>
<th>URL</th>
</tr>
<tr class="special">
<td><strong>Video Content</strong></td>
<td>YouTube Channel
</td>
</tr>
<tr class="special">
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>

In [30]:
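for tag in soup.find_all('td'):
    # Skip cells that contain a link so that the anchor tag is preserved
    if not tag.find('a'):
        strong = soup.new_tag('strong')
        strong.insert(0, tag.text)  # copy the cell text into <strong>
        tag.string = ''             # empty the cell
        tag.insert(0, strong)       # place the wrapped text back in the cell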

In [31]:
soup



<table>
<tbody>
<tr class="special">
<th>Details</th>
<th>URL</th>
</tr>
<tr class="special">
<td><strong>Video Content</strong></td>
<td><strong>YouTube Channel
</strong></td>
</tr>
<tr class="special">
<td><strong>Reference Material</strong></td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>
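
The steps above build the wrapper by hand. BeautifulSoup also provides a `wrap()` method that encloses an element in a given tag in one call. The following is a minimal sketch under the assumption that each cell to be wrapped holds a single text node; the markup and variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical row: one plain-text cell and one cell holding a link
soup_wrap = BeautifulSoup(
    '<table><tr><td>Video Content</td><td><a href="#">Link</a></td></tr></table>',
    "html.parser",
)

for cell in soup_wrap.find_all("td"):
    # cell.string is None when the cell has several children; cells with
    # a link are skipped explicitly, mirroring the loop above
    if cell.string is not None and cell.find("a") is None:
        # wrap() puts the text node inside the new tag, in place
        cell.string.wrap(soup_wrap.new_tag("strong"))

wrapped = str(soup_wrap)
print(wrapped)
```

Running this sketch a second time on the same tree would nest another `<strong>` inside the first, so it is best applied once.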

In [32]:
from IPython.display import HTML, display
display(HTML(str(soup)))

| Details | URL |
| --- | --- |
| Video Content | YouTube Channel |
| Reference Material | GitHub Repository |
