python – How to find and fetch text from an HTML file?


<div>
  <div>
    <h1>Title</h1>
    <p>Some text i want to fetch</p>
  </div>
  <RandomVueComponent>Some other text i want to fetch</RandomVueComponent>
</div>
<a href="#">Might be some text too</a>

I use Python scripts with regular expressions to scan my codebase. The scripts return a list of every text string in the codebase along with its file path, line, etc., so I can keep track of the text in my HTML files. However, I can't find a way to match all text regardless of how deeply it is nested in HTML tags and regardless of which tag encloses it. In other words, the tag itself cannot be used as the matching key (as in r'<p>text</p>').

I would like to avoid libraries as much as possible.
I have heard of Beautiful Soup, but I think it needs a matching key (such as the HTML tag).
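
For illustration, here is a minimal sketch of one possible approach that needs no third-party library, assuming Python's built-in `html.parser` module is acceptable under that constraint. The names `TextCollector` and `extract_texts` are hypothetical and chosen only for this example. The parser reports every text node it meets, whatever tag surrounds it and however deeply it is nested, so no tag has to serve as a matching key.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-empty text node together with its line number."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (line_number, text) tuples

    def handle_data(self, data):
        # Called for each run of text between tags, regardless of the tag.
        text = data.strip()
        if not text:
            return  # skip whitespace-only runs (indentation, newlines)
        line, _column = self.getpos()  # position where this run starts
        # Move past any leading newlines so the line points at the text itself.
        leading = data[: len(data) - len(data.lstrip())]
        self.found.append((line + leading.count("\n"), text))


def extract_texts(html):
    """Return a list of (line_number, text) for all text in an HTML string."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.found


if __name__ == "__main__":
    sample = "<div>\n  <p>Some text</p>\n  <Custom>Other text</Custom>\n</div>"
    for line, text in extract_texts(sample):
        print(line, text)
```

Two caveats about this sketch: the contents of `<script>` and `<style>` elements are also reported as text, so they may need filtering, and text inside attributes (for example `title="..."`) is not reported at all.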

What did I try? Some sketchy regexes, which unsurprisingly didn't work.


