PoC of scraping GCP documentation pages

moreserverless authored 2 days ago
page_content Added scraped data. Each filename is the text of the scraped page's H1 tag. The _RAW.txt files hold the data returned from BeautifulSoup, so minor changes don't require re-scraping the page. 2 days ago
raw_soup Added scraped data. Each filename is the text of the scraped page's H1 tag. The _RAW.txt files hold the data returned from BeautifulSoup, so minor changes don't require re-scraping the page. 2 days ago
.gitignore Initial commit. 3 days ago
.python-version Initial commit. 3 days ago
gcp_docs.ipynb Cleaned up the parsed output and kept the text from <a> tags that was previously stripped out. Most of the <a> tags contained <code> or <pre> blocks. 2 days ago
gcp_products.ipynb Initial commit. 3 days ago
sample.ipynb Initial commit. 3 days ago
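
The commit messages above describe naming each output file after the page's H1 and keeping a _RAW.txt copy so the page does not have to be scraped again. The sketch below illustrates that idea and is not the notebook's actual code. It uses the standard library's html.parser in place of BeautifulSoup so it runs without extra dependencies. The names H1AndTextParser, safe_filename, and save_page are hypothetical, the sample HTML is made up, and the assumption that parsed text goes in page_content while raw copies go in raw_soup is a guess from the directory names.

```python
import re
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class H1AndTextParser(HTMLParser):
    """Collects the first <h1> text and all visible text.

    Text nested inside <a>, <code>, and <pre> is kept, since only
    <script> and <style> content is skipped.
    """

    def __init__(self):
        super().__init__()
        self.h1_parts = []
        self.text_parts = []
        self._in_h1 = False
        self._h1_done = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._h1_done:
            self._in_h1 = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_done = True
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_h1:
            self.h1_parts.append(data)
        self.text_parts.append(data)


def safe_filename(title):
    """Turn an H1 title into a filesystem-friendly name (hypothetical rule)."""
    cleaned = re.sub(r"[^\w\- ]+", "", title).strip()
    return re.sub(r"\s+", "_", cleaned) or "untitled"


def save_page(html, content_dir, raw_dir):
    """Write the parsed text and an untouched raw copy, both named after the H1."""
    parser = H1AndTextParser()
    parser.feed(html)
    parser.close()

    name = safe_filename("".join(parser.h1_parts))
    # Join fragments with spaces, then collapse runs of whitespace.
    text = " ".join(" ".join(parser.text_parts).split())

    content_dir.mkdir(parents=True, exist_ok=True)
    raw_dir.mkdir(parents=True, exist_ok=True)
    content_path = content_dir / f"{name}.txt"
    raw_path = raw_dir / f"{name}_RAW.txt"
    content_path.write_text(text, encoding="utf-8")
    raw_path.write_text(html, encoding="utf-8")  # re-parse later without re-scraping
    return content_path, raw_path


# Made-up sample page; no network request is performed in this sketch.
SAMPLE_HTML = (
    "<html><body><h1>Cloud Run quickstart</h1>"
    '<p>Run <a href="#"><code>gcloud run deploy</code></a> to deploy.</p>'
    "</body></html>"
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        content, raw = save_page(
            SAMPLE_HTML, Path(tmp) / "page_content", Path(tmp) / "raw_soup"
        )
        print(content.name, raw.name)
        print(content.read_text(encoding="utf-8"))
```

Because the raw copy is written unchanged, a later tweak to the cleanup logic only needs to read the _RAW.txt file and parse it again, which avoids another request to the documentation site.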