PoC of scraping the GCP documentation pages

moreserverless authored on 29 May
Files and the latest commit message for each (all last changed 2 months ago):

- docker, gcp_pages, sitemap_data, .gitignore, gcp_products.ipynb: "Preparing to scrape and populate the db so that no page is visited more than once. Restructured folders and removed some extraneous pages that are no longer needed."
- gcp_docs.ipynb: "Cleaning up parsed output, and keeping the text from <a> tags that was previously stripped out. Most of the <a> tags contained <code> blocks or <pre> tags."
- .python-version, sample.ipynb: "Initial commit."
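The sitemap_data folder suggests that the pages to scrape are discovered from sitemaps. The notebooks' own code isn't shown here, so the following is only a minimal sketch, assuming standard sitemaps.org-format XML. The function name `urls_from_sitemap` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-format files (assumption: GCP's sitemaps follow it).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL in a sitemap (<urlset>) or sitemap index (<sitemapindex>)."""
    root = ET.fromstring(xml_text)
    # ".//sm:loc" matches <loc> at any depth, so both sitemap layouts work.
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]


if __name__ == "__main__":
    # Hypothetical sample document for illustration only.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://cloud.google.com/docs</loc></url>"
        "<url><loc>https://cloud.google.com/run/docs</loc></url>"
        "</urlset>"
    )
    for url in urls_from_sitemap(sample):
        print(url)
```

A sitemap index lists further sitemaps rather than pages, so in practice the returned URLs may need to be fetched and passed through the same function again.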
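The commit message describes populating a db so that no page is visited more than once. Which database the notebooks actually use isn't stated. Here is a minimal sketch using the standard library's sqlite3, where a `PRIMARY KEY` on the URL makes "have we seen this?" a single atomic insert. The table name `visited` and the functions `open_visited_db`, `mark_if_new`, and `crawl` are hypothetical.

```python
import sqlite3


def open_visited_db(path=":memory:"):
    """Open (or create) a SQLite DB holding one row per URL already scraped.

    Pass a file path such as "visited.db" (hypothetical) to persist across runs.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def mark_if_new(conn, url):
    """Record url as visited; return True only the first time it is seen."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT OR IGNORE INTO visited (url) VALUES (?)", (url,))
    # rowcount is 1 when a row was inserted, 0 when the URL already existed.
    return cur.rowcount == 1


def crawl(urls, fetch, conn):
    """Call fetch(url) once per distinct URL, skipping URLs already in the DB."""
    fetched = []
    for url in urls:
        if mark_if_new(conn, url):
            fetch(url)
            fetched.append(url)
    return fetched
```

For example, `crawl(["https://cloud.google.com/docs", "https://cloud.google.com/docs"], print, open_visited_db())` prints the URL once. Because the URL is recorded before `fetch` runs, a failed fetch is not retried on the next run. If retries matter, insert the row only after a successful fetch.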
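The gcp_docs.ipynb commit keeps the text inside <a> tags, which often wrap <code> or <pre> content, instead of stripping it. The notebook's actual parser isn't shown (it may well use a third-party library). As a stand-in, here is a minimal sketch using the standard library's `html.parser`. It drops only <script> and <style> content and keeps all other text, including link text. The names `DocTextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class DocTextExtractor(HTMLParser):
    """Collect visible text, keeping text nested in <a>, <code> and <pre> tags."""

    SKIP = {"script", "style"}  # tags whose content is never visible text

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # <a>/<code>/<pre> are not special-cased, so their text is kept.
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(html):
    parser = DocTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs into single spaces (this also flattens <pre> layout).
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    page = '<p>Run <a href="/sdk"><code>gcloud init</code></a> first.</p><script>x=1</script>'
    print(extract_text(page))  # Run gcloud init first.
```

Collapsing whitespace suits prose, but it loses line breaks in <pre> blocks. A variant could record when it is inside <pre> and append that text unchanged.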