clewis / gcp_docs_scrape
PoC of scraping GCP documents page
5 commits
branch: main
latest commit 01f981b775, moreserverless authored 2 days ago
page_content (2 days ago): Added scraped data. Filename is the H1 tag of the page scraped. _RAW.txt is the data returned from BeautifulSoup so if I want to make minor changes I don't need to re-scrape the page.
raw_soup (2 days ago): Added scraped data. Filename is the H1 tag of the page scraped. _RAW.txt is the data returned from BeautifulSoup so if I want to make minor changes I don't need to re-scrape the page.
.gitignore (3 days ago): Initial commit.
.python-version (3 days ago): Initial commit.
gcp_docs.ipynb (2 days ago): Cleaning up parsed output as well as keeping the text from <a> tags that were previously stripped out. Most of the <a> contained <code> blocks or <pre> tags.
gcp_products.ipynb (3 days ago): Initial commit.
sample.ipynb (3 days ago): Initial commit.
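
The commit messages above describe three ideas: naming each output file after the page's H1, keeping a _RAW.txt copy so the page can be re-parsed without another request, and keeping the text inside <a> tags (which often wrap <code> or <pre>). The following is only a rough sketch of how that might look, not the notebook's actual code. It uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without extra packages. The names TextExtractor, safe_filename, save_page and reparse are hypothetical, and it is an assumption that raw copies live in raw_soup and cleaned text in page_content.

```python
import re
from html.parser import HTMLParser
from pathlib import Path

# Tags after which a line break is inserted so adjacent blocks do not run together.
BLOCK_TAGS = {"p", "div", "li", "pre", "h1", "h2", "h3", "h4", "tr"}


class TextExtractor(HTMLParser):
    """Collects the first <h1> text and all visible text, including text inside <a>."""

    def __init__(self):
        super().__init__()
        self.h1_parts = []
        self.text_parts = []
        self._in_h1 = False
        self._seen_h1 = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "h1" and not self._seen_h1:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._seen_h1 = True
        if tag in BLOCK_TAGS:
            self.text_parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_h1:
            self.h1_parts.append(data)
        # Text is kept regardless of the enclosing tag, so <a>, <code> and <pre> survive.
        self.text_parts.append(data)


def parse(html):
    """Return (h1_title, cleaned_text) with whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.h1_parts).split())
    text = " ".join("".join(parser.text_parts).split())
    return title, text


def safe_filename(title):
    """Turn an H1 title into a filesystem-friendly name (hypothetical naming rule)."""
    cleaned = re.sub(r"[^\w\- ]+", "", title).strip()
    return re.sub(r"\s+", "_", cleaned) or "untitled"


def save_page(html, raw_dir, content_dir):
    """Write the raw HTML and the cleaned text, both named after the page's H1."""
    title, text = parse(html)
    name = safe_filename(title)
    raw_dir.mkdir(parents=True, exist_ok=True)
    content_dir.mkdir(parents=True, exist_ok=True)
    (raw_dir / f"{name}_RAW.txt").write_text(html, encoding="utf-8")
    (content_dir / f"{name}.txt").write_text(text, encoding="utf-8")
    return name


def reparse(name, raw_dir, content_dir):
    """Rebuild the cleaned text from the saved raw copy, with no network request."""
    html = (raw_dir / f"{name}_RAW.txt").read_text(encoding="utf-8")
    _, text = parse(html)
    (content_dir / f"{name}.txt").write_text(text, encoding="utf-8")
    return text
```

With this layout, a change to the cleanup rules only requires calling reparse for each saved name; the fetch step (not shown here) runs once per page.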