clewis/gcp_docs

clewis / gcp_docs_scrape

2025-05-29	e6655f2 Preparing to scrape and populate the db so that no page is visited more than once. Restructured folders and removed some extraneous pages that are no longer needed. moreserverless committed on 29 May
2025-05-26	01f981b Added scraped data. Each filename is the H1 tag of the scraped page. The _RAW.txt files hold the data returned from BeautifulSoup, so minor changes don't require re-scraping the page. moreserverless committed on 26 May
	2ade741 Cleaned up the parsed output and kept the text from <a> tags that were previously stripped out. Most of the <a> tags contained <code> blocks or <pre> tags. moreserverless committed on 26 May
	4534c14 Commented out an overcomplicated way of extracting only the page content. moreserverless committed on 26 May
2025-05-25	b5f4e90 Removed link tags from the article body. moreserverless committed on 25 May
2025-05-25	731af26 Initial commit. moreserverless committed on 25 May
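
Commit e6655f2 mentions populating a db so that no page is visited more than once. The repository's actual schema and code are not shown here, so the following is only a minimal sketch of that idea, assuming a SQLite table keyed by URL. The table name `visited` and the functions `open_visited_db` and `mark_visited` are hypothetical, as is the choice of SQLite itself.

```python
import sqlite3


def open_visited_db(path=":memory:"):
    """Open (or create) a database with one row per visited URL.

    Assumption: SQLite and the table name 'visited' are illustrative only.
    """
    conn = sqlite3.connect(path)
    # The PRIMARY KEY constraint makes each URL unique in the table.
    conn.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")
    return conn


def mark_visited(conn, url):
    """Record url. Return True if it was new, False if already recorded."""
    # INSERT OR IGNORE skips the insert when the URL is already present,
    # so rowcount is 1 for a new URL and 0 for a repeat.
    cur = conn.execute("INSERT OR IGNORE INTO visited (url) VALUES (?)", (url,))
    conn.commit()
    return cur.rowcount == 1
```

A scraper loop could call `mark_visited(conn, url)` before fetching and skip the page whenever it returns False. Passing a file path instead of `":memory:"` would keep the record across runs.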