2025-05-29
@moreserverless
Preparing to scrape pages and populate the database so that no page is visited more than once. Restructured folders and removed some extraneous pages that are no longer needed.
moreserverless committed on 29 May
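The "visit each page at most once" idea can be sketched as a table of already-scraped URLs that the crawler checks before fetching. This is a minimal sketch that assumes an SQLite database, since the commit doesn't say which database is used. The table `pages` and the functions `should_visit`, `record_visit` and `crawl` are hypothetical names, and `fetch` stands in for whatever actually scrapes a page and returns its H1.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) a hypothetical table of already-scraped URLs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, h1 TEXT)")
    return conn

def should_visit(conn, url):
    """Return True if this URL has not been recorded yet."""
    row = conn.execute("SELECT 1 FROM pages WHERE url = ?", (url,)).fetchone()
    return row is None

def record_visit(conn, url, h1):
    """Record a scraped page; INSERT OR IGNORE keeps the first row for a URL."""
    conn.execute("INSERT OR IGNORE INTO pages (url, h1) VALUES (?, ?)", (url, h1))
    conn.commit()

def crawl(conn, urls, fetch):
    """Scrape each URL at most once, across runs, and return the ones scraped now.

    `fetch` is a caller-supplied (hypothetical) function that returns the H1 text.
    """
    scraped = []
    for url in urls:
        if not should_visit(conn, url):
            continue  # already in the database, so skip the network request
        record_visit(conn, url, fetch(url))
        scraped.append(url)
    return scraped
```

Using the URL as the primary key lets the database itself enforce the "once per page" rule. Passing a file path to `open_db` instead of the in-memory default lets a later run skip everything scraped earlier.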
2025-05-26
@moreserverless
Added scraped data. Each filename is the H1 text of the scraped page. The _RAW.txt files hold the data returned from BeautifulSoup, so if I want to make minor changes to the output I don't need to re-scrape the page.
moreserverless committed on 26 May
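The naming and caching scheme in this commit can be sketched as follows. Each page is written twice: `<H1>_RAW.txt` holds the raw extraction and `<H1>.txt` holds the cleaned output, so parsing tweaks only need to re-read the raw files. The filename sanitising rule and the `filename_from_h1`, `save_scrape` and `reparse_all` names are assumptions for illustration, not the repository's actual code.

```python
import re
from pathlib import Path

def filename_from_h1(h1_text):
    """Turn a page's H1 text into a filesystem-safe base name (hypothetical rule)."""
    # Replace characters that common filesystems reject in filenames.
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", h1_text)
    # Collapse runs of whitespace (H1 text often contains newlines).
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned or "untitled"

def save_scrape(out_dir, h1_text, raw_text, parse):
    """Write the raw extraction and its parsed version for one page."""
    name = filename_from_h1(h1_text)
    out = Path(out_dir)
    (out / f"{name}_RAW.txt").write_text(raw_text, encoding="utf-8")
    (out / f"{name}.txt").write_text(parse(raw_text), encoding="utf-8")
    return name

def reparse_all(out_dir, parse):
    """Rebuild every parsed file from its _RAW.txt copy; no re-scraping needed."""
    rebuilt = []
    for raw_path in sorted(Path(out_dir).glob("*_RAW.txt")):
        name = raw_path.name[: -len("_RAW.txt")]
        raw_text = raw_path.read_text(encoding="utf-8")
        (raw_path.parent / f"{name}.txt").write_text(parse(raw_text), encoding="utf-8")
        rebuilt.append(name)
    return rebuilt
```

One caveat with keying files on the H1 is that two pages with the same heading overwrite each other. If that happens, including part of the URL in the name would be one way to avoid it.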
@moreserverless
Cleaned up the parsed output and kept the text from <a> tags that were previously stripped out. Most of those <a> tags contained <code> blocks or <pre> tags.
moreserverless committed on 26 May
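The difference between stripping a link and keeping its text matters when the link wraps a <code> element. In BeautifulSoup this is roughly the difference between `tag.decompose()`, which removes the tag and its contents, and `tag.unwrap()`, which removes only the tag. The sketch below shows the same idea with the standard library's `html.parser` so it runs without extra dependencies. `TextExtractor`, `extract_text` and the `keep_link_text` flag are hypothetical names. `keep_link_text=False` imitates the earlier behaviour of dropping link text.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content; optionally drop everything inside <a> tags."""

    def __init__(self, keep_link_text=True):
        super().__init__()
        self.keep_link_text = keep_link_text
        self.link_depth = 0  # greater than 0 while inside an <a> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.link_depth and not self.keep_link_text:
            return  # old behaviour: text inside links (e.g. <code>) is lost
        self.parts.append(data)

def extract_text(html, keep_link_text=True):
    parser = TextExtractor(keep_link_text)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)

html = '<p>Call <a href="#x"><code>parse()</code></a> first.</p>'
print(extract_text(html))                        # Call parse() first.
print(extract_text(html, keep_link_text=False))  # Call  first.
```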
@moreserverless
Commented out an overcomplicated way of extracting only the page content.
moreserverless committed on 26 May
2025-05-25
@moreserverless
Removed link tags from the article body.
moreserverless committed on 25 May
@moreserverless
Initial commit.
moreserverless committed on 25 May