A new Go library called hnscrape provides developers with a client for accessing the Hacker News Firebase API, including a particularly useful feature: the ability to retrieve content from posts that have been removed or flagged by moderators.
When Hacker News removes a post, the official API typically returns a stripped-down response with no title, author information, or score. The hnscrape library works around this limitation by scraping the post's HTML page directly, using the user's session credentials to recover the original data.
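To make the limitation concrete, here is a minimal sketch of how a client might recognise a stripped response. It uses only the Go standard library, not hnscrape itself. The JSON field names follow the public Hacker News API item schema; the Item type, the looksStripped helper, and the sample payload are illustrative assumptions rather than anything taken from the library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item mirrors a subset of the fields documented for the official
// Hacker News API. The type is illustrative, not hnscrape's own.
type Item struct {
	ID      int    `json:"id"`
	Type    string `json:"type"`
	By      string `json:"by"`
	Title   string `json:"title"`
	Score   int    `json:"score"`
	Dead    bool   `json:"dead"`
	Deleted bool   `json:"deleted"`
}

// parseItem decodes a raw API response body into an Item.
func parseItem(raw []byte) (Item, error) {
	var it Item
	err := json.Unmarshal(raw, &it)
	return it, err
}

// looksStripped is a hypothetical helper: it reports whether the item is
// marked dead or deleted, or is missing both its title and its author.
func looksStripped(it Item) bool {
	return it.Dead || it.Deleted || (it.Title == "" && it.By == "")
}

func main() {
	// Made-up payload shaped like a removed story: no title, author, or score.
	raw := []byte(`{"id": 1, "type": "story", "dead": true}`)

	it, err := parseItem(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("stripped:", looksStripped(it))
}
```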
Getting Started
Installation uses the standard Go toolchain:
go get github.com/larrasket/hnscrape
The library supports standard operations like fetching top stories, retrieving individual items, looking up user profiles, and checking for recent updates. Basic usage requires minimal setup and works without authentication for publicly available content.
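For orientation, the operations listed above correspond to endpoints of the public Firebase API. The sketch below talks to that API with the standard library only; it does not show hnscrape's own functions, whose signatures are not given here. The helpers itemURL, userURL, and fetchTopStories are hypothetical names chosen for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// baseURL is the root of the public Hacker News Firebase API.
const baseURL = "https://hacker-news.firebaseio.com/v0"

// itemURL returns the endpoint for a single item (story, comment, etc.).
func itemURL(id int) string {
	return fmt.Sprintf("%s/item/%d.json", baseURL, id)
}

// userURL returns the endpoint for a user profile.
func userURL(name string) string {
	return fmt.Sprintf("%s/user/%s.json", baseURL, url.PathEscape(name))
}

// fetchTopStories downloads the list of current top story IDs.
func fetchTopStories(client *http.Client) ([]int, error) {
	resp, err := client.Get(baseURL + "/topstories.json")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var ids []int
	if err := json.NewDecoder(resp.Body).Decode(&ids); err != nil {
		return nil, err
	}
	return ids, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}

	ids, err := fetchTopStories(client)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	if len(ids) > 0 {
		fmt.Println("first top story endpoint:", itemURL(ids[0]))
	}
}
```

None of these calls require authentication, which matches the article's point that publicly available content needs no login.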
Accessing Removed Content
To retrieve data from dead or flagged posts, users must authenticate. The library offers two approaches: logging in directly with a username and password, or supplying an existing session cookie copied from a browser. The latter method skips the login step entirely.
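The cookie-based approach can be pictured as attaching the browser's session cookie to each request for an item's HTML page. The following standard-library sketch is not hnscrape's implementation. It assumes the session cookie is named "user", and newItemPageRequest is a hypothetical helper; the cookie value shown is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
)

// sessionCookieName is an assumption about how Hacker News names its
// session cookie; verify it in the browser before relying on it.
const sessionCookieName = "user"

// newItemPageRequest builds a GET request for an item's HTML page and
// attaches the given session cookie value. No network call is made here.
func newItemPageRequest(id int, session string) (*http.Request, error) {
	pageURL := fmt.Sprintf("https://news.ycombinator.com/item?id=%d", id)

	req, err := http.NewRequest(http.MethodGet, pageURL, nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: sessionCookieName, Value: session})
	return req, nil
}

func main() {
	// Placeholder value; a real one would be copied from a logged-in browser.
	req, err := newItemPageRequest(1, "placeholder-session-value")
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	fmt.Println("request URL:", req.URL)
	fmt.Println("cookies attached:", len(req.Cookies()))
}
```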
When authentication is configured, GetItem automatically triggers HTML scraping whenever the API indicates a post is dead. Alternatively, developers can force scraping on any item regardless of its status using GetItemWithScraping.
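The control flow described here, API first and scraping only as a fallback for dead items, might look roughly like the sketch below. It illustrates the pattern and is not hnscrape's code: the Item and Fetcher types and the getItem function are hypothetical, and a nil scraper stands in for "no authentication configured".

```go
package main

import "fmt"

// Item is a trimmed-down, illustrative item type.
type Item struct {
	ID    int
	Title string
	Dead  bool
}

// Fetcher is a hypothetical abstraction over "get an item by ID",
// implemented once by the API client and once by the HTML scraper.
type Fetcher func(id int) (Item, error)

// getItem asks the API first. If the item is dead and a scraper is
// available, it falls back to scraping; otherwise it returns the API result.
func getItem(id int, api Fetcher, scrape Fetcher) (Item, error) {
	it, err := api(id)
	if err != nil {
		return Item{}, err
	}
	if !it.Dead || scrape == nil {
		return it, nil
	}

	scraped, err := scrape(id)
	if err != nil {
		// Keep the partial API result so the caller still has something.
		return it, fmt.Errorf("scrape fallback for item %d: %w", id, err)
	}
	return scraped, nil
}

func main() {
	// Fake fetchers standing in for the real API and scraper.
	api := func(id int) (Item, error) {
		return Item{ID: id, Dead: true}, nil
	}
	scrape := func(id int) (Item, error) {
		return Item{ID: id, Title: "recovered title", Dead: true}, nil
	}

	it, err := getItem(7, api, scrape)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("item %d title: %q\n", it.ID, it.Title)
}
```

Forcing a scrape regardless of status, as GetItemWithScraping is described as doing, would amount to calling the scraper directly and skipping the Dead check.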
One important requirement: the Hacker News account in use must have "Show Dead" enabled in its profile settings for the scraping feature to work.
Configuration Options
The client exposes several configuration options. Users can set connection timeouts, supply a custom HTTP client, or provide their session cookie during initialization. Concurrent requests for multiple items or users are limited to 10 parallel operations by default; the HNAPI_CONCURRENT_LIMIT environment variable adjusts this limit.
Caveats and Considerations
The HTML parsing approach is inherently fragile: when Hacker News updates its page markup, the scraper may break. The project maintainer encourages users to report issues with specific item IDs when this occurs. While the Firebase API itself has no published rate limits, responsible usage is expected. The library only supports read operations.
Source: Hacker News Show HN