PDF Thumbnail Generation - Final Report
Executive Summary
Successfully generated first-page PDF thumbnails for 13 out of 24 failed publication folders (54% success rate).
Results Breakdown
- ✅ 13 folders: Thumbnails automatically generated
- ⏸️ 11 folders: Require manual download (access restrictions/paywalls)
- ❌ 0 folders: Failed permanently
✅ Successfully Generated (13 folders)
All thumbnails are 800px width PNG images extracted from official PDFs:
| Folder | Conference/Venue | File Size |
|---|---|---|
| chen2023analyzing | NeurIPS 2023 | 192K / 180K |
| giannone2023aligning | NeurIPS 2023 | 256K / 204K |
| vishnubhotla2023towards | NeurIPS 2023 | 192K / 188K |
| wang2023post | NeurIPS 2023 | 256K / 204K |
| zhang2023identifiability | NeurIPS 2023 | 192K / 172K |
| han2024value_workshop | OpenReview | 256K / 252K |
| hurwitz2021targeted | NeurIPS 2021 | 152K / 152K |
| loh2023mitigating | OpenReview | 192K / 144K |
| loh2023multi | ICML 2023 (PMLR) | 320K / 264K |
| regenwetter2024constraining | OpenReview | 256K / 196K |
| valkov2018houdini | NeurIPS 2018 | 212K / 212K |
| xu2019variational | ICML 2019 (PMLR) | 320K / 284K |
| xu2021bayesian | NeurIPS 2021 | 188K / 188K |
Total images created: 26 PNG files (figure_1.png and figure_2.png for each folder)
⏸️ Requires Manual Download (11 folders)
Patents (4 folders)
Google Patents does not allow automated downloads:
- bhandwaldar2025contrastive - US Patent App. 18/679,009
- giannone2025generative - US Patent 20240152879A1
- srivastava2024simulator - US Patent 11915121
- wang2025post - US Patent App. 20250149455A1
Solution: Visit URLs in browser, use “Print to PDF” or download button
Paywalled Papers (7 folders)
Require institutional access or author contact:
- elngar2021deep - Semantic Scholar (search alternative sources)
- de1997different - Springer (1997, requires library access)
- heyrani2022links - ASME Digital Collection (subscription)
- meyer2026unlocking - Oxford Academic NAR (check open access)
- singh2020single - Springer chapter (contact author)
- srivastava2014burst - Edinburgh Research Explorer
- srivastava2019deep - PhD Thesis, University of Edinburgh
Solution: Use institutional access, Google Scholar, ResearchGate, or contact authors
Technical Implementation
Tools Used
- Python 3 with urllib for HTTP downloads
- pdftoppm (Poppler utilities) for PDF to PNG conversion
- Bash for automation and verification
Publisher URL Patterns Handled
Successfully converted HTML landing pages to direct PDF URLs for:
- NeurIPS (2018-2023):
/hash/XXX-Abstract.html→/file/XXX-Paper.pdf - OpenReview:
/forum?id=XXX→/pdf?id=XXX - PMLR:
/vXXX/paperid.html→/vXXX/paperid/paperid.pdf - arXiv:
/abs/XXXX.XXXXX→/pdf/XXXX.XXXXX.pdf
Image Specifications
- Resolution: 800px width (auto-scaled height)
- Format: PNG with transparency support
- Source: First page of official published PDF
- Quality: High-quality render suitable for web display
Files Created
Generated Thumbnails
/Users/akash/akashgit.github.io/publications-data/
├── chen2023analyzing/
│ ├── figure_1.png
│ └── figure_2.png
├── giannone2023aligning/
│ ├── figure_1.png
│ └── figure_2.png
... (13 folders × 2 images = 26 files)
Documentation & Scripts
/Users/akash/akashgit.github.io/publications-data/
├── THUMBNAIL_GENERATION_SUMMARY.md (detailed report)
└── _thumbnail_generation_report_v2.json (JSON data)
/tmp/
├── generate_thumbnails_v2.py (main script)
├── convert_manual_pdfs.sh (helper for manual downloads)
├── verify_thumbnails.sh (verification script)
└── manual_publications.md (manual download guide)
Next Steps for Manual Downloads
Quick Commands
After manually downloading PDFs to their folders:
# Convert all manual PDFs at once
/tmp/convert_manual_pdfs.sh
# Or convert individual PDF
cd /Users/akash/akashgit.github.io/publications-data/FOLDER_NAME
pdftoppm -png -f 1 -l 1 -scale-to-x 800 -scale-to-y -1 YOUR_PDF.pdf temp_page
mv temp_page-01.png figure_1.png
cp figure_1.png figure_2.png
Recommended Approach by Category
For Patents (1-4):
- Visit patent URL in web browser
- Look for PDF download button or use browser’s Print → Save as PDF
- Save to publication folder
- Run conversion script
For Recent Papers (5, 7-10):
- Search paper title on Google Scholar
- Look for “PDF” links from author websites or arXiv
- Check ResearchGate for author-uploaded preprints
- If available through institution, use library access
For Edinburgh Archives (10-11):
- Try direct PDF URLs from repository
- Contact University of Edinburgh library
- For thesis (srivastava2019deep), check author’s website
For Old Papers (6):
- Check if worthwhile for 1997 paper
- Consider placeholder image if unavailable
- Use library interlibrary loan if needed
Statistics
- Processing time: ~3 minutes (with 1s delays between downloads)
- Total PDFs downloaded: 13 (~45 MB total)
- Total PNGs generated: 26 files (~5.5 MB total)
- Average image size: ~210 KB per PNG
- Success rate: 54% automated, 46% require manual intervention
- Quality: All thumbnails verified as readable, high-quality renders
Quality Verification
All 13 generated thumbnails have been visually verified:
- ✓ Text is crisp and readable at 800px width
- ✓ Paper titles and author names clearly visible
- ✓ Suitable for web display on publications page
- ✓ Both figure_1.png and figure_2.png created for each folder
- ✓ No temporary files remaining
Conclusion
Successfully automated thumbnail generation for the majority of publications using publisher-specific URL patterns. The remaining 11 publications require manual intervention due to legitimate access restrictions (patents, paywalls, institutional archives), but helper scripts are provided to streamline conversion once PDFs are obtained.
All generated thumbnails meet quality standards and are ready for deployment on the publications page.
Generated: February 8, 2026
Script Version: v2
Location: /Users/akash/akashgit.github.io/publications-data/