PDF Thumbnail Generation - Final Report

Executive Summary

Successfully generated first-page PDF thumbnails for 13 out of 24 failed publication folders (54% success rate).

Results Breakdown

✅ Successfully Generated (13 folders)

All thumbnails are 800px width PNG images extracted from official PDFs:

Folder Conference/Venue File Size
chen2023analyzing NeurIPS 2023 192K / 180K
giannone2023aligning NeurIPS 2023 256K / 204K
vishnubhotla2023towards NeurIPS 2023 192K / 188K
wang2023post NeurIPS 2023 256K / 204K
zhang2023identifiability NeurIPS 2023 192K / 172K
han2024value_workshop OpenReview 256K / 252K
hurwitz2021targeted NeurIPS 2021 152K / 152K
loh2023mitigating OpenReview 192K / 144K
loh2023multi ICML 2023 (PMLR) 320K / 264K
regenwetter2024constraining OpenReview 256K / 196K
valkov2018houdini NeurIPS 2018 212K / 212K
xu2019variational ICML 2019 (PMLR) 320K / 284K
xu2021bayesian NeurIPS 2021 188K / 188K

Total images created: 26 PNG files (figure_1.png and figure_2.png for each folder)

⏸️ Requires Manual Download (11 folders)

Patents (4 folders)

Google Patents does not allow automated downloads:

  1. bhandwaldar2025contrastive - US Patent App. 18/679,009
  2. giannone2025generative - US Patent 20240152879A1
  3. srivastava2024simulator - US Patent 11915121
  4. wang2025post - US Patent App. 20250149455A1

Solution: Visit URLs in browser, use “Print to PDF” or download button

Paywalled Papers (7 folders)

Require institutional access or author contact:

  1. elngar2021deep - Semantic Scholar (search alternative sources)
  2. de1997different - Springer (1997, requires library access)
  3. heyrani2022links - ASME Digital Collection (subscription)
  4. meyer2026unlocking - Oxford Academic NAR (check open access)
  5. singh2020single - Springer chapter (contact author)
  6. srivastava2014burst - Edinburgh Research Explorer
  7. srivastava2019deep - PhD Thesis, University of Edinburgh

Solution: Use institutional access, Google Scholar, ResearchGate, or contact authors

Technical Implementation

Tools Used

Publisher URL Patterns Handled

Successfully converted HTML landing pages to direct PDF URLs for:

Image Specifications

Files Created

Generated Thumbnails

/Users/akash/akashgit.github.io/publications-data/
├── chen2023analyzing/
│   ├── figure_1.png
│   └── figure_2.png
├── giannone2023aligning/
│   ├── figure_1.png
│   └── figure_2.png
... (13 folders × 2 images = 26 files)

Documentation & Scripts

/Users/akash/akashgit.github.io/publications-data/
├── THUMBNAIL_GENERATION_SUMMARY.md (detailed report)
└── _thumbnail_generation_report_v2.json (JSON data)

/tmp/
├── generate_thumbnails_v2.py (main script)
├── convert_manual_pdfs.sh (helper for manual downloads)
├── verify_thumbnails.sh (verification script)
└── manual_publications.md (manual download guide)

Next Steps for Manual Downloads

Quick Commands

After manually downloading PDFs to their folders:

# Convert all manual PDFs at once
/tmp/convert_manual_pdfs.sh

# Or convert individual PDF
cd /Users/akash/akashgit.github.io/publications-data/FOLDER_NAME
pdftoppm -png -f 1 -l 1 -scale-to-x 800 -scale-to-y -1 YOUR_PDF.pdf temp_page
mv temp_page-01.png figure_1.png
cp figure_1.png figure_2.png

For Patents (1-4):

  1. Visit patent URL in web browser
  2. Look for PDF download button or use browser’s Print → Save as PDF
  3. Save to publication folder
  4. Run conversion script

For Recent Papers (5, 7-10):

  1. Search paper title on Google Scholar
  2. Look for “PDF” links from author websites or arXiv
  3. Check ResearchGate for author-uploaded preprints
  4. If available through institution, use library access

For Edinburgh Archives (10-11):

  1. Try direct PDF URLs from repository
  2. Contact University of Edinburgh library
  3. For thesis (srivastava2019deep), check author’s website

For Old Papers (6):

  1. Check if worthwhile for 1997 paper
  2. Consider placeholder image if unavailable
  3. Use library interlibrary loan if needed

Statistics

Quality Verification

All 13 generated thumbnails have been visually verified:

Conclusion

Successfully automated thumbnail generation for the majority of publications using publisher-specific URL patterns. The remaining 11 publications require manual intervention due to legitimate access restrictions (patents, paywalls, institutional archives), but helper scripts are provided to streamline conversion once PDFs are obtained.

All generated thumbnails meet quality standards and are ready for deployment on the publications page.


Generated: February 8, 2026
Script Version: v2
Location: /Users/akash/akashgit.github.io/publications-data/