September 19, 2025

Mining GitHub for CVEs!

Tech Talk: Mining GitHub for CVE Research with an Enhanced Vulnerability Scanner

Abstract

Security research often begins with patterns — where do vulnerabilities come from, what practices make them more likely, and how can we spot them earlier? In this talk, I’ll walk through how I built an OSINT-powered vulnerability scanner for GitHub repositories that blends CVE trend analysis, Semgrep static analysis, and repository health metrics into a unified framework. You’ll see how this approach helped me identify bug-prone projects, optimize scanning for scale, and improve my own CVE research workflow.

Talk Outline

1. Why GitHub OSINT for Vulnerability Research (5 min)

Repositories as living ecosystems of security practices (or mispractices).
Quick case study: One of my CVEs came from spotting a repo with weak security hygiene and poor code review patterns.

2. Building the Enhanced Vulnerability Scanner (10 min)

Motivation: traditional scanners miss “social signals” (maintenance, documentation, contributor activity).
Architecture:
- GitHub OSINT collector → Metadata & repo stats.
- Semgrep engine → Static analysis for known bug patterns.
- Scoring framework → Combines repository health, security hygiene, development practices, and vulnerability findings.
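As a rough illustration of how a scoring framework like this could combine its four inputs, here is a minimal sketch. The weights, the 0-100 scale, and the names `RepoScores` and `bug_priority` are assumptions made for this example; the talk does not specify how the real scanner weights each dimension.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); the actual scanner's weighting is not stated.
WEIGHTS = {
    "health": 0.25,       # repository health (maintenance, activity)
    "security": 0.25,     # security hygiene
    "development": 0.20,  # development practices
    "findings": 0.30,     # static-analysis findings
}


@dataclass
class RepoScores:
    """Per-dimension scores on an assumed 0-100 scale, where higher means healthier."""
    health: float
    security: float
    development: float
    findings: float  # assumed: 100 = no findings, 0 = many severe findings


def bug_priority(scores: RepoScores) -> float:
    """Return a 0-100 priority; higher means more promising for bug hunting."""
    weighted = sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())
    return round(100.0 - weighted, 2)


if __name__ == "__main__":
    print(bug_priority(RepoScores(health=40, security=20, development=60, findings=10)))
```

Inverting the weighted average keeps the intuition from the talk: the weaker a repository looks across the four dimensions, the higher it ranks as a research target.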
Optimization Challenges:
- GitHub API rate limits (5,000 requests/hour with a token).
- Tradeoffs between data-only mode and deep static analysis.
- Why I added a two-stage process: screen wide, then deep dive.
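To make the rate-limit constraint concrete, the sketch below does two things: it estimates how many repositories fit into one hourly budget, and it decides how long to pause based on the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that GitHub returns on REST API responses. The function names, the `reserve` threshold, and the one-second slack are assumptions for illustration, not the scanner's actual logic.

```python
import time
from typing import Mapping, Optional

HOURLY_LIMIT = 5000  # authenticated limit quoted in the outline above


def repos_per_hour(calls_per_repo: int, hourly_limit: int = HOURLY_LIMIT) -> int:
    """Estimate how many repositories one hourly budget covers."""
    return hourly_limit // calls_per_repo


def seconds_to_wait(headers: Mapping[str, str],
                    now: Optional[float] = None,
                    reserve: int = 50) -> float:
    """Return how long to sleep before the next API call.

    Returns 0.0 while more than `reserve` requests remain, or when the
    rate-limit headers are missing from the response.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {key.lower(): value for key, value in headers.items()}
    if "x-ratelimit-remaining" not in lowered or "x-ratelimit-reset" not in lowered:
        return 0.0
    remaining = int(lowered["x-ratelimit-remaining"])
    reset_at = float(lowered["x-ratelimit-reset"])  # UTC epoch seconds
    if remaining > reserve:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, reset_at - now) + 1.0  # one second of slack (assumed)


if __name__ == "__main__":
    print(repos_per_hour(calls_per_repo=5))
    print(seconds_to_wait({"X-RateLimit-Remaining": "10",
                           "X-RateLimit-Reset": "1000"}, now=900.0))
```

The arithmetic also motivates the two-stage design: if metadata collection costs a handful of calls per repository, a wide first pass stays within a single hourly budget, while deep static analysis is reserved for the shortlist.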

3. Demo: From 1000 Repos to a Shortlist of Vulnerability Candidates (10 min)

Stage 1: Run --skip-semgrep mode → Fast OSINT-based filtering.
Show results in CSV with scores (health, security, development).
Stage 2: Feed top repos into Semgrep-only scan.
Show Semgrep findings → e.g., insecure deserialization, weak crypto, bad auth checks.
Highlight how the scoring system prioritizes “most promising for bugs.”
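The two stages above can be sketched roughly as follows. Stage 1 ranks rows from the score CSV and keeps the weakest repositories; Stage 2 tallies findings per rule from a Semgrep JSON report (Semgrep's `--json` output has a top-level `results` list whose entries carry a `check_id`). The CSV column names (`repo`, `health`, `security`, `development`) and the assumption that lower scores mean weaker hygiene are hypothetical and may not match the scanner's real output.

```python
import csv
import io
import json
from collections import Counter
from typing import List

# Assumed column names for the Stage 1 CSV.
SCORE_COLUMNS = ("health", "security", "development")


def shortlist(csv_text: str, top_n: int = 50) -> List[str]:
    """Stage 1: return the `top_n` repos with the lowest combined score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def combined(row: dict) -> float:
        return sum(float(row[column]) for column in SCORE_COLUMNS)

    ranked = sorted(rows, key=combined)  # lowest combined score first
    return [row["repo"] for row in ranked[:top_n]]


def findings_by_rule(semgrep_json: str) -> Counter:
    """Stage 2: count Semgrep findings per rule ID in a JSON report."""
    report = json.loads(semgrep_json)
    return Counter(result["check_id"] for result in report.get("results", []))


if __name__ == "__main__":
    sample_csv = (
        "repo,health,security,development\n"
        "a/one,80,70,90\n"
        "b/two,20,30,10\n"
        "c/three,50,40,60\n"
    )
    print(shortlist(sample_csv, top_n=2))
```

Counting by rule ID is only one possible view; grouping by file path or severity would work the same way on the `results` entries.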

4. Real Case Study: Turning OSINT into a CVE (5 min)

Walk through a repo where the scanner surfaced weak security hygiene.
Show how a Semgrep finding (or manual review after OSINT) led to a bug.
Tie it back to the published CVE, explaining the research path from GitHub metadata → pattern spotting → vulnerability discovery.

5. Lessons Learned & Future Work (5 min)

Patterns matter: poor repo health often predicts bad security practices.
Automation helps: large-scale scanning + static analysis = faster trend detection.
Research directions:
- Expanding beyond GitHub (GitLab, Bitbucket, DockerHub).
- Enriching scoring with commit-level NLP (e.g., security-related commit messages).
- Mapping findings to MITRE ATT&CK or CWE categories.
- Using LLMs and MCPs.
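As a starting point for the commit-level idea above, the sketch below flags commit messages that mention security-related terms. Plain keyword matching is a deliberately simple stand-in for real NLP, and the keyword list and the name `flag_security_commits` are assumptions chosen for this example.

```python
import re
from typing import Iterable, List

# Hypothetical keyword list; a real pipeline would likely use a richer model.
SECURITY_PATTERN = re.compile(
    r"\b(cve-\d{4}-\d{4,}|xss|csrf|ssrf|sql injection|sanitiz\w*|vulnerab\w*|security fix)\b",
    re.IGNORECASE,
)


def flag_security_commits(messages: Iterable[str]) -> List[str]:
    """Return the commit messages that match any security-related keyword."""
    return [message for message in messages if SECURITY_PATTERN.search(message)]


if __name__ == "__main__":
    commits = [
        "Fix XSS in comment form",
        "Update README",
        "Bump version, see CVE-2024-12345",
        "sanitize user input",
    ]
    for message in flag_security_commits(commits):
        print(message)
```

A count of flagged commits per repository could then feed into the scoring as one more signal, for example as a hint that security fixes are landing without advisories.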

CyberDucky Quack

Juan Soberanes
