Methodology
Our transparent, rule-based approach to analyzing research integrity
Rule-Based COI Detection Algorithm
The Research Integrity Analyzer uses a transparent algorithm based on explicit rules to evaluate conflicts of interest (COI) in scientific papers. Our methodology combines semantic extraction through advanced AI models with fixed rules and thresholds, following international standards (ICMJE, COPE, WAME, CONSORT, PRISMA, DOAJ).
Detection Algorithm Flow
1 Ingestion and Preprocessing
The system extracts and cleans the scientific document content using pypdf:
- Plain text extraction: title, authors, affiliations, main sections
- PDF metadata parsing: embedded author, journal, and ISSN information
- Structure detection: Abstract, Introduction, Methods, Results, Discussion, Conclusions
- Key section identification: Conflict of Interest, Funding, Acknowledgements
- Artifact cleaning and text normalization
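The structure-detection and cleaning steps above can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: it assumes the plain text has already been extracted (for example with pypdf's `PdfReader` and `page.extract_text()`), and the helper names `normalize_text` and `find_sections` are hypothetical.

```python
import re

# Section names taken from the list above; the matching rules are assumptions.
BODY_SECTIONS = ["Abstract", "Introduction", "Methods", "Results", "Discussion", "Conclusions"]
KEY_SECTIONS = ["Conflict of Interest", "Funding", "Acknowledgements"]


def normalize_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse runs of spaces."""
    text = re.sub(r"-\n(?=[a-z])", "", raw)
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def find_sections(text: str, headings: list) -> dict:
    """Report which headings appear alone on a line (case-insensitive).

    An optional leading number ("1. Introduction") and an optional
    trailing "s" or ":" are tolerated.
    """
    found = {}
    for heading in headings:
        pattern = rf"^\s*(?:\d+\.?\s*)?{re.escape(heading)}s?\s*:?\s*$"
        found[heading] = re.search(pattern, text, re.IGNORECASE | re.MULTILINE) is not None
    return found
```

Calling `find_sections(normalize_text(raw), KEY_SECTIONS)` yields a presence flag per key section, which is the kind of input the disclosure rules later in this page rely on.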
2 Semantic Extraction with AI
The language model identifies structured textual facts:
Identifies
- Author and institution names
- Potential funders and sponsors
- Explicit/implicit COI declarations
- Fragments about funding and sponsors
- Journal name and publisher
Detects
- Language patterns (promotional vs critical)
- Presence/absence of limitations section
- Companies, foundations, organizations
- Relationships between authors and sponsors
- Commercial vs academic affiliations
Predatory Journal Detection Module
A specialized module runs in parallel to detect potential predatory publishing practices:
1. Metadata Extraction
The system extracts specific metadata from the PDF:
- ISSN (International Standard Serial Number)
- Journal Name (Normalized)
- Publisher Name
2. Database Matching
Cross-references against curated databases:
- Beall's List (Archived)
- PredatoryJournals.org
- Community Verified Additions
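As a rough sketch of the metadata normalization and database matching steps, the example below validates an ISSN with the standard check-digit rule (weights 8 down to 2, modulo 11) and then looks the journal up by ISSN or normalized name. The function names and the in-memory sets are assumptions for illustration, and the listed entries are placeholders rather than real database rows.

```python
import re
from typing import Optional


def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN as 'NNNN-NNNC' if its check digit is valid, else None."""
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weighted sum of the first seven digits with weights 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return f"{compact[:4]}-{compact[4:]}"


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def is_listed(issn: str, name: str, listed_issns: set, listed_names: set) -> bool:
    """True if the valid ISSN or the normalized journal name is in the lists."""
    valid = normalize_issn(issn)
    if valid is not None and valid in listed_issns:
        return True
    return normalize_name(name) in listed_names
```

Matching on the normalized name as a fallback helps when the PDF carries no ISSN or a mistyped one, at the cost of possible false positives for journals with similar titles.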
3. External Verification (AI Internet Scan)
If the internal database match is inconclusive, a second-layer AI scan is triggered:
- Searches open web for predatory signals (e.g., fast review times, spam complaints)
- Verifies against online watchlists and indexes
- Provides a confidence score and evidence summary
Data Sources & Access
We believe in transparency. Our predatory journal detection relies on open databases.
External Resources
Open Data
Download our full aggregated database of predatory journals used in the analysis.
Download Full Database (CSV)
Community Verification System
Our database grows stronger with every contribution from the scientific community.
How It Works
- Submit: Report suspicious journals from your analysis results
- Review: Verified community members review submissions
- Approve: Requires 2 verifier approvals to add to database
- Notify: Submitters are notified of approval/rejection
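The approval rule in this workflow could be modeled roughly as below. The `Submission` class and its fields are hypothetical. Counting each verifier at most once is an assumption, while the threshold of 2 comes from the steps above.

```python
from dataclasses import dataclass, field

REQUIRED_APPROVALS = 2  # from the "Approve" step above


@dataclass
class Submission:
    journal_name: str
    # Verifier IDs are kept in a set so a repeated vote is not counted twice
    # (an assumption about how votes are deduplicated).
    approvals: set = field(default_factory=set)

    def approve(self, verifier_id: str) -> None:
        self.approvals.add(verifier_id)

    @property
    def accepted(self) -> bool:
        return len(self.approvals) >= REQUIRED_APPROVALS
```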
Become a Verifier
Researchers can apply to become verified community reviewers. Approved verifiers gain the ability to vote on pending journal submissions.
- Verifiers: 1 vote per approval
- Apply via Settings → Verifier Application
3 Rule Application by Dimensions
From the extracted facts, a score from 0 to 100 is calculated for each of the 5 dimensions. Each dimension contributes 20% to the final score.
Checks for presence of COI and Funding disclosure sections
- Missing COI section → +100 points
- Missing Funding section → +50 points
- Both present with clear disclosures → Low risk
- Vague or generic declarations → Medium risk
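A minimal sketch of the two point rules above, assuming the dimension score is capped at 100 so that it stays within the 0-100 range. The cap and the function name `disclosure_score` are assumptions, since the page does not say how overflow is handled.

```python
def disclosure_score(has_coi_section: bool, has_funding_section: bool) -> int:
    """Add points for each missing disclosure section, capped at 100."""
    points = 0
    if not has_coi_section:
        points += 100  # Missing COI section
    if not has_funding_section:
        points += 50   # Missing Funding section
    return min(points, 100)  # assumed cap to keep the score in 0-100
```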
Detects commercial keywords in funding sources
- Commercial sponsor keywords ("pharma", "inc", "corp") → +80 points
- Academic/public funding only → Low risk
- No identifiable sponsor → Medium risk
- Sponsor + favorable results + promotional language → High risk
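The keyword rule above might look like the following sketch. The three keywords are the ones quoted in the rule. The regular expressions are assumptions chosen to avoid obvious false positives (for example, "inc" inside "including"), and `funding_score` is a hypothetical name.

```python
import re

# Keyword -> pattern. The patterns are illustrative assumptions.
COMMERCIAL_PATTERNS = {
    "pharma": r"\bpharma\w*",            # also matches "pharmaceuticals"
    "inc": r"\binc\b",                   # whole word only
    "corp": r"\bcorp(?:oration)?\b",     # "corp" or "corporation"
}
COMMERCIAL_POINTS = 80


def commercial_hits(funding_text: str) -> list:
    """Return the commercial keywords found in the funding text."""
    return [
        keyword
        for keyword, pattern in COMMERCIAL_PATTERNS.items()
        if re.search(pattern, funding_text, re.IGNORECASE)
    ]


def funding_score(funding_text: str) -> int:
    """Apply the +80 rule once if any commercial keyword is present."""
    return COMMERCIAL_POINTS if commercial_hits(funding_text) else 0
```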
Analyzes author affiliations for commercial connections
- Commercial affiliation detected → +60 points
- Authors employed by funding company → High risk
- Diverse academic affiliations → Low risk
- Missing or generic affiliations → Medium risk
Checks against predatory journal database
- Predatory Journal Detected → Score forced to 100 (CRITICAL)
- Signals of predatory practices in text → High risk
- Evidence of peer review policies → Low risk
- Insufficient journal information → Medium risk
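The override in the first rule can be expressed as a short sketch: a database match replaces whatever the other rules produced. The function name and the clamping of the rule-based score are assumptions.

```python
def journal_dimension_score(rule_score: int, predatory_match: bool) -> int:
    """Force the score to 100 on a database match, else clamp to 0-100."""
    if predatory_match:
        return 100  # "Score forced to 100 (CRITICAL)"
    return max(0, min(rule_score, 100))
```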
Comprehensive AI-powered bias analysis
- Promotional words (16 keywords: "miracle", "breakthrough", "revolutionary", etc.) → +40 points
- Missing limitations section → +30 points
- Statistical red flags (p-hacking, boundary significance) → +35 points
- LLM Deep Analysis: One-sided reporting, self-citation concerns
- Balanced, sober language with limitations → Low risk
- Multiple bias indicators combined → High risk (80+)
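The additive bias rules above can be sketched as follows. Only the three promotional words named in the rule are included, because the full list of 16 is not given here. Applying the +40 once regardless of how many words match, and capping the total at 100, are assumptions.

```python
import re

# Partial list: only the keywords quoted in the rule above.
PROMOTIONAL_WORDS = frozenset({"miracle", "breakthrough", "revolutionary"})


def bias_score(text: str, has_limitations: bool, statistical_red_flags: bool) -> int:
    """Sum the point rules for the bias dimension, capped at 100."""
    points = 0
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & PROMOTIONAL_WORDS:
        points += 40  # promotional language present
    if not has_limitations:
        points += 30  # missing limitations section
    if statistical_red_flags:
        points += 35  # e.g. boundary significance, flagged upstream
    return min(points, 100)  # assumed cap
```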
4 Global Score Calculation
The global score is the weighted average of the five dimension scores (20% each), and the risk level is derived from that score:
- LOW RISK: Minimal concerns
- MEDIUM RISK: Some issues warrant attention
- HIGH RISK: Significant concerns
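With five dimensions weighted at 20% each, the global score reduces to a plain weighted average, sketched below. The function name is hypothetical. The thresholds that separate low, medium, and high risk are not stated in this section, so the sketch stops at the numeric score.

```python
DIMENSION_WEIGHTS = (0.2, 0.2, 0.2, 0.2, 0.2)  # each dimension contributes 20%


def global_score(dimension_scores: list) -> float:
    """Weighted average of the five dimension scores (each 0-100)."""
    if len(dimension_scores) != len(DIMENSION_WEIGHTS):
        raise ValueError("expected exactly five dimension scores")
    for score in dimension_scores:
        if not 0 <= score <= 100:
            raise ValueError("dimension scores must be within 0-100")
    return sum(w * s for w, s in zip(DIMENSION_WEIGHTS, dimension_scores))
```

For instance, dimension scores of 100, 0, 60, 100, and 40 average out to about 60.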
5 Report Generation
The system generates a comprehensive report with AI-powered summaries:
- AI-generated overview
- Risk score & dimensions
- PDF, DOCX, Certificates
Role of the AI Model
The AI does not decide risk levels by intuition. Its specific functions are:
1. Extract: Information from text (metadata, funding, affiliations)
2. Map: Findings to predefined rules (scoring logic applied)
3. Summarize: Generate executive summary (human-readable report)
⌨ Keyboard Shortcuts
Navigate the application quickly using keyboard shortcuts:
| Shortcut | Action |
|---|---|
| ⌘/Ctrl + U | Go to Dashboard (Upload) |
| ⌘/Ctrl + H | Go to History |
| ⌘/Ctrl + , | Go to Settings |
| Shift + ? | Show shortcuts help |
| ⌘/Ctrl + P | Print current page |
Current Algorithm Limitations
- Based solely on the available paper's textual content
- No access to external COI forms, trial registries, or unpublished databases
- Predatory journal detection is based on known lists (may not cover new journals)
- The algorithm indicates COI risk; it does not prove that a conflict legally exists
- It is a tool for critical reading and research integrity, not a court of truth
- Results should be verified by human experts for critical decisions
Privacy & GDPR Compliance
We are committed to protecting your data. Our platform is fully GDPR compliant.
What We Store
- Analysis results and metadata
- User preferences and settings
- Activity logs for compliance
What We Don't Store
- Full PDF file contents permanently
- Personal data beyond what you provide
- Third-party tracking or analytics