Methodology

Our transparent, rule-based approach to analyzing research integrity

Rule-Based COI Detection Algorithm

The Research Integrity Analyzer evaluates conflicts of interest (COI) in scientific papers with a transparent algorithm built on explicit rules. Our methodology combines semantic extraction by a language model with fixed rules and thresholds, aligned with international standards and guidelines (ICMJE, COPE, WAME, CONSORT, PRISMA, DOAJ).

Objective: Produce a structured estimate of COI risk and editorial credibility, using only the paper's text as the source, with reproducible and explainable results.

  • 4,000+ predatory journals indexed
  • 5 risk dimensions analyzed
  • AI-powered analysis

Detection Algorithm Flow

1 Ingestion and Preprocessing

The system extracts the text content of the scientific document with pypdf and cleans it for analysis.
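
Below is a minimal sketch of this step in Python. `PdfReader` and `page.extract_text()` are real pypdf calls. The function names `extract_pdf_text` and `clean_text` are hypothetical, and so are the specific cleaning rules (Unicode normalization, rejoining hyphenated line breaks, collapsing whitespace), because the actual preprocessing is not documented here.

```python
import re
import unicodedata


def extract_pdf_text(path: str) -> str:
    """Concatenate the extracted text of every page with pypdf."""
    from pypdf import PdfReader  # lazy import: clean_text() works without pypdf installed

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def clean_text(raw: str) -> str:
    """Hypothetical cleaning pass over raw PDF text."""
    text = unicodedata.normalize("NFKC", raw)     # fold ligatures such as "ﬂ" into "fl"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)        # keep at most one blank line
    return text.strip()
```

The hyphen rule involves a trade-off. It repairs words split by line wrapping, but it also fuses genuine compounds such as "double-blind" when the line happens to break right after the hyphen.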

2 Semantic Extraction with AI

The language model identifies structured textual facts:

Identifies

  • Author and institution names
  • Potential funders and sponsors
  • Explicit/implicit COI declarations
  • Passages describing funding and sponsorship
  • Journal name and publisher

Detects

  • Language patterns (promotional vs critical)
  • Presence/absence of limitations section
  • Companies, foundations, organizations
  • Relationships between authors and sponsors
  • Commercial vs academic affiliations
Output: A set of structured textual facts that feed the algorithm's rules. The AI extracts objective information; it does not make subjective risk decisions.
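
One way to represent these facts is a typed record that the rule engine can consume. The sketch below assumes the model returns JSON. The class name `ExtractedFacts`, its field names and the `from_json` helper are hypothetical and do not reflect the analyzer's actual schema.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ExtractedFacts:
    """Hypothetical schema for the facts returned by the language model."""

    authors: List[str] = field(default_factory=list)
    affiliations: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    coi_statement: Optional[str] = None      # None: no COI declaration found
    funding_statement: Optional[str] = None  # None: no funding section found
    journal_name: Optional[str] = None
    publisher: Optional[str] = None
    has_limitations_section: bool = False
    promotional_language: bool = False

    @classmethod
    def from_json(cls, payload: str) -> "ExtractedFacts":
        """Build facts from model JSON output, ignoring keys the schema does not define."""
        data = json.loads(payload)
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in known})
```

The helper drops unknown keys, such as a "risk" field the model might invent. This keeps risk judgments out of the rule engine, which matches the division of labor described above.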

Predatory Journal Detection Module

A specialized module runs in parallel to detect potential predatory publishing practices:

1. Metadata Extraction

The system extracts specific metadata from the PDF:

  • ISSN (International Standard Serial Number)
  • Journal Name (Normalized)
  • Publisher Name
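
The ISSN and journal-name handling could look like the Python sketch below. The ISSN check digit follows the ISSN standard: the first seven digits are weighted 8 down to 2, the sum is taken modulo 11, and X stands for 10. The function names and the particular normalization steps are hypothetical.

```python
import re
import unicodedata
from typing import List

# ISSN format: four digits, a hyphen, three digits and a check character (digit or X).
ISSN_PATTERN = re.compile(r"\b(\d{4})-(\d{3}[\dXx])\b")


def issn_is_valid(issn: str) -> bool:
    """Verify the ISSN check digit (weights 8 down to 2, modulus 11, X means 10)."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


def find_issns(text: str) -> List[str]:
    """Return each distinct well-formed ISSN in the text whose check digit is valid."""
    found: List[str] = []
    for m in ISSN_PATTERN.finditer(text):
        issn = f"{m.group(1)}-{m.group(2).upper()}"
        if issn_is_valid(issn) and issn not in found:
            found.append(issn)
    return found


def normalize_journal_name(name: str) -> str:
    """Hypothetical normalization: strip accents and case, punctuation and a leading 'The'."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = text.replace("&", " and ")
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    return re.sub(r"^the ", "", text)
```

Validating the check digit means that ISSNs mangled during text extraction are discarded instead of being matched against the databases.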

2. Database Matching

Cross-references against curated databases:

  • Beall's List (Archived)
  • PredatoryJournals.org
  • Community Verified Additions
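
The matching step can be sketched as two dictionary lookups. The ISSN is tried first because it is the most specific identifier, and the normalized journal name is the fallback. The record layout (`name`, `issn`, `source`) and the function names are assumptions, not the format of the published lists.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple, TypedDict


class JournalRecord(TypedDict):
    name: str
    issn: str    # "" when the list gives no ISSN
    source: str  # which list the entry came from


def normalize(name: str) -> str:
    """Lower-case and strip punctuation so formatting differences do not block a match."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def build_index(
    records: Iterable[JournalRecord],
) -> Tuple[Dict[str, JournalRecord], Dict[str, JournalRecord]]:
    """Index records by ISSN and by normalized name for constant-time lookups."""
    by_issn: Dict[str, JournalRecord] = {}
    by_name: Dict[str, JournalRecord] = {}
    for record in records:
        if record["issn"]:
            by_issn[record["issn"].upper()] = record
        by_name[normalize(record["name"])] = record
    return by_issn, by_name


def match_journal(
    issns: List[str],
    journal_name: Optional[str],
    by_issn: Dict[str, JournalRecord],
    by_name: Dict[str, JournalRecord],
) -> Optional[JournalRecord]:
    """Return the first record matched by ISSN, otherwise by normalized name, else None."""
    for issn in issns:  # ISSN is the most specific identifier, so try it first
        hit = by_issn.get(issn.upper())
        if hit is not None:
            return hit
    if journal_name:
        return by_name.get(normalize(journal_name))
    return None
```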

3. External Verification (AI Internet Scan)

If the internal database match is inconclusive, a second-layer AI scan is triggered:

  • Searches open web for predatory signals (e.g., fast review times, spam complaints)
  • Verifies against online watchlists and indexes
  • Provides a confidence score and evidence summary
Impact: If a predatory match is found (internal or external), the paper is flagged as HIGH RISK (score 80-100) regardless of the other dimension scores.
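
The two-layer decision might be structured as follows. Several details are assumptions: `InternalResult`, `ScanResult`, the 0.7 confidence cut-off, and skipping the web scan after a clear database miss. The text only says that the scan runs when the internal match is inconclusive. The web scan is passed in as a callable so that it is invoked only when needed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class InternalResult(Enum):
    MATCH = "match"
    NO_MATCH = "no_match"
    INCONCLUSIVE = "inconclusive"  # e.g. no ISSN found and only a partial name match


@dataclass
class ScanResult:
    is_predatory: bool
    confidence: float  # 0.0 to 1.0
    evidence: str      # short evidence summary for the report


def is_predatory(
    internal: InternalResult,
    web_scan: Callable[[], ScanResult],
    min_confidence: float = 0.7,  # hypothetical threshold
) -> bool:
    """Database result first; the web scan runs only when that result is inconclusive."""
    if internal is InternalResult.MATCH:
        return True
    if internal is InternalResult.NO_MATCH:
        return False
    result = web_scan()
    return result.is_predatory and result.confidence >= min_confidence
```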

Data Sources & Access

We believe in transparency. Our predatory journal detection relies on open databases.

Open Data

Our full aggregated database of predatory journals used in the analysis is available for download as a CSV file.

Community Verification System

Our database grows stronger with every contribution from the scientific community.

How It Works

  • Submit: Report suspicious journals from your analysis results
  • Review: Verified community members review submissions
  • Approve: Requires 2 verifier approvals to add to database
  • Notify: Submitters are notified of approval/rejection
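
The review workflow can be modeled as a small state machine. Only the two-approval threshold comes from the text. The `Submission` class, the matching two-rejection threshold and the ban on self-voting are hypothetical. Votes are stored in sets, so a verifier who votes twice is counted once, and a changed vote moves the verifier from one set to the other.

```python
from dataclasses import dataclass, field
from typing import Set

REQUIRED_APPROVALS = 2  # from the text: two verifier approvals add a journal
REQUIRED_REJECTIONS = 2  # hypothetical: symmetric rule for rejection


@dataclass
class Submission:
    journal_name: str
    submitter: str
    approvals: Set[str] = field(default_factory=set)
    rejections: Set[str] = field(default_factory=set)
    status: str = "pending"

    def vote(self, verifier: str, approve: bool) -> str:
        """Record one verifier's vote and return the (possibly updated) status."""
        if self.status != "pending":
            raise ValueError("voting is closed for this submission")
        if verifier == self.submitter:
            raise ValueError("submitters cannot vote on their own report")
        if approve:
            self.rejections.discard(verifier)
            self.approvals.add(verifier)
        else:
            self.approvals.discard(verifier)
            self.rejections.add(verifier)
        if len(self.approvals) >= REQUIRED_APPROVALS:
            self.status = "approved"
        elif len(self.rejections) >= REQUIRED_REJECTIONS:
            self.status = "rejected"
        return self.status
```

The caller can use the returned status to decide when to notify the submitter.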

Become a Verifier

Researchers can apply to become verified community reviewers. Approved verifiers gain the ability to vote on pending journal submissions.

3 Rule Application by Dimension

From the extracted facts, scores 0-100 are calculated for each of the 5 dimensions. Each dimension contributes 20% to the final score.

1. Disclosure & Funding Transparency (weight: 20%)

Checks for the presence of COI and funding disclosure sections

  • Missing COI section → +100 points
  • Missing Funding section → +50 points
  • Both present with clear disclosures → Low risk
  • Vague or generic declarations → Medium risk
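
The point rules for this dimension can be implemented as additive penalties capped at 100. The +100 and +50 values come from the rules above. The cap, the function name and the 40-point penalty for vague declarations are assumptions. The 40-point value was chosen only so that vague declarations fall in the medium band.

```python
from typing import Optional

MISSING_COI_POINTS = 100     # from the rule list
MISSING_FUNDING_POINTS = 50  # from the rule list
VAGUE_POINTS = 40            # hypothetical: places vague declarations in the medium band


def _present(section: Optional[str]) -> bool:
    """Treat missing, empty or whitespace-only sections as absent."""
    return bool(section and section.strip())


def score_disclosure(
    coi_statement: Optional[str],
    funding_statement: Optional[str],
    declarations_vague: bool = False,
) -> int:
    """Additive penalties for Dimension 1, capped at 100 (the cap is an assumption)."""
    points = 0
    if not _present(coi_statement):
        points += MISSING_COI_POINTS
    if not _present(funding_statement):
        points += MISSING_FUNDING_POINTS
    if _present(coi_statement) and _present(funding_statement) and declarations_vague:
        points += VAGUE_POINTS
    return min(points, 100)
```

Without the cap, a paper missing both sections would score 150 on a 0-100 scale.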

2. Funding-Outcome Alignment (weight: 20%)

Detects commercial keywords in funding sources

  • Commercial sponsor keywords ("pharma", "inc", "corp") → +80 points
  • Academic/public funding only → Low risk
  • No identifiable sponsor → Medium risk
  • Sponsor + favorable results + promotional language → High risk
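
The keyword rule can be implemented with Python's `re` module. Matching on word boundaries prevents "inc" from firing inside names such as "Princeton". Two values are hypothetical: the 50 points for an unidentifiable sponsor and the extra 20 points for a commercial sponsor combined with favorable results and promotional language.

```python
import re
from typing import List

# Keywords from the rule above; \b word boundaries avoid matches inside other words.
COMMERCIAL_KEYWORDS = re.compile(r"\b(?:pharma\w*|inc|corp(?:oration)?)\b", re.IGNORECASE)
COMMERCIAL_POINTS = 80   # from the rule list
NO_SPONSOR_POINTS = 50   # hypothetical: "no identifiable sponsor" lands in the medium band
ALIGNMENT_POINTS = 20    # hypothetical: extra points for the sponsor + favorable + promotional pattern


def is_commercial(funder: str) -> bool:
    return COMMERCIAL_KEYWORDS.search(funder) is not None


def score_funding_alignment(
    funders: List[str], favorable_results: bool, promotional_language: bool
) -> int:
    if not funders:
        return NO_SPONSOR_POINTS       # no identifiable sponsor: medium risk
    if not any(is_commercial(f) for f in funders):
        return 0                       # academic/public funding only: low risk
    points = COMMERCIAL_POINTS
    if favorable_results and promotional_language:
        points += ALIGNMENT_POINTS     # sponsor + favorable results + promotional language
    return min(points, 100)
```

Keyword rules like this one miss companies whose names contain none of the keywords. This is one reason the extraction step also identifies companies and organizations separately.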

3. Author-Institution-Sponsor Network (weight: 20%)

Analyzes author affiliations for commercial connections

  • Commercial affiliation detected → +60 points
  • Authors employed by funding company → High risk
  • Diverse academic affiliations → Low risk
  • Missing or generic affiliations → Medium risk
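
One way to detect authors employed by a funding company is to compare a "core" organization name for each commercial funder with each affiliation. The core name is lower-cased, with punctuation and legal suffixes removed. Only the +60 rule comes from the text. The suffix list, the 50 points for missing affiliations and the 90 points for employment by the funder are hypothetical.

```python
import re
from typing import List

LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "gmbh", "ag", "co"}
COMMERCIAL_HINT = re.compile(r"\b(?:inc|corp|ltd|llc|gmbh|pharma\w*)\b", re.IGNORECASE)
COMMERCIAL_AFFILIATION_POINTS = 60  # from the rule list
EMPLOYED_BY_FUNDER_POINTS = 90      # hypothetical value for "High risk"
MISSING_POINTS = 50                 # hypothetical value for "Medium risk"


def core_name(org: str) -> str:
    """Lower-case, drop punctuation and legal suffixes: 'Acme Pharma, Inc.' -> 'acme pharma'."""
    words = re.findall(r"[a-z0-9]+", org.lower())
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def score_network(affiliations: List[str], funders: List[str]) -> int:
    if not affiliations:
        return MISSING_POINTS
    points = 0
    if any(COMMERCIAL_HINT.search(a) for a in affiliations):
        points += COMMERCIAL_AFFILIATION_POINTS
    # Only commercial funders count, so public funders that also employ authors are not flagged.
    funder_cores = {core_name(f) for f in funders if COMMERCIAL_HINT.search(f)} - {""}
    for affiliation in affiliations:
        padded = f" {core_name(affiliation)} "  # pad so "acme" cannot match inside "acmex"
        if any(f" {core} " in padded for core in funder_cores):
            points = max(points, EMPLOYED_BY_FUNDER_POINTS)
    return min(points, 100)
```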

4. Journal / Editorial Integrity (weight: 20%)

Checks the journal against the predatory journal database

  • Predatory Journal Detected → Score forced to 100 (CRITICAL)
  • Signals of predatory practices in text → High risk
  • Evidence of peer review policies → Low risk
  • Insufficient journal information → Medium risk
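
A predatory match forces the score to 100 regardless of anything else found, so this dimension suits an ordered, first-match rule list rather than additive points. Several parts of this Python sketch are hypothetical: the 80 and 50 values standing in for "High" and "Medium" risk, the default score, and the `JournalFacts` fields.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class JournalFacts(NamedTuple):
    predatory_match: bool          # result of the predatory journal detection module
    predatory_signals: List[str]   # predatory-practice phrases found in the text
    peer_review_policy: bool       # text describes a peer review process
    journal_name: Optional[str]    # None when the journal could not be identified


# Ordered (condition, score) rules: the first rule that fires decides the score.
RULES: List[Tuple[Callable[[JournalFacts], bool], int]] = [
    (lambda f: f.predatory_match, 100),          # from the text: forced to 100 (CRITICAL)
    (lambda f: bool(f.predatory_signals), 80),   # hypothetical "High risk" value
    (lambda f: f.journal_name is None, 50),      # hypothetical "Medium risk": insufficient information
    (lambda f: f.peer_review_policy, 0),         # evidence of peer review: low risk
]
DEFAULT_SCORE = 50  # hypothetical: a named journal with no peer-review evidence stays medium


def score_journal(facts: JournalFacts) -> int:
    for condition, score in RULES:
        if condition(facts):
            return score
    return DEFAULT_SCORE
```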

5. Textual Bias & Reporting Quality (weight: 20%)

Combines keyword rules with AI-powered bias analysis

  • Promotional words (16 keywords: "miracle", "breakthrough", "revolutionary", etc.) → +40 points
  • Missing limitations section → +30 points
  • Statistical red flags (p-hacking, borderline significance such as p-values just under 0.05) → +35 points
  • LLM deep analysis: flags one-sided reporting and self-citation concerns
  • Balanced, sober language with limitations → Low risk
  • Multiple bias indicators combined → High risk (80+)
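
The rule-based part of this dimension can be implemented as shown below. The text names only three of the 16 promotional keywords, so the pattern includes just those three. Two elements are assumptions: the 0.04 to 0.05 window used to flag borderline p-values, and the `llm_points` hook standing in for the model's deep analysis.

```python
import re
from typing import List

# Only three of the 16 promotional keywords are named in the text.
PROMOTIONAL = re.compile(r"\b(?:miracle|breakthrough|revolutionary)\b", re.IGNORECASE)
P_VALUE = re.compile(r"\bp\s*=\s*(0?\.\d+)", re.IGNORECASE)  # reported values like "p = 0.049"
BORDERLINE = (0.04, 0.05)  # hypothetical window for "just significant" results


def borderline_p_values(text: str) -> List[float]:
    values = [float(m.group(1)) for m in P_VALUE.finditer(text)]
    return [v for v in values if BORDERLINE[0] <= v <= BORDERLINE[1]]


def score_textual_bias(text: str, has_limitations: bool, llm_points: int = 0) -> int:
    points = 0
    if PROMOTIONAL.search(text):
        points += 40   # promotional words
    if not has_limitations:
        points += 30   # missing limitations section
    if borderline_p_values(text):
        points += 35   # statistical red flag
    return min(points + llm_points, 100)  # cap at 100 is an assumption
```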

4 Global Score Calculation

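The global score is the weighted average of the five dimension scores D1 to D5. Because each dimension carries the same 20% weight, this reduces to the simple mean, and the risk level is then assigned from the score ranges that follow: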

overall_score = (D1 + D2 + D3 + D4 + D5) / 5

  • 0-33 → LOW RISK: minimal concerns
  • 34-66 → MEDIUM RISK: some issues warrant attention
  • 67-100 → HIGH RISK: significant concerns
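
The global score and risk bands can be computed as shown in the Python sketch below. The formula and band edges come from the text. Two points are assumptions about how the pieces fit together: fractional scores are banded with the cut-offs `< 34` and `< 67`, and the predatory override raises the global score to at least 80.

```python
from typing import Sequence

PREDATORY_FLOOR = 80.0  # assumption: how "HIGH RISK (score 80-100)" overrides the average


def overall_score(dimensions: Sequence[float], predatory: bool = False) -> float:
    """overall_score = (D1 + D2 + D3 + D4 + D5) / 5; equal 20% weights make this the mean."""
    if len(dimensions) != 5 or any(not 0 <= d <= 100 for d in dimensions):
        raise ValueError("expected five dimension scores between 0 and 100")
    score = sum(dimensions) / 5
    return max(score, PREDATORY_FLOOR) if predatory else score


def risk_level(score: float) -> str:
    if score < 34:       # 0-33
        return "LOW RISK"
    if score < 67:       # 34-66
        return "MEDIUM RISK"
    return "HIGH RISK"   # 67-100
```

Without the floor, a predatory match alone gives D4 = 100. With the other four dimensions at 0, the global score would be 100 / 5 = 20, which the bands would label LOW RISK.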

5 Report Generation

The system generates a comprehensive report with AI-powered summaries:

  • Executive Summary: AI-generated overview
  • Visual Charts: overall risk score and dimension scores
  • Export Options: PDF, DOCX, and certificates

Role of the AI Model

The AI does not decide risk levels by intuition. Its specific functions are:

1. Extract: information from the text (metadata, funding, affiliations)
2. Map: findings to the predefined rules, where the scoring logic is applied
3. Summarize: generate the executive summary (human-readable report)

Keyboard Shortcuts

Navigate the application quickly using keyboard shortcuts:

  • ⌘/Ctrl + U → Go to Dashboard (Upload)
  • ⌘/Ctrl + H → Go to History
  • ⌘/Ctrl + , → Go to Settings
  • Shift + ? → Show shortcuts help
  • ⌘/Ctrl + P → Print current page

Current Algorithm Limitations

Privacy & GDPR Compliance

We are committed to protecting your data. Our platform is fully GDPR compliant.

What We Store

  • Analysis results and metadata
  • User preferences and settings
  • Activity logs for compliance

What We Don't Store

  • Full PDF file contents permanently
  • Personal data beyond what you provide
  • Third-party tracking or analytics
Your Rights: You can export all your data, view your activity logs, and permanently delete your account at any time via Settings → Data & Privacy.