@happyvertical/smrt-facts

Knowledge base with semantic deduplication, provenance tracking, evolution chains, and confidence scoring.

v0.20.44 · Semantic Dedup · Evolution Chains · Confidence

Overview

smrt-facts provides a distributed knowledge base in which facts are atomic units of knowledge with provenance tracking. Facts evolve through parent-child chains, pass through 3-zone semantic reconciliation to prevent duplicates, and carry confidence scores computed from their sources (volume, credibility, recency, and corroboration).

Installation

```bash
npm install @happyvertical/smrt-facts
```

Quick Start

```typescript
import {
  Fact, FactCollection,
  FactSource, FactSourceCollection,
  FactSubject, FactSubjectCollection,
  calculateConfidence, normalizeText,
} from '@happyvertical/smrt-facts';

// `db` is your configured database connection (setup not shown here)

// Create a fact
const facts = new FactCollection(db);
const fact = await facts.create({
  textRefined: 'The Eiffel Tower is 330 meters tall',
  type: 'measurement',
  domain: 'landmarks',
  status: 'active',
});

// Attach a source with a credibility score (this is the fact's provenance)
const sources = new FactSourceCollection(db);
await sources.create({
  factId: fact.id,
  sourceUrl: 'https://example.com/eiffel-tower',
  sourceTitle: 'Tourism Board',
  credibility: 0.9,
});

// Recalculate confidence from all attached sources
await facts.recalculateConfidence(fact.id);

// 3-zone semantic reconciliation: merge into a similar fact, branch, or create
const result = await facts.reconcile({
  rawInput: 'The Eiffel Tower stands 330m tall',
  type: 'measurement',
  domain: 'landmarks',
  source: { sourceUrl: 'https://another-source.com', credibility: 0.8 },
});
// result.action: 'created' | 'merged' | 'branched'

// Evolution chains: branch() creates a child linked via parentId
const child = await facts.branch(fact.id, {
  textRefined: 'The Eiffel Tower is 330 meters tall including the antenna',
}, 'correction');

// Walk evolution: root -> current
const chain = await facts.getEvolutionChain(child.id);
const latest = await facts.getLatestInChain(fact.id);
const tree = await facts.getEvolutionTree(fact.id);

// Entity briefing: all facts linked to a given entity
// (`placeId` is the ID of an existing Place entity)
const briefing = await facts.getEntityBriefing('Place', placeId);
```
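The evolution-chain calls above (getEvolutionChain, getLatestInChain) follow parentId links between facts. The sketch below shows one way such traversals can work over an in-memory map, including the cycle detection mentioned under Best Practices. It is an illustration under assumptions, not the library's implementation: the FactNode type and the chainToRoot / latestInChain helpers are hypothetical, and "latest" is assumed to mean following the most recently created child at each step.

```typescript
// Hypothetical in-memory fact node; only `id` and `parentId` mirror the Fact model.
interface FactNode {
  id: string;
  parentId?: string;
  createdAt: number; // epoch ms, used to pick the newest child (assumption)
}

// Walk parentId links up to the root and return the chain ordered root -> current.
// The visited set ends the walk early if the links contain a cycle.
function chainToRoot(nodes: Map<string, FactNode>, id: string): FactNode[] {
  const chain: FactNode[] = [];
  const seen = new Set<string>();
  let current = nodes.get(id);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current);
    current = current.parentId ? nodes.get(current.parentId) : undefined;
  }
  return chain.reverse();
}

// From `id`, repeatedly step to the newest child until a fact has no children.
// Also stops at the first repeated node if the links contain a cycle.
function latestInChain(nodes: Map<string, FactNode>, id: string): FactNode | undefined {
  const children = new Map<string, FactNode[]>();
  for (const node of nodes.values()) {
    if (node.parentId) {
      const list = children.get(node.parentId) ?? [];
      list.push(node);
      children.set(node.parentId, list);
    }
  }
  const seen = new Set<string>();
  let current = nodes.get(id);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    const kids = children.get(current.id);
    if (!kids || kids.length === 0) break;
    current = kids.reduce((a, b) => (b.createdAt > a.createdAt ? b : a));
  }
  return current;
}

// Example: a -> b -> c
const demo = new Map<string, FactNode>([
  ['a', { id: 'a', createdAt: 1 }],
  ['b', { id: 'b', parentId: 'a', createdAt: 2 }],
  ['c', { id: 'c', parentId: 'b', createdAt: 3 }],
]);
console.log(chainToRoot(demo, 'c').map((n) => n.id)); // [ 'a', 'b', 'c' ]
console.log(latestInChain(demo, 'a')?.id); // c
```

Using a visited set means a corrupted chain ends the walk instead of looping forever; the real collection may instead raise an error when it detects a cycle.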

Core Models

Fact

```typescript
class Fact extends SmrtObject {
  textRefined: string         // Cleaned knowledge statement
  type: string                // assertion/observation/measurement/definition/...
  domain?: string
  status: 'pending' | 'active' | 'disputed' | 'superseded' | 'archived' | 'retracted'
  confidence: number          // 0-1, computed from sources
  parentId?: string           // Evolution chain link

  // Auto-generated embeddings for semantic search
}
```

FactSource

```typescript
class FactSource extends SmrtObject {
  factId: string
  sourceUrl: string
  sourceTitle?: string
  sourceType?: string
  credibility: number         // 0-1
  extractedAt?: Date
}
```

FactSubject (Polymorphic Entity Link)

```typescript
class FactSubject extends SmrtObject {
  factId: string
  entityType: string          // e.g., 'Place', 'Person'
  entityId: string            // Plain string ID (no FK)
  role?: string

  // conflictColumns: ['fact_id', 'entity_type', 'entity_id']
}
```
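FactSubject links a fact to any entity through the (entityType, entityId) pair rather than a foreign key, and its conflict columns make the (fact, entity type, entity ID) triple unique. The rough in-memory sketch below illustrates both ideas and how an entity briefing such as getEntityBriefing('Place', placeId) could find the facts to include. SubjectLink and SubjectIndex are hypothetical names; the real collection persists links in a database, and its briefing may filter or order results differently.

```typescript
// Hypothetical shape of a fact-to-entity link (mirrors FactSubject's fields).
interface SubjectLink {
  factId: string;
  entityType: string; // e.g. 'Place', 'Person'
  entityId: string;   // plain string ID, no foreign key
  role?: string;
}

// In-memory stand-in for the unique (fact_id, entity_type, entity_id) constraint:
// linking the same triple twice replaces the existing link instead of duplicating it.
class SubjectIndex {
  private links = new Map<string, SubjectLink>();

  // JSON-encode the triple so IDs containing separators cannot collide.
  private key(l: Pick<SubjectLink, 'factId' | 'entityType' | 'entityId'>): string {
    return JSON.stringify([l.factId, l.entityType, l.entityId]);
  }

  upsert(link: SubjectLink): void {
    this.links.set(this.key(link), link);
  }

  // Fact IDs attached to one entity, i.e. the input to an entity briefing.
  factIdsFor(entityType: string, entityId: string): string[] {
    const ids: string[] = [];
    for (const l of this.links.values()) {
      if (l.entityType === entityType && l.entityId === entityId) ids.push(l.factId);
    }
    return ids;
  }

  get size(): number {
    return this.links.size;
  }
}
```

Because entityType is part of the key, a Place and a Person that happen to share an ID string stay separate.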

Semantic Reconciliation

```typescript
// 3-zone similarity thresholds:
//
// >= 0.85: Auto-merge (same fact, update metadata)
// 0.60-0.85: AI disambiguation (model decides merge vs branch)
// < 0.60: Create new fact (no match)
//
// If AI disambiguation fails, defaults to branch (safer than merge)

const result = await facts.reconcile({
  rawInput: 'New fact text to reconcile',
  type: 'assertion',
  domain: 'science',
  source: { sourceUrl: 'https://source.com', credibility: 0.85 },
});
```
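The three zones amount to a small decision function over the best match's similarity score. The sketch below assumes cosine similarity between embedding vectors (this section does not name the metric) and models the AI step as an injected async callback. cosineSimilarity, decideZone, and reconcileDecision are hypothetical names, not the package's API; only the thresholds and the fall-back-to-branch rule come from the comments above.

```typescript
type Zone = 'merge' | 'disambiguate' | 'create';
type Action = 'created' | 'merged' | 'branched';

// Cosine similarity of two equal-length embedding vectors (assumed metric).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vector length mismatch');
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0; // treat empty/zero vectors as "no match"
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Map the best match's similarity onto the three zones.
function decideZone(similarity: number): Zone {
  if (similarity >= 0.85) return 'merge';
  if (similarity >= 0.6) return 'disambiguate';
  return 'create';
}

// Resolve a zone to an action. `askModel` stands in for the AI disambiguation
// step; if it throws, the sketch branches rather than merges.
async function reconcileDecision(
  similarity: number,
  askModel: () => Promise<'merge' | 'branch'>,
): Promise<Action> {
  const zone = decideZone(similarity);
  if (zone === 'merge') return 'merged';
  if (zone === 'create') return 'created';
  try {
    return (await askModel()) === 'merge' ? 'merged' : 'branched';
  } catch {
    return 'branched'; // safer default when disambiguation fails
  }
}
```

Branching on failure keeps both versions of a fact, whereas a wrong merge would silently overwrite one; that is the asymmetry behind the "safer than merge" default.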

```typescript
// Confidence formula (clamped 0-1):
// base 0.5
// + source volume (max 0.3)
// + avg credibility (0.2)
// + recency (0.1, decays over 10 days)
// + corroboration (0.1)
```
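The confidence formula above can be sketched as a pure function. Only the base value, the component caps, the 10-day window, and the final clamp come from the text; how each component scales is assumed here: 0.1 per source up to the 0.3 cap, average credibility times 0.2, linear recency decay from the newest source to zero at 10 days, and the full corroboration bonus once two or more sources back the fact. SourceInput and estimateConfidence are hypothetical and are not the package's exported calculateConfidence.

```typescript
// Hypothetical source input; mirrors FactSource's credibility and extractedAt.
interface SourceInput {
  credibility: number; // 0-1
  extractedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Sketch of the documented formula; component scaling is an assumption.
function estimateConfidence(sources: SourceInput[], now: Date = new Date()): number {
  const base = 0.5;
  if (sources.length === 0) return base;

  // Source volume: assumed 0.1 per source, capped at 0.3.
  const volume = Math.min(0.3, 0.1 * sources.length);

  // Average credibility scaled into a 0-0.2 contribution.
  const avgCred =
    sources.reduce((sum, s) => sum + clamp01(s.credibility), 0) / sources.length;
  const credibility = 0.2 * avgCred;

  // Recency: assumed linear decay from 0.1 (fresh) to 0 at 10 days, using the newest source.
  const newest = Math.max(...sources.map((s) => s.extractedAt.getTime()));
  const ageDays = Math.max(0, (now.getTime() - newest) / DAY_MS);
  const recency = 0.1 * Math.max(0, 1 - ageDays / 10);

  // Corroboration: assumed full bonus once at least two sources back the fact.
  const corroboration = sources.length >= 2 ? 0.1 : 0;

  return clamp01(base + volume + credibility + recency + corroboration);
}
```

Under these assumptions, one fresh source with credibility 0.9 scores 0.5 + 0.1 + 0.18 + 0.1 = 0.88, while two fresh sources at 0.9 and 0.8 reach 0.5 + 0.2 + 0.17 + 0.1 + 0.1 = 1.07 and are clamped to 1.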

Best Practices

DOs

  • Use reconcile() to prevent duplicate facts
  • Attach sources with credibility scores for accurate confidence
  • Use evolution chains for corrections and refinements
  • Call recalculateConfidence() after adding new sources
  • Use findWithGlobals(tenantId) to include global facts

DON'Ts

  • Don't skip reconciliation when ingesting facts (creates duplicates)
  • Don't rely on embedding generation to gate fact creation (embedding failures are non-fatal, so a fact can be saved without an embedding)
  • Don't manually set confidence (use recalculateConfidence())
  • Don't modify metadata fields directly (use getter/setter helpers)
  • Don't create circular evolution chains (traversal cycle detection is a safeguard, not a supported structure)

Related Modules