Agents should execute whenever possible — runtime for composable AI agent skills https://gfernandf.github.io/agent-skills/

Agent Skills Runtime

Agents should execute whenever possible.

Agent Skills turns repeatable agent reasoning into executable skills: reusable, testable, observable, and portable across tools and model providers.

Stop rebuilding agent logic in prompts. Define it once as a skill, bind it to any backend, and run it with full traceability.

Agent Skills Runtime is the reference implementation of ORCA (Open Cognitive Runtime Architecture):

  • Skills package reusable cognitive workflows
  • Capabilities define backend-agnostic contracts
  • Bindings connect contracts to execution backends (PythonCall, OpenAPI, MCP, OpenRPC)
  • Runtime executes DAGs with policy/safety, CognitiveState, and traceability

No API key required for local-first runs. Deterministic Python baselines are available for offline development and testing.


The problem

Most agent systems still encode critical logic inside prompts and framework glue.

That creates recurring engineering pain:

  • Reasoning logic gets trapped in prompt text instead of executable workflows
  • Workflows are hard to reuse and harder to test
  • Contracts between steps are implicit and brittle
  • Observability and auditability are often an afterthought
  • Safety and governance controls are inconsistent
  • Switching providers or frameworks usually means rewriting too much

The ORCA answer

ORCA introduces an execution layer for cognitive workflows:

  • Skills are reusable cognitive workflows
  • Capabilities are stable, contract-driven interfaces
  • Bindings are interchangeable execution backends
  • Runtime is a DAG scheduler + policy engine + cognitive state + trace

This keeps reasoning structure explicit and executable, while preserving portability across backends.


Before / after

Before (logic trapped in prompt text):

prompt = """
Analyze this PR.
Find risks.
Estimate confidence.
Suggest fixes.
Return JSON.
"""

After (logic as a reusable skill graph):

# Conceptual example (illustrative structure)
skill: code.pr.review
steps:
  - parse_diff
  - detect_risks
  - score_confidence
  - generate_review
  - validate_output

Same reasoning pattern. Reusable. Testable. Observable. Bindable to Python, OpenAPI, MCP, or your own APIs.
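
To make the "after" shape concrete, here is a minimal Python sketch of the same five steps as plain functions run in order with a trace. It is not the runtime's API: every function body, the `run_skill` helper, and the state keys are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable

State = dict
Step = Callable[[State], State]


def parse_diff(state: State) -> State:
    # Keep only added lines ("+..."), skipping the "+++ b/file" header.
    added = [
        line[1:]
        for line in state["diff"].splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return {**state, "added_lines": added}


def detect_risks(state: State) -> State:
    # Toy heuristic (assumption): flag any added line that calls eval().
    risks = [line.strip() for line in state["added_lines"] if "eval(" in line]
    return {**state, "risks": risks}


def score_confidence(state: State) -> State:
    # Placeholder scoring: lower confidence when risks were found.
    return {**state, "confidence": 0.6 if state["risks"] else 0.9}


def generate_review(state: State) -> State:
    verdict = "changes requested" if state["risks"] else "approve"
    review = {
        "verdict": verdict,
        "risks": state["risks"],
        "confidence": state["confidence"],
    }
    return {**state, "review": review}


def validate_output(state: State) -> State:
    # Fail loudly if the review violates the expected shape.
    if not 0.0 <= state["review"]["confidence"] <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return state


PR_REVIEW_STEPS: list[Step] = [
    parse_diff,
    detect_risks,
    score_confidence,
    generate_review,
    validate_output,
]


def run_skill(steps: list[Step], inputs: State) -> tuple[State, list[str]]:
    """Run steps in order and record which step ran (a minimal trace)."""
    state, trace = dict(inputs), []
    for step in steps:
        state = step(state)
        trace.append(step.__name__)
    return state, trace
```

Because each step is an ordinary function over explicit state, it can be unit-tested in isolation, which is the property the prompt-only version lacks.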


Try it locally in 3 minutes

git clone https://github.com/gfernandf/agent-skills.git
cd agent-skills
make bootstrap
python skills.py doctor
python skills.py run text.language-summary \
  --input '{"text": "ORCA turns agent reasoning into reusable executable skills."}'

What to expect:

  • No API key required
  • Runs offline with deterministic Python baselines
  • First run may take 30-60 seconds

Windows PowerShell setup and run:

git clone https://github.com/gfernandf/agent-skills.git
cd agent-skills
pip install -e ".[all,dev]"
git clone https://github.com/gfernandf/agent-skill-registry.git ../agent-skill-registry
python skills.py doctor

$env:OPENAI_API_KEY = ""
'{ "text": "ORCA turns agent reasoning into reusable executable skills." }' | Set-Content input_qs.json -Encoding ascii
python skills.py run text.language-summary --input-file input_qs.json
Remove-Item input_qs.json
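
If shell quoting is the obstacle, the same input file can be prepared from Python on any platform. The sketch below only uses the `run` subcommand and `--input-file` flag shown above; the helper names `write_input_file` and `build_run_command` are hypothetical and not part of the project.

```python
import json
import sys
from pathlib import Path


def write_input_file(path: Path, text: str) -> Path:
    """Write the {"text": ...} payload used by the quick-start example."""
    path.write_text(json.dumps({"text": text}), encoding="utf-8")
    return path


def build_run_command(skill_id: str, input_path: Path) -> list[str]:
    """Build the CLI invocation as an argument list (no shell quoting needed)."""
    return [
        sys.executable,
        "skills.py",
        "run",
        skill_id,
        "--input-file",
        str(input_path),
    ]


# Usage, from the repository root:
#   import subprocess
#   path = write_input_file(Path("input_qs.json"), "ORCA turns agent reasoning ...")
#   subprocess.run(build_run_command("text.language-summary", path), check=True)
#   path.unlink()
```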

Why this matters beyond a toy summary

The quick-start command only verifies the installation. A stronger demo is the official decision.make skill.

decision.make shows a full decision workflow under uncertainty with explicit stages and auditable outputs.

From the skill contract in the registry, it includes:

  • Multi-step pipeline (option generation, analysis, scoring, justification, validation)
  • Structured outputs such as recommendation, tradeoffs, confidence_score, confidence_level, uncertainties, and next_steps
  • Risk-aware reasoning through explicit criteria and constraints
  • Trace-friendly execution aligned with ORCA observability goals

Conceptual output shape for decision-style workflows:

{
  "recommendation": "Proceed with a controlled pilot",
  "confidence_score": 0.82,
  "confidence_level": "medium",
  "tradeoffs": [
    "Faster learning, higher short-term operational overhead"
  ],
  "uncertainties": [
    "Regulatory timeline may change in Q3"
  ],
  "next_steps": [
    "Run a 6-week pilot in one segment"
  ],
  "trace_id": "..."
}

Note: the JSON above is illustrative. Exact outputs depend on input context, bindings, and policy settings.
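
A consumer of such output will usually want to check the shape before acting on it. The following is a minimal sketch, assuming the field names listed above and assuming (not confirmed by the skill contract) that confidence_score lies in [0, 1]; `validate_decision_output` is a hypothetical helper.

```python
# Field names come from the illustrative JSON above; the types are assumptions.
REQUIRED_FIELDS = {
    "recommendation": str,
    "confidence_score": (int, float),
    "confidence_level": str,
    "tradeoffs": list,
    "uncertainties": list,
    "next_steps": list,
}


def validate_decision_output(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            errors.append(f"wrong type for field: {name}")
    score = payload.get("confidence_score")
    # Assumed range; adjust if the registry contract says otherwise.
    if isinstance(score, (int, float)) and not 0.0 <= score <= 1.0:
        errors.append("confidence_score outside [0, 1]")
    return errors
```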


I want to...

I want to try it

  • Start with local CLI: see Try it locally in 3 minutes
  • Use deterministic baselines for offline reproducibility
  • Explore first workflows in examples and docs

I want to integrate it

Choose one integration surface:

  • Embedded SDK (lowest latency, in-process)
  • HTTP API (service boundary, non-Python clients)
  • MCP server (tooling ecosystems and MCP hosts)
  • Framework adapters (LangChain, CrewAI, AutoGen, Semantic Kernel)
  • Native tool definitions (Anthropic, OpenAI, Gemini)

I want to build skills

  • Author declarative skills as DAG workflows
  • Reuse existing capability contracts
  • Validate wiring and execution behavior
  • Package and contribute reusable workflows

Mental model

Think of Agent Skills as:

  • Capabilities: what an operation means (contract)
  • Bindings: how that operation is executed (backend)
  • Skills: how operations compose into workflows (DAG)
  • Runtime: how workflows execute safely and observably
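
The split between capabilities and bindings can be pictured with a few lines of Python. This is a conceptual sketch only: the `Capability` class, the two toy bindings, and the `execute` helper are hypothetical, and the real contract format lives in the registry.

```python
from dataclasses import dataclass
from typing import Callable

Binding = Callable[[dict], dict]


@dataclass(frozen=True)
class Capability:
    """What an operation means: an id plus required input and output fields."""
    capability_id: str
    input_fields: tuple[str, ...]
    output_fields: tuple[str, ...]


SUMMARIZE = Capability("text.content.summarize", ("text",), ("summary",))


def truncate_binding(inputs: dict) -> dict:
    # Backend A (toy): keep the first 20 characters.
    return {"summary": inputs["text"][:20]}


def first_sentence_binding(inputs: dict) -> dict:
    # Backend B (toy): keep the first sentence.
    head, _, _ = inputs["text"].partition(".")
    return {"summary": head.strip() + "."}


def execute(capability: Capability, binding: Binding, inputs: dict) -> dict:
    """Run any binding, enforcing the capability contract on both sides."""
    missing = [f for f in capability.input_fields if f not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    outputs = binding(inputs)
    absent = [f for f in capability.output_fields if f not in outputs]
    if absent:
        raise ValueError(f"binding omitted outputs: {absent}")
    return outputs
```

Swapping `truncate_binding` for `first_sentence_binding` changes how the operation runs without touching the caller, since both satisfy the same contract.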

Cognitive Taxonomy

The pure cognitive layer is intentionally narrower than the full runtime. The current taxonomy separates:

  • Pure cognitive capabilities: decision, evaluation, evidence, memory, perception, and reasoning.
  • Compatibility surfaces: legacy or transitional names such as eval.* that remain in the live registry during migration.
  • Operational capabilities: routing, delegation, workflow control, and other runtime helpers that should not be counted as core cognition.

The registry-level reference for that taxonomy is:

Use that document as the source of truth when deciding whether a capability belongs to the cognitive core or to the operational layer.


Core concepts

Skills

Reusable cognitive workflows declared as DAGs.

Capabilities

Backend-agnostic contracts with typed inputs and outputs.

Bindings

Execution adapters for PythonCall, OpenAPI, MCP, and OpenRPC.

Runtime

Execution layer with DAG scheduling, policy gates, cognitive state, and trace.
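
As a rough illustration of how DAG scheduling, a policy gate, and a trace fit together, the sketch below orders steps with the standard library's `graphlib.TopologicalSorter`. It is an assumption-laden toy, not the runtime's scheduler: `run_dag`, the step names, and the policy signature are all hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

StepFn = Callable[[dict], dict]
Policy = Callable[[str, dict], bool]


def run_dag(steps: dict[str, StepFn], deps: dict[str, set[str]],
            policy: Policy, inputs: dict) -> tuple[dict, list[tuple[str, str]]]:
    """Run steps in dependency order; stop at the first step the policy denies."""
    state = dict(inputs)
    trace: list[tuple[str, str]] = []
    # deps maps each step to the set of steps that must run before it.
    for name in TopologicalSorter(deps).static_order():
        if not policy(name, state):
            trace.append((name, "blocked"))
            break
        state.update(steps[name](state))
        trace.append((name, "ok"))
    return state, trace


# Toy decision workflow (hypothetical step bodies).
STEPS: dict[str, StepFn] = {
    "generate_options": lambda s: {"options": ["pilot", "full rollout"]},
    "analyze": lambda s: {"analysis": {o: f"notes on {o}" for o in s["options"]}},
    "score": lambda s: {"scores": {o: len(o) for o in s["options"]}},
    "justify": lambda s: {"recommendation": min(s["scores"], key=s["scores"].get)},
}
DEPS = {
    "analyze": {"generate_options"},
    "score": {"generate_options"},
    "justify": {"analyze", "score"},
}


def allow_all(name: str, state: dict) -> bool:
    return True


def deny(blocked: set[str]) -> Policy:
    # Policy factory: refuse any step whose name is in `blocked`.
    return lambda name, state: name not in blocked
```

The trace records both completed and blocked steps, so a denied run still leaves an auditable record of where it stopped.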


Integration modes

| Mode | Best for | Requires server? |
|---|---|---|
| Embedded SDK | Python apps and notebooks | No |
| Native tool defs | Direct model SDK integration | No |
| Framework adapters | Existing agent frameworks | No |
| MCP server | MCP-compatible hosts | MCP host |
| HTTP API | Service-oriented architectures | Yes |

Embedded SDK (example)

from sdk.embedded import as_langchain_tools

tools = as_langchain_tools(["text.content.summarize", "text.content.translate"])

HTTP API (example)

agent-skills serve
curl http://localhost:8080/v1/health
curl -X POST http://localhost:8080/v1/skills/text.language-summary/execute \
  -H "Content-Type: application/json" \
  -d '{"inputs": {"text": "Hello world from ORCA"}}'
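
The same POST can be issued from Python with only the standard library. This sketch mirrors the curl call above (same URL path and `{"inputs": ...}` body); `build_execute_request` is a hypothetical helper, and the response schema is not described here, so the usage note just prints whatever JSON comes back.

```python
import json
import urllib.request


def build_execute_request(skill_id: str, inputs: dict,
                          base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) the execute request for one skill."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/skills/{skill_id}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage, with `agent-skills serve` running locally:
#   req = build_execute_request("text.language-summary",
#                               {"text": "Hello world from ORCA"})
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       print(json.load(resp))
```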

MCP server (example)

python -m official_mcp_servers
python -m official_mcp_servers --sse --host 0.0.0.0 --port 8765

Native tool definitions (example)

from sdk.embedded import as_openai_tools, execute_openai_tool_call

tools = as_openai_tools()
# pass tools to your OpenAI client, then dispatch tool calls via execute_openai_tool_call

Architecture

graph TB
    subgraph Interface
        CLI[CLI]
        HTTP[HTTP API]
        SDK[Embedded SDK / Adapters]
        MCP[MCP Server]
    end

    subgraph Runtime
        GW[Gateway]
        SCH[DAG Scheduler]
        POL[Policy and Safety]
        COG[CognitiveState]
        TRC[Trace and Audit]
    end

    subgraph BindingLayer
        RES[Binding Resolver]
        PY[PythonCall]
        OA[OpenAPI]
        MP[MCP]
        RP[OpenRPC]
    end

    subgraph Backends
        BASE[Deterministic Python baselines]
        EXT[External APIs and services]
        MCPB[MCP backends]
    end

    CLI --> GW
    HTTP --> GW
    SDK --> GW
    MCP --> GW

    GW --> SCH
    SCH --> POL
    SCH --> COG
    SCH --> TRC
    POL --> RES

    RES --> PY --> BASE
    RES --> OA --> EXT
    RES --> MP --> MCPB
    RES --> RP --> EXT

How it compares

Agent Skills is not a replacement for every agent framework.

It can run standalone, but its strongest use case is as a reusable execution layer underneath frameworks, tools, and model providers.

| Dimension | Agent Skills | Typical agent framework |
|---|---|---|
| Execution model | Declarative DAG skills | Often prompt/tool loop centered |
| Contracts | Capability-first, typed | Usually app-level conventions |
| Backend portability | Binding abstraction layer | Often provider/framework specific |
| Safety/governance | Policy gates and execution controls | Varies widely |
| Observability | Trace and audit oriented | Varies widely |
| Local deterministic mode | Yes, baseline-first workflow | Often key-dependent |

Advanced features

  • Auth and RBAC controls
  • Webhook eventing
  • Plugin extension points
  • Audit modes and runtime observability
  • CognitiveState v1 and cognitive hints
  • Runtime-managed output envelope (status, rationale, trace_ref)
  • JSON Schema generation and validation
  • Skill governance and conformance tooling
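
To give a feel for the output envelope idea, here is a small sketch that wraps a step result with the three fields named above. Only the field names status, rationale, and trace_ref come from this README; the `Envelope` class, the status values "completed" and "failed", and the `outputs` field are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Envelope:
    status: str      # assumed values: "completed" or "failed"
    rationale: str   # short human-readable explanation
    trace_ref: str   # opaque id for looking up the trace
    outputs: dict = field(default_factory=dict)


def run_with_envelope(step: Callable[[dict], dict], inputs: dict) -> Envelope:
    """Run one step and always return an envelope, even on failure."""
    trace_ref = uuid.uuid4().hex
    try:
        outputs = step(inputs)
    except Exception as exc:
        return Envelope("failed", f"{type(exc).__name__}: {exc}", trace_ref)
    return Envelope("completed", "step returned normally", trace_ref, outputs)
```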

Cognitive quality gates (>9)

The runtime includes a quality gate bundle for pure cognitive capabilities.

Run the gate pack:

python tooling/run_cognitive_quality_gates.py \
  --report-file artifacts/cognitive_quality_gates_local_report.json

Generate scorecard only:

python tooling/generate_cognitive_quality_scorecard.py \
  --fail-on-threshold \
  --min-axis 9.0 \
  --min-overall 9.0
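
The two thresholds can be read as "every axis must reach --min-axis, and the overall score must reach --min-overall". The sketch below illustrates that reading only. It assumes the overall score is an unweighted mean of the axis scores, which the actual tooling may not do, and the axis names in the usage note are hypothetical.

```python
def evaluate_scorecard(axis_scores: dict[str, float],
                       min_axis: float = 9.0,
                       min_overall: float = 9.0) -> dict:
    """Apply per-axis and overall thresholds to a set of axis scores."""
    if not axis_scores:
        raise ValueError("at least one axis score is required")
    # Assumption: overall = unweighted mean of the axes.
    overall = sum(axis_scores.values()) / len(axis_scores)
    failing = sorted(name for name, score in axis_scores.items() if score < min_axis)
    return {
        "overall": overall,
        "failing_axes": failing,
        "passed": not failing and overall >= min_overall,
    }
```

For example, `evaluate_scorecard({"contract": 9.8, "semantic": 8.7})` fails even though the mean is above 9.0, because one axis is below the per-axis floor.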

Primary artifacts:

  • artifacts/cognitive_e2e_contract_report.json
  • artifacts/cognitive_semantic_all_report.json
  • artifacts/cognitive_quality_scorecard.json
  • artifacts/cognitive_quality_gates_local_report.json

See the documentation index below for details.


Documentation

| Topic | Link |
|---|---|
| 10-minute onboarding | docs/ONBOARDING_10_MIN.md |
| Target architecture (canonical) | docs/TARGET_ARCHITECTURE.md |
| Installation | docs/INSTALLATION.md |
| Environment variables | docs/ENVIRONMENT_VARIABLES.md |
| Error taxonomy | docs/ERROR_TAXONOMY.md |
| Runner architecture | docs/RUNNER_GUIDE.md |
| Binding selection policy | docs/BINDING_SELECTION.md |
| Binding authoring guide | docs/BINDING_GUIDE.md |
| DAG scheduler | docs/SCHEDULER.md |
| Step control flow | docs/STEP_CONTROL_FLOW.md |
| Streaming | docs/STREAMING.md |
| Async execution | docs/ASYNC_EXECUTION.md |
| Deployment | docs/DEPLOYMENT.md |
| Observability | docs/OBSERVABILITY.md |
| Auth and RBAC | docs/AUTH.md |
| Webhooks | docs/WEBHOOKS.md |
| Plugins | docs/PLUGINS.md |
| JSON schemas | docs/JSON_SCHEMAS.md |
| Skill authoring | docs/SKILL_AUTHORING.md |
| Troubleshooting | docs/TROUBLESHOOTING.md |
| Public release use cases | docs/PUBLIC_RELEASE_USE_CASES.md |
| Project status | docs/PROJECT_STATUS.md |
| ORCA specification | ORCA.md |

Serve docs locally:

make serve

Research paper

Beyond Prompting: Decoupling Cognition from Execution in LLM-based Agents through the ORCA Framework

Fernandez Alvarez, G. E. (2026)


Contributing

Contributions are welcome. See CONTRIBUTING.md.

make check

License

Apache 2.0. See LICENSE.


Citing

If you use Agent Skills or ORCA in research, please cite:

@article{fernandez_orca_2026,
  author    = {Fernandez Alvarez, Guillermo E.},
  title     = {Beyond Prompting: Decoupling Cognition from Execution in
               LLM-based Agents through the ORCA Framework},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19438943},
  url       = {https://doi.org/10.5281/zenodo.19438943}
}

Software citation:

@software{fernandez_agent_skills_2026,
  author       = {Fernandez Alvarez, Guillermo},
  title        = {Agent Skills Runtime},
  year         = {2026},
  url          = {https://github.com/gfernandf/agent-skills},
  version      = {1.0.2},
  license      = {Apache-2.0}
}

See also CITATION.cff.


Troubleshooting

| Problem | Solution |
|---|---|
| Registry not found | Run doctor and ensure agent-skill-registry is cloned next to this repo |
| Command not found on Windows | Use python skills.py ... from repo root |
| Unexpected runtime error | Check docs/ERROR_TAXONOMY.md |
| Environment mismatch | Review docs/ENVIRONMENT_VARIABLES.md |