mirror of https://github.com/gfernandf/agent-skills.git synced 2026-06-23 02:03:16 -06:00

Agents should execute whenever possible — runtime for composable AI agent skills https://gfernandf.github.io/agent-skills/

Agent Skills Runtime

Agents should execute whenever possible.

Agent Skills turns repeatable agent reasoning into executable skills: reusable, testable, observable, and portable across tools and model providers.

Stop rebuilding agent logic in prompts. Define it once as a skill, bind it to any backend, and run it with full traceability.

Agent Skills Runtime is the reference implementation of ORCA (Open Cognitive Runtime Architecture):

Skills package reusable cognitive workflows
Capabilities define backend-agnostic contracts
Bindings connect contracts to execution backends (PythonCall, OpenAPI, MCP, OpenRPC)
Runtime executes DAGs with policy/safety, CognitiveState, and traceability

No API key required for local-first runs. Deterministic Python baselines are available for offline development and testing.

The problem

Most agent systems still encode critical logic inside prompts and framework glue.

That creates recurring engineering pain:

Reasoning logic gets trapped in prompt text instead of executable workflows
Workflows are hard to reuse and harder to test
Contracts between steps are implicit and brittle
Observability and auditability are often an afterthought
Safety and governance controls are inconsistent
Switching providers or frameworks usually means rewriting too much

The ORCA answer

ORCA introduces an execution layer for cognitive workflows:

Skills are reusable cognitive workflows
Capabilities are stable, contract-driven interfaces
Bindings are interchangeable execution backends
Runtime is a DAG scheduler + policy engine + cognitive state + trace

This keeps reasoning structure explicit and executable, while preserving portability across backends.

Before / after

Before (logic trapped in prompt text):

prompt = """
Analyze this PR.
Find risks.
Estimate confidence.
Suggest fixes.
Return JSON.
"""

After (logic as a reusable skill graph):

# Conceptual example (illustrative structure)
skill: code.pr.review
steps:
  - parse_diff
  - detect_risks
  - score_confidence
  - generate_review
  - validate_output

Same reasoning pattern. Reusable. Testable. Observable. Bindable to Python, OpenAPI, MCP, or your own APIs.
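
To make the contrast concrete, the step list above can be read as a chain of small, individually testable functions. The following is a minimal Python sketch under stated assumptions: every function body is a hypothetical stand-in written for this illustration, and the actual runtime executes declarative skill definitions rather than hand-written Python like this.

```python
from typing import Callable

# All step functions below are hypothetical stand-ins for illustration only.
State = dict


def parse_diff(state: State) -> State:
    # Keep only added lines from a unified diff (skip the "+++" file header).
    added = [
        line[1:]
        for line in state["diff"].splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return {**state, "added_lines": added}


def detect_risks(state: State) -> State:
    # Toy heuristic: flag added lines that call eval().
    risks = [line.strip() for line in state["added_lines"] if "eval(" in line]
    return {**state, "risks": risks}


def score_confidence(state: State) -> State:
    # Toy scoring rule: any detected risk lowers confidence.
    return {**state, "confidence": 0.9 if not state["risks"] else 0.5}


def generate_review(state: State) -> State:
    return {**state, "review": f"{len(state['risks'])} risk(s) found"}


def validate_output(state: State) -> State:
    # Fail loudly if an earlier step did not produce its expected key.
    for key in ("risks", "confidence", "review"):
        if key not in state:
            raise ValueError(f"missing output field: {key}")
    return state


STEPS: list[Callable[[State], State]] = [
    parse_diff, detect_risks, score_confidence, generate_review, validate_output,
]


def run_skill(diff: str) -> State:
    state: State = {"diff": diff}
    for step in STEPS:
        state = step(state)  # each step can be unit-tested on its own
    return state


result = run_skill("+++ b/app.py\n+value = eval(user_input)\n+print(value)\n")
print(result["review"], result["confidence"])
```

Because each step takes and returns plain data, any one of them could in principle be swapped for a different backend without touching the others, which is the property the skill graph is meant to provide.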

Try it locally in 3 minutes

git clone https://github.com/gfernandf/agent-skills.git
cd agent-skills
make bootstrap
python skills.py doctor
python skills.py run text.language-summary \
  --input '{"text": "ORCA turns agent reasoning into reusable executable skills."}'

What to expect:

No API key required
Runs offline with deterministic Python baselines
First run may take 30-60 seconds

Windows PowerShell setup and run

git clone https://github.com/gfernandf/agent-skills.git
cd agent-skills
pip install -e ".[all,dev]"
git clone https://github.com/gfernandf/agent-skill-registry.git ../agent-skill-registry
python skills.py doctor

$env:OPENAI_API_KEY = ""
'{ "text": "ORCA turns agent reasoning into reusable executable skills." }' | Set-Content input_qs.json -Encoding ascii
python skills.py run text.language-summary --input-file input_qs.json
Remove-Item input_qs.json

Why this matters beyond a toy summary

The first command only verifies the installation. A stronger demo is the official decision.make skill.

decision.make shows a full decision workflow under uncertainty with explicit stages and auditable outputs.

From the skill contract in the registry, it includes:

Multi-step pipeline (option generation, analysis, scoring, justification, validation)
Structured outputs such as recommendation, tradeoffs, confidence_score, confidence_level, uncertainties, and next_steps
Risk-aware reasoning through explicit criteria and constraints
Trace-friendly execution aligned with ORCA observability goals

Conceptual output shape for decision-style workflows:

{
  "recommendation": "Proceed with a controlled pilot",
  "confidence_score": 0.82,
  "confidence_level": "medium",
  "tradeoffs": [
    "Faster learning, higher short-term operational overhead"
  ],
  "uncertainties": [
    "Regulatory timeline may change in Q3"
  ],
  "next_steps": [
    "Run a 6-week pilot in one segment"
  ],
  "trace_id": "..."
}

Note: the JSON above is illustrative. Exact outputs depend on input context, bindings, and policy settings.
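
A consumer of a decision-style workflow may want to sanity-check the shape before acting on it. The sketch below uses only the field names from the illustrative JSON above; the type checks and the assumption that confidence_score lies in [0, 1] are choices made for this example, not the registry's actual contract.

```python
import json

# Field names come from the illustrative JSON above; the checks themselves are
# assumptions for this sketch, not the registry's actual contract.
REQUIRED_FIELDS = {
    "recommendation": str,
    "confidence_score": (int, float),
    "confidence_level": str,
    "tradeoffs": list,
    "uncertainties": list,
    "next_steps": list,
}


def check_decision_output(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"wrong type for {name}")
    score = payload.get("confidence_score")
    # Assumed range: a score outside [0, 1] is treated as a problem.
    if isinstance(score, (int, float)) and not 0.0 <= score <= 1.0:
        problems.append("confidence_score outside [0, 1]")
    return problems


raw = json.dumps({
    "recommendation": "Proceed with a controlled pilot",
    "confidence_score": 0.82,
    "confidence_level": "medium",
    "tradeoffs": ["Faster learning, higher short-term operational overhead"],
    "uncertainties": ["Regulatory timeline may change in Q3"],
    "next_steps": ["Run a 6-week pilot in one segment"],
})
print(check_decision_output(json.loads(raw)))
```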

I want to...

I want to try it

Start with the local CLI: see "Try it locally in 3 minutes"
Use deterministic baselines for offline reproducibility
Explore first workflows in examples and docs

I want to integrate it

Choose one integration surface:

Embedded SDK (lowest latency, in-process)
HTTP API (service boundary, non-Python clients)
MCP server (tooling ecosystems and MCP hosts)
Framework adapters (LangChain, CrewAI, AutoGen, Semantic Kernel)
Native tool definitions (Anthropic, OpenAI, Gemini)

I want to build skills

Author declarative skills as DAG workflows
Reuse existing capability contracts
Validate wiring and execution behavior
Package and contribute reusable workflows

Mental model

Think of Agent Skills as:

Capabilities: what an operation means (contract)
Bindings: how that operation is executed (backend)
Skills: how operations compose into workflows (DAG)
Runtime: how workflows execute safely and observably
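
The separation between contract and backend in this mental model can be sketched in a few lines of Python. The types and functions below (Capability, execute, and both summarize backends) are hypothetical toys written for this illustration and are not the Agent Skills API; only the capability id text.content.summarize is taken from this README.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy types for illustration; not the Agent Skills API.


@dataclass(frozen=True)
class Capability:
    """What an operation means: a named contract with input/output fields."""
    capability_id: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


# A binding is modeled here as any callable that implements the contract.
Binding = Callable[[dict], dict]

summarize = Capability("text.content.summarize", ("text",), ("summary",))


def baseline_summarize(inputs: dict) -> dict:
    # Deterministic stand-in backend: the first sentence is the "summary".
    return {"summary": inputs["text"].split(".")[0].strip() + "."}


def shouting_summarize(inputs: dict) -> dict:
    # A second, interchangeable backend honoring the same contract.
    return {"summary": baseline_summarize(inputs)["summary"].upper()}


def execute(cap: Capability, binding: Binding, inputs: dict) -> dict:
    # The toy "runtime" checks the contract on both sides of the call.
    missing = [name for name in cap.inputs if name not in inputs]
    if missing:
        raise ValueError(f"{cap.capability_id}: missing inputs {missing}")
    outputs = binding(inputs)
    absent = [name for name in cap.outputs if name not in outputs]
    if absent:
        raise ValueError(f"{cap.capability_id}: missing outputs {absent}")
    return outputs


text = {"text": "ORCA separates contracts from backends. Bindings are swappable."}
print(execute(summarize, baseline_summarize, text))
print(execute(summarize, shouting_summarize, text))
```

The caller only ever names the capability; which backend runs is decided by the binding passed in, so swapping backends does not change calling code.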

Cognitive Taxonomy

The pure cognitive layer is intentionally narrower than the full runtime. The current taxonomy separates:

Pure cognitive capabilities: decision, evaluation, evidence, memory, perception, and reasoning.
Compatibility surfaces: legacy or transitional names such as eval.* that remain in the live registry during migration.
Operational capabilities: routing, delegation, workflow control, and other runtime helpers that should not be counted as core cognition.

The registry-level reference for that taxonomy is:

agent-skill-registry/docs/COGNITIVE_TAXONOMY.md

Use that document as the source of truth when deciding whether a capability belongs to the cognitive core or to the operational layer.

Core concepts

Skills

Reusable cognitive workflows declared as DAGs.

Capabilities

Backend-agnostic contracts with typed inputs and outputs.

Bindings

Execution adapters for PythonCall, OpenAPI, MCP, and OpenRPC.

Runtime

Execution layer with DAG scheduling, policy gates, cognitive state, and trace.
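
DAG scheduling means that a step runs only after all of its dependencies have finished, and that independent steps may run in the same wave. The sketch below shows the general technique with Python's standard graphlib module; the step names are hypothetical, and this is not a description of how the Agent Skills scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each key maps to the set of steps it depends on.
dependencies = {
    "load_text": set(),
    "detect_language": {"load_text"},
    "summarize": {"load_text"},
    "report": {"detect_language", "summarize"},
}


def schedule(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; steps in one wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable, readable order
        waves.append(ready)
        sorter.done(*ready)  # marking steps done unlocks their dependents
    return waves


for index, wave in enumerate(schedule(dependencies)):
    print(index, wave)
```

Here detect_language and summarize land in the same wave because both depend only on load_text, so a scheduler would be free to run them concurrently.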

Integration modes

Mode	Best for	Requires server?
Embedded SDK	Python apps and notebooks	No
Native tool defs	Direct model SDK integration	No
Framework adapters	Existing agent frameworks	No
MCP server	MCP-compatible hosts	MCP host
HTTP API	Service-oriented architectures	Yes

Embedded SDK (example)

from sdk.embedded import as_langchain_tools

tools = as_langchain_tools(["text.content.summarize", "text.content.translate"])

HTTP API (example)

agent-skills serve
curl http://localhost:8080/v1/health
curl -X POST http://localhost:8080/v1/skills/text.language-summary/execute \
  -H "Content-Type: application/json" \
  -d '{"inputs": {"text": "Hello world from ORCA"}}'
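
For Python clients, the same POST request can be built with the standard library. This sketch mirrors the curl call above (same URL path, header, and body); the helper names are hypothetical, and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request

# Assumes a server started with `agent-skills serve` on localhost:8080,
# as in the curl example above.
BASE_URL = "http://localhost:8080"


def build_execute_request(skill_id: str, inputs: dict) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/skills/{skill_id}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_skill(skill_id: str, inputs: dict, timeout: float = 30.0) -> dict:
    """Send the request; requires the server to be running."""
    request = build_execute_request(skill_id, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the response body is JSON.
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network, so it is safe to inspect.
request = build_execute_request("text.language-summary", {"text": "Hello world from ORCA"})
print(request.get_method(), request.full_url)
# With the server running, call:
#   execute_skill("text.language-summary", {"text": "Hello world from ORCA"})
```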

MCP server (example)

python -m official_mcp_servers
python -m official_mcp_servers --sse --host 0.0.0.0 --port 8765

Native tool definitions (example)

from sdk.embedded import as_openai_tools, execute_openai_tool_call

tools = as_openai_tools()
# pass tools to your OpenAI client, then dispatch tool calls via execute_openai_tool_call

Architecture

graph TB
    subgraph Interface
        CLI[CLI]
        HTTP[HTTP API]
        SDK[Embedded SDK / Adapters]
        MCP[MCP Server]
    end

    subgraph Runtime
        GW[Gateway]
        SCH[DAG Scheduler]
        POL[Policy and Safety]
        COG[CognitiveState]
        TRC[Trace and Audit]
    end

    subgraph BindingLayer
        RES[Binding Resolver]
        PY[PythonCall]
        OA[OpenAPI]
        MP[MCP]
        RP[OpenRPC]
    end

    subgraph Backends
        BASE[Deterministic Python baselines]
        EXT[External APIs and services]
        MCPB[MCP backends]
    end

    CLI --> GW
    HTTP --> GW
    SDK --> GW
    MCP --> GW

    GW --> SCH
    SCH --> POL
    SCH --> COG
    SCH --> TRC
    POL --> RES

    RES --> PY --> BASE
    RES --> OA --> EXT
    RES --> MP --> MCPB
    RES --> RP --> EXT

How it compares

Agent Skills is not a replacement for every agent framework.

It can run standalone, but its strongest use case is as a reusable execution layer underneath frameworks, tools, and model providers.

Dimension	Agent Skills	Typical agent framework
Execution model	Declarative DAG skills	Often prompt/tool loop centered
Contracts	Capability-first, typed	Usually app-level conventions
Backend portability	Binding abstraction layer	Often provider/framework specific
Safety/governance	Policy gates and execution controls	Varies widely
Observability	Trace and audit oriented	Varies widely
Local deterministic mode	Yes, baseline-first workflow	Often requires an API key

Advanced features

Auth and RBAC controls
Webhook eventing
Plugin extension points
Audit modes and runtime observability
CognitiveState v1 and cognitive hints
Runtime-managed output envelope (status, rationale, trace_ref)
JSON Schema generation and validation
Skill governance and conformance tooling
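
As an illustration of the runtime-managed output envelope listed above, the sketch below wraps a step's raw outputs in a structure with the three named fields (status, rationale, trace_ref). Everything else is an assumption made for this example: the Envelope class, the "completed" and "failed" status values, and the trace reference format are hypothetical and do not describe the runtime's actual schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Envelope:
    # Field names follow the README; the values used below are assumptions.
    status: str
    rationale: str
    trace_ref: str
    outputs: dict = field(default_factory=dict)


def run_with_envelope(step: Callable[[dict], dict], inputs: dict) -> Envelope:
    """Run a step and always return an envelope, even when the step fails."""
    trace_ref = f"trace-{uuid.uuid4().hex[:8]}"  # hypothetical reference format
    try:
        outputs = step(inputs)
    except Exception as exc:
        return Envelope("failed", f"{type(exc).__name__}: {exc}", trace_ref)
    return Envelope("completed", "step finished without errors", trace_ref, outputs)


ok = run_with_envelope(lambda data: {"n": data["n"] + 1}, {"n": 1})
bad = run_with_envelope(lambda data: {"n": data["missing"]}, {})
print(ok.status, ok.outputs)
print(bad.status, bad.rationale)
```

The point of the pattern is that callers get the same fields back on success and on failure, so downstream handling does not need a separate error path.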

Cognitive quality gates (minimum score 9.0)

The runtime includes a quality gate bundle for pure cognitive capabilities.

Run the gate pack:

python tooling/run_cognitive_quality_gates.py \
  --report-file artifacts/cognitive_quality_gates_local_report.json

Generate scorecard only:

python tooling/generate_cognitive_quality_scorecard.py \
  --fail-on-threshold \
  --min-axis 9.0 \
  --min-overall 9.0
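
The two thresholds in the command above suggest a check of the following kind: every axis must reach the per-axis minimum, and the overall score must reach the overall minimum. This is a sketch of that logic only. The axis names, the use of a plain mean for the overall score, and the report fields are all assumptions for illustration, not the scorecard tool's actual schema or formula.

```python
from statistics import mean

# Hypothetical scorecard check: axis names, the mean-based overall score, and
# the returned fields are assumptions, not the tool's actual behavior.


def evaluate_scorecard(
    axes: dict[str, float],
    min_axis: float = 9.0,
    min_overall: float = 9.0,
) -> dict:
    overall = mean(axes.values())
    failing = sorted(name for name, score in axes.items() if score < min_axis)
    return {
        "overall": round(overall, 2),
        "failing_axes": failing,
        # Both conditions must hold for the gate to pass.
        "passed": not failing and overall >= min_overall,
    }


report = evaluate_scorecard({"contract": 9.6, "semantic": 9.2, "stability": 8.7})
print(report)
```

In this example the overall mean clears 9.0 but one axis does not, so the gate fails; a per-axis minimum keeps a weak axis from hiding behind a strong average.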

Primary artifacts:

artifacts/cognitive_e2e_contract_report.json
artifacts/cognitive_semantic_all_report.json
artifacts/cognitive_quality_scorecard.json
artifacts/cognitive_quality_gates_local_report.json

See the Documentation table below for details.

Documentation

Topic	Link
10-minute onboarding	docs/ONBOARDING_10_MIN.md
Target architecture (canonical)	docs/TARGET_ARCHITECTURE.md
Installation	docs/INSTALLATION.md
Environment variables	docs/ENVIRONMENT_VARIABLES.md
Error taxonomy	docs/ERROR_TAXONOMY.md
Runner architecture	docs/RUNNER_GUIDE.md
Binding selection policy	docs/BINDING_SELECTION.md
Binding authoring guide	docs/BINDING_GUIDE.md
DAG scheduler	docs/SCHEDULER.md
Step control flow	docs/STEP_CONTROL_FLOW.md
Streaming	docs/STREAMING.md
Async execution	docs/ASYNC_EXECUTION.md
Deployment	docs/DEPLOYMENT.md
Observability	docs/OBSERVABILITY.md
Auth and RBAC	docs/AUTH.md
Webhooks	docs/WEBHOOKS.md
Plugins	docs/PLUGINS.md
JSON schemas	docs/JSON_SCHEMAS.md
Skill authoring	docs/SKILL_AUTHORING.md
Troubleshooting	docs/TROUBLESHOOTING.md
Public release use cases	docs/PUBLIC_RELEASE_USE_CASES.md
Project status	docs/PROJECT_STATUS.md
ORCA specification	ORCA.md

Serve docs locally:

make serve

Research paper

Beyond Prompting: Decoupling Cognition from Execution in LLM-based Agents through the ORCA Framework

Fernandez Alvarez, G. E. (2026)

DOI: https://doi.org/10.5281/zenodo.19438943
SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6600840
Paper page: docs/PAPER.md

Contributing

Contributions are welcome. See CONTRIBUTING.md.

make check

License

Apache 2.0. See LICENSE.

Citing

If you use Agent Skills or ORCA in research, please cite:

@article{fernandez_orca_2026,
  author    = {Fernandez Alvarez, Guillermo E.},
  title     = {Beyond Prompting: Decoupling Cognition from Execution in
               LLM-based Agents through the ORCA Framework},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19438943},
  url       = {https://doi.org/10.5281/zenodo.19438943}
}

Software citation:

@software{fernandez_agent_skills_2026,
  author       = {Fernandez Alvarez, Guillermo},
  title        = {Agent Skills Runtime},
  year         = {2026},
  url          = {https://github.com/gfernandf/agent-skills},
  version      = {1.0.2},
  license      = {Apache-2.0}
}

Troubleshooting

Problem	Solution
Registry not found	Run python skills.py doctor and ensure agent-skill-registry is cloned next to this repo
Command not found on Windows	Run python skills.py ... from the repository root
Unexpected runtime error	Check docs/ERROR_TAXONOMY.md
Environment mismatch	Review docs/ENVIRONMENT_VARIABLES.md