Pipeline organization

Documentation for pipeline_organization from the pipeline_ex repository.

Pipeline Organization and Categorization System

Overview

This document defines the organizational structure for the AI engineering pipeline library, establishing a systematic approach to pipeline discovery, reuse, and composition.

Directory Structure

pipeline_ex/
├── pipelines/                      # Main pipeline library
│   ├── registry.yaml              # Global pipeline registry
│   ├── data/                      # Data processing pipelines
│   │   ├── cleaning/
│   │   ├── enrichment/
│   │   ├── transformation/
│   │   └── quality/
│   ├── model/                     # Model development pipelines
│   │   ├── prompt_engineering/
│   │   ├── evaluation/
│   │   ├── comparison/
│   │   └── fine_tuning/
│   ├── code/                      # Code generation pipelines
│   │   ├── api_generation/
│   │   ├── test_generation/
│   │   ├── documentation/
│   │   └── refactoring/
│   ├── analysis/                  # Analysis pipelines
│   │   ├── codebase/
│   │   ├── security/
│   │   ├── performance/
│   │   └── dependencies/
│   ├── content/                   # Content generation pipelines
│   │   ├── blog/
│   │   ├── tutorial/
│   │   ├── api_docs/
│   │   └── changelog/
│   ├── devops/                    # DevOps pipelines
│   │   ├── ci_cd/
│   │   ├── deployment/
│   │   ├── monitoring/
│   │   └── infrastructure/
│   ├── components/                # Reusable components
│   │   ├── steps/                # Reusable step definitions
│   │   ├── prompts/              # Prompt templates
│   │   ├── functions/            # Gemini function definitions
│   │   ├── validators/           # Validation components
│   │   └── transformers/         # Data transformation components
│   └── templates/                 # Pipeline templates
│       ├── basic/                # Simple pipeline patterns
│       ├── advanced/             # Complex pipeline patterns
│       └── enterprise/           # Production-grade patterns
├── examples/                      # Example usage and demos
│   ├── tutorials/                # Step-by-step tutorials
│   └── case_studies/             # Real-world implementations
└── tests/                        # Pipeline-specific tests
    ├── pipeline_tests/           # Integration tests for pipelines
    └── component_tests/          # Unit tests for components

Pipeline Registry Schema

The registry.yaml serves as the central catalog of all available pipelines:

version: "1.0"
last_updated: "2025-06-30"

pipelines:
  - id: "data-cleaning-standard"
    name: "Standard Data Cleaning Pipeline"
    category: "data/cleaning"
    description: "Multi-stage data cleaning with validation"
    version: "1.0.0"
    tags: ["data", "cleaning", "validation"]
    dependencies:
      - "components/steps/validation"
      - "components/transformers/data"
    complexity: "medium"
    estimated_tokens: 5000
    providers: ["claude", "gemini"]
    
  - id: "api-rest-generator"
    name: "REST API Generator"
    category: "code/api_generation"
    description: "Generate complete REST API with tests"
    version: "2.1.0"
    tags: ["api", "code-generation", "rest"]
    dependencies:
      - "components/steps/code"
      - "components/prompts/api"
    complexity: "high"
    estimated_tokens: 15000
    providers: ["claude"]
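To make the lookup semantics concrete, here is an illustrative Python sketch that mirrors the two registry entries above as plain dicts and shows how category and tag queries could behave. The function names echo the `Pipeline.Registry` API described later in this document, but these are hypothetical stand-ins, not the project's actual implementation.

```python
# Two registry.yaml entries mirrored as Python dicts (abridged).
REGISTRY = [
    {
        "id": "data-cleaning-standard",
        "category": "data/cleaning",
        "tags": ["data", "cleaning", "validation"],
        "providers": ["claude", "gemini"],
    },
    {
        "id": "api-rest-generator",
        "category": "code/api_generation",
        "tags": ["api", "code-generation", "rest"],
        "providers": ["claude"],
    },
]

def list_by_category(category):
    """Return all pipelines registered under an exact category path."""
    return [p for p in REGISTRY if p["category"] == category]

def search(tags):
    """Return pipelines whose tags contain every requested tag."""
    wanted = set(tags)
    return [p for p in REGISTRY if wanted <= set(p["tags"])]
```

Tag search is modeled as "match all requested tags", which is the natural reading of `--tags "api,rest"` in the CLI examples below.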

Categorization Taxonomy

1. Primary Categories

  • Data: Pipelines focused on data manipulation and processing
  • Model: AI/ML model development and optimization
  • Code: Software development and code generation
  • Analysis: System and code analysis workflows
  • Content: Documentation and content creation
  • DevOps: Infrastructure and deployment automation

2. Complexity Levels

  • Basic: Single-step or simple multi-step pipelines
  • Medium: Multi-step with conditional logic
  • High: Complex workflows with parallel execution
  • Enterprise: Production-grade with full error handling

3. Provider Requirements

  • Claude-only: Requires Claude-specific features
  • Gemini-only: Requires Gemini function calling
  • Multi-provider: Can use either provider
  • Hybrid: Requires both providers
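A minimal sketch of how these four classes could be enforced at run time, assuming the pipeline's `providers` list from the registry and a set of locally configured providers. The `hybrid` flag is an assumption here; the registry schema above does not yet distinguish multi-provider from hybrid.

```python
def can_run(required, available, hybrid=False):
    """Check provider compatibility for a pipeline.

    `required` is the pipeline's providers list from the registry;
    `available` is the set of providers configured locally.
    A single-entry list means Claude-only or Gemini-only; multi-provider
    pipelines run with any one listed provider; hybrid pipelines
    (hybrid=True, an assumed flag) need every listed provider.
    """
    if hybrid:
        return all(p in available for p in required)
    return any(p in available for p in required)
```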

Component Classification

Step Components

# components/steps/validation/input_validator.yaml
component:
  type: "step"
  id: "input-validator"
  name: "Input Validation Step"
  description: "Validates input data against schema"
  
  parameters:
    schema:
      type: "object"
      description: "JSON Schema for validation"
    strict:
      type: "boolean"
      default: true
      
  outputs:
    valid:
      type: "boolean"
    errors:
      type: "array"
      items:
        type: "string"
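The contract this component declares (a `valid` boolean plus an `errors` array) can be sketched in plain Python. The real component is YAML-defined and would use full JSON Schema validation; the check here is reduced to required-key and unknown-key checks for illustration.

```python
def validate_input(data, schema, strict=True):
    """Return the component's declared outputs: `valid` and `errors`.

    `schema` is a JSON-Schema-like dict; only `required` and
    `properties` are honored in this simplified sketch.
    """
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    if strict:
        allowed = set(schema.get("properties", {}))
        for key in data:
            if allowed and key not in allowed:
                errors.append(f"unexpected field: {key}")
    return {"valid": not errors, "errors": errors}
```

Setting `strict: false`, as in the component's parameter default, would correspond to skipping the unknown-key pass.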

Prompt Templates

# components/prompts/analysis/code_review.yaml
component:
  type: "prompt"
  id: "code-review-prompt"
  name: "Code Review Prompt Template"
  
  variables:
    - code_content
    - review_focus
    - severity_level
    
  template: |
    Review the following code with focus on {review_focus}:
    
    ```
    {code_content}
    ```
    
    Provide feedback at {severity_level} level.
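Rendering such a template is a straightforward substitution of its declared variables. The single-brace placeholders map directly onto `str.format` in Python; this sketch abbreviates the template (the embedded code-fence markers are omitted for brevity).

```python
# Abridged version of the code-review template above.
TEMPLATE = (
    "Review the following code with focus on {review_focus}:\n\n"
    "{code_content}\n\n"
    "Provide feedback at {severity_level} level."
)

def render(template, **variables):
    """Substitute the template's declared variables into the text."""
    return template.format(**variables)
```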

Naming Conventions

Pipeline Files

  • Format: {purpose}_{variant}_pipeline.yaml
  • Examples:
    • data_cleaning_standard_pipeline.yaml
    • api_generation_rest_pipeline.yaml
    • security_audit_comprehensive_pipeline.yaml
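A lint rule for this convention could be expressed as a single regular expression. This sketch assumes lowercase snake_case segments and requires both a purpose and at least one variant segment before the `_pipeline.yaml` suffix.

```python
import re

# {purpose}_{variant}_pipeline.yaml: at least two snake_case segments
# before the literal "_pipeline.yaml" suffix.
PIPELINE_FILE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)+_pipeline\.yaml$")

def is_valid_pipeline_filename(name):
    """Check a filename against the pipeline-file naming convention."""
    return bool(PIPELINE_FILE.match(name))
```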

Component Files

  • Format: {function}_{type}.yaml
  • Examples:
    • input_validator.yaml
    • json_transformer.yaml
    • code_review_prompt.yaml

Version Tags

  • Semantic versioning: MAJOR.MINOR.PATCH
  • Beta versions: X.Y.Z-beta.N
  • Release candidates: X.Y.Z-rc.N
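Parsing these version tags can be sketched with a small regex that accepts `MAJOR.MINOR.PATCH` plus the two pre-release forms above. This is an illustrative helper, not part of the project's tooling.

```python
import re

# MAJOR.MINOR.PATCH with optional -beta.N or -rc.N suffix.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(beta|rc)\.(\d+))?$")

def parse_version(tag):
    """Parse a version tag into (major, minor, patch, pre_release).

    pre_release is None for stable releases, or a ("beta"|"rc", N) pair.
    """
    m = SEMVER.match(tag)
    if not m:
        raise ValueError(f"not a valid version tag: {tag}")
    major, minor, patch = (int(m.group(i)) for i in (1, 2, 3))
    pre = (m.group(4), int(m.group(5))) if m.group(4) else None
    return (major, minor, patch, pre)
```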

Discovery Mechanisms

1. CLI Commands

# List all pipelines
mix pipeline.list

# Search by category
mix pipeline.list --category data/cleaning

# Search by tags
mix pipeline.list --tags "api,rest"

# Show pipeline details
mix pipeline.info api-rest-generator

2. Web Interface (Future)

  • Visual pipeline browser
  • Dependency graph visualization
  • Performance metrics dashboard
  • Usage analytics

3. API Access

# Pipeline discovery API
Pipeline.Registry.list_by_category("data/cleaning")
Pipeline.Registry.search(tags: ["api", "rest"])
Pipeline.Registry.get_details("api-rest-generator")

Metadata Standards

Each pipeline must include:

  1. Unique identifier
  2. Descriptive name
  3. Clear category placement
  4. Version information
  5. Dependency declarations
  6. Performance estimates
  7. Provider requirements
  8. Comprehensive tags
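A registry lint pass could enforce this checklist mechanically. The field names in this sketch follow the registry schema shown earlier in this document; the helper itself is hypothetical.

```python
# The eight required metadata items, mapped to registry field names.
REQUIRED_FIELDS = [
    "id", "name", "category", "version",
    "dependencies", "estimated_tokens", "providers", "tags",
]

def missing_metadata(entry):
    """Return the names of required fields absent from a registry entry."""
    return [f for f in REQUIRED_FIELDS if f not in entry]
```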

Migration Path

For existing pipelines:

  1. Analyze current pipeline files
  2. Categorize according to new taxonomy
  3. Add required metadata
  4. Update file locations
  5. Register in central registry
  6. Update references in code

Governance

Adding New Pipelines

  1. Define clear purpose and category
  2. Follow naming conventions
  3. Include all required metadata
  4. Add comprehensive tests
  5. Document usage examples
  6. Submit for review

Deprecation Process

  1. Mark as deprecated in registry
  2. Add deprecation notice to file
  3. Provide migration guide
  4. Maintain for 2 major versions
  5. Archive after removal
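Step 1 of this process could be realized with a `deprecated` flag on registry entries, which default listings then filter out while still allowing explicit resolution during the two-major-version grace period. The `deprecated` key is an assumption here; it is not yet part of the registry schema shown above.

```python
def active_pipelines(registry, include_deprecated=False):
    """List registry entries, hiding deprecated ones by default.

    `deprecated` is an assumed boolean flag on registry entries.
    """
    return [p for p in registry
            if include_deprecated or not p.get("deprecated", False)]
```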

Benefits

  1. Discoverability: Easy to find relevant pipelines
  2. Reusability: Clear component boundaries
  3. Maintainability: Organized structure
  4. Scalability: Supports growth
  5. Consistency: Enforced standards
  6. Quality: Review process