{"kind":"AgentDefinition","metadata":{"namespace":"community","name":"dataverse-python-performance-optimization","version":"0.1.0"},"spec":{"agents_md":"---\napplyTo: '**'\n---\n\n# Dataverse SDK for Python — Performance \u0026 Optimization Guide\n\nBased on official Microsoft Dataverse and Azure SDK performance guidance.\n\n## 1. Performance Overview\n\nThe Dataverse SDK for Python is optimized for Python developers but has some limitations in preview:\n- **Minimal retry policy**: Only network errors are retried by default\n- **No DeleteMultiple**: Use individual deletes or update status instead\n- **Limited OData batching**: General-purpose OData batching not supported\n- **SQL limitations**: No JOINs, limited WHERE/TOP/ORDER BY\n\nWorkarounds and optimization strategies address these limitations.\n\n---\n\n## 2. Query Optimization\n\n### Use Select to Limit Columns\n\n```python\n# ❌ SLOW - Retrieves all columns\naccounts = client.get(\"account\", top=100)\n\n# ✅ FAST - Only retrieve needed columns\naccounts = client.get(\n    \"account\",\n    select=[\"accountid\", \"name\", \"telephone1\", \"creditlimit\"],\n    top=100\n)\n```\n\n**Impact**: Reduces payload size and memory usage by 30-50%.\n\n---\n\n### Use Filters Efficiently\n\n```python\n# ❌ SLOW - Fetch all, filter in Python\nall_accounts = client.get(\"account\")\nactive_accounts = [a for a in all_accounts if a.get(\"statecode\") == 0]\n\n# ✅ FAST - Filter server-side\naccounts = client.get(\n    \"account\",\n    filter=\"statecode eq 0\",\n    top=100\n)\n```\n\n**OData filter examples**:\n```python\n# Equals\nfilter=\"statecode eq 0\"\n\n# String contains\nfilter=\"contains(name, 'Acme')\"\n\n# Multiple conditions\nfilter=\"statecode eq 0 and createdon gt 2025-01-01Z\"\n\n# Not equals\nfilter=\"statecode ne 2\"\n```\n\n---\n\n### Order by for Predictable Paging\n\n```python\n# Ensure consistent order for pagination\naccounts = client.get(\n    \"account\",\n    orderby=[\"createdon desc\", \"name 
asc\"],\n    page_size=100\n)\n\nfor page in accounts:\n    process_page(page)\n```\n\n---\n\n## 3. Pagination Best Practices\n\n### Lazy Pagination (Recommended)\n\n```python\n# ✅ BEST - Generator yields one page at a time\npages = client.get(\n    \"account\",\n    top=5000,              # Total limit\n    page_size=200          # Per-page size (hint)\n)\n\nfor page in pages:  # Each iteration fetches one page\n    for record in page:\n        process_record(record)  # Process immediately\n```\n\n**Benefits**:\n- Memory efficient (pages loaded on-demand)\n- Fast time-to-first-result\n- Can stop early if needed\n\n### Avoid Loading Everything into Memory\n\n```python\n# ❌ SLOW - Loads all 100,000 records at once\nall_records = list(client.get(\"account\", top=100000))\nprocess(all_records)\n\n# ✅ FAST - Process as you go\nfor page in client.get(\"account\", top=100000, page_size=5000):\n    process(page)\n```\n\n---\n\n## 4. Batch Operations\n\n### Bulk Create (Recommended)\n\n```python\n# ✅ BEST - Single call with multiple records\npayloads = [\n    {\"name\": f\"Account {i}\", \"telephone1\": f\"555-{i:04d}\"}\n    for i in range(1000)\n]\nids = client.create(\"account\", payloads)  # One API call for many records\n```\n\n### Bulk Update - Broadcast Mode\n\n```python\n# ✅ FAST - Same update applied to many records\naccount_ids = [\"id1\", \"id2\", \"id3\", \"...\"]\nclient.update(\"account\", account_ids, {\"statecode\": 1})  # One call\n```\n\n### Bulk Update - Per-Record Mode\n\n```python\n# ✅ ACCEPTABLE - Different updates for each record\naccount_ids = [\"id1\", \"id2\", \"id3\"]\nupdates = [\n    {\"telephone1\": \"555-0100\"},\n    {\"telephone1\": \"555-0200\"},\n    {\"telephone1\": \"555-0300\"},\n]\nclient.update(\"account\", account_ids, updates)\n```\n\n### Batch Size Tuning\n\nBased on table complexity (per Microsoft guidance):\n\n| Table Type | Batch Size | Max Threads |\n|------------|-----------|-------------|\n| OOB (Account, Contact, Lead) | 
200-300 | 30 |\n| Simple (few lookups) | ≤10 | 50 |\n| Moderately complex | ≤100 | 30 |\n| Large/complex (\u003e100 cols, \u003e20 lookups) | 10-20 | 10-20 |\n\n```python\ndef bulk_create_optimized(client, table_name, payloads, batch_size=200):\n    \"\"\"Create records in optimal batch size.\"\"\"\n    for i in range(0, len(payloads), batch_size):\n        batch = payloads[i:i + batch_size]\n        ids = client.create(table_name, batch)\n        print(f\"Created {len(ids)} records\")\n        yield ids\n```\n\n---\n\n## 5. Connection Management\n\n### Reuse Client Instance\n\n```python\n# ❌ BAD - Creates new connection each time\ndef process_batch():\n    for batch in batches:\n        client = DataverseClient(...)  # Expensive!\n        client.create(\"account\", batch)\n\n# ✅ GOOD - Reuse connection\nclient = DataverseClient(...)  # Create once\n\ndef process_batch():\n    for batch in batches:\n        client.create(\"account\", batch)  # Reuse\n```\n\n### Global Client Instance\n\n```python\n# singleton_client.py\nfrom azure.identity import DefaultAzureCredential\nfrom PowerPlatform.Dataverse.client import DataverseClient\n\n_client = None\n\ndef get_client():\n    global _client\n    if _client is None:\n        _client = DataverseClient(\n            base_url=\"https://myorg.crm.dynamics.com\",\n            credential=DefaultAzureCredential()\n        )\n    return _client\n\n# main.py\nfrom singleton_client import get_client\n\nclient = get_client()\nrecords = client.get(\"account\")\n```\n\n### Connection Timeout Configuration\n\n```python\nfrom PowerPlatform.Dataverse.core.config import DataverseConfig\n\ncfg = DataverseConfig()\ncfg.http_timeout = 30         # Request timeout\ncfg.connection_timeout = 5    # Connection timeout\n\nclient = DataverseClient(\n    base_url=\"https://myorg.crm.dynamics.com\",\n    credential=credential,\n    config=cfg\n)\n```\n\n---\n\n## 6. 
Async Operations (Future Capability)\n\nThe SDK is currently synchronous. Until it offers an async API, run blocking calls in an executor so they do not stall the event loop:\n\n```python\n# Recommended pattern for future async support\nimport asyncio\n\nasync def get_accounts_async(client):\n    \"\"\"Pattern for future async SDK.\"\"\"\n    # When SDK supports async:\n    # accounts = await client.get(\"account\")\n    # For now, run the sync call in the default thread pool executor\n    loop = asyncio.get_running_loop()\n    accounts = await loop.run_in_executor(\n        None,\n        lambda: list(client.get(\"account\"))\n    )\n    return accounts\n\n# Usage\naccounts = asyncio.run(get_accounts_async(client))\n```\n\n---\n\n## 7. File Upload Optimization\n\n### Small Files (\u003c128 MB)\n\n```python\n# ✅ FAST - Single request\nclient.upload_file(\n    table_name=\"account\",\n    record_id=record_id,\n    column_name=\"document_column\",\n    file_path=\"small_file.pdf\"\n)\n```\n\n### Large Files (\u003e128 MB)\n\n```python\n# ✅ OPTIMIZED - Chunked upload\nclient.upload_file(\n    table_name=\"account\",\n    record_id=record_id,\n    column_name=\"document_column\",\n    file_path=\"large_file.pdf\",\n    mode='chunk',\n    if_none_match=True\n)\n\n# SDK automatically:\n# 1. Splits file into 4MB chunks\n# 2. Uploads chunks in parallel\n# 3. Assembles on server\n```\n\n---\n\n## 8. 
OData Query Optimization\n\n### SQL Alternative (Simple Queries)\n\n```python\n# ✅ SOMETIMES FASTER - Direct SQL for SELECT only\n# Limited support: single SELECT, optional WHERE/TOP/ORDER BY\nrecords = client.get(\n    \"account\",\n    sql=\"SELECT accountid, name FROM account WHERE statecode = 0 ORDER BY name\"\n)\n```\n\n### Complex Queries\n\n```python\n# ❌ NOT SUPPORTED - JOINs, complex WHERE\nsql=\"SELECT a.accountid, c.fullname FROM account a JOIN contact c ON a.accountid = c.parentcustomerid\"\n\n# ✅ WORKAROUND - Get accounts, then contacts for each\n# Note: this issues one contact query per account, so keep the account set small\naccounts = client.get(\"account\", select=[\"accountid\", \"name\"])\nfor page in accounts:  # get() yields pages of records\n    for account in page:\n        # OData filters address a lookup as _\u003cname\u003e_value with an unquoted GUID\n        contacts = client.get(\n            \"contact\",\n            filter=f\"_parentcustomerid_value eq {account['accountid']}\"\n        )\n        process(account, contacts)\n```\n\n---\n\n## 9. Memory Management\n\n### Process Large Datasets Incrementally\n\n```python\nimport gc\n\ndef process_large_table(client, table_name):\n    \"\"\"Process millions of records without memory issues.\"\"\"\n    \n    for page in client.get(table_name, page_size=5000):\n        for record in page:\n            result = process_record(record)\n            save_result(result)\n        \n        # Force garbage collection between pages\n        gc.collect()\n```\n\n### DataFrame Integration with Chunking\n\n```python\nimport pandas as pd\n\ndef load_to_dataframe_chunked(client, table_name, chunk_size=10000):\n    \"\"\"Load data to DataFrame in chunks.\"\"\"\n    \n    dfs = []\n    for page in client.get(table_name, page_size=1000):\n        df_chunk = pd.DataFrame(page)\n        dfs.append(df_chunk)\n        \n        # Combine when chunk threshold reached\n        if len(dfs) \u003e= chunk_size // 1000:\n            df = pd.concat(dfs, ignore_index=True)\n            process_chunk(df)\n            dfs = []\n    \n    # Process remaining\n    if dfs:\n        df = pd.concat(dfs, ignore_index=True)\n        process_chunk(df)\n```\n\n---\n\n## 10. 
Rate Limiting Handling\n\nThe SDK has minimal built-in retry support, so implement backoff for rate limits (HTTP 429) manually:\n\n```python\nimport time\nfrom PowerPlatform.Dataverse.core.errors import DataverseError\n\ndef call_with_backoff(func, max_retries=3):\n    \"\"\"Call function with exponential backoff for rate limits.\"\"\"\n    \n    for attempt in range(max_retries):\n        try:\n            return func()\n        except DataverseError as e:\n            if e.status_code == 429:  # Too Many Requests\n                if attempt \u003c max_retries - 1:\n                    wait_time = 2 ** attempt  # 1s, then 2s with the default max_retries=3\n                    print(f\"Rate limited. Waiting {wait_time}s...\")\n                    time.sleep(wait_time)\n                else:\n                    raise  # Retries exhausted\n            else:\n                raise  # Not a rate-limit error\n\n# Usage\nids = call_with_backoff(\n    lambda: client.create(\"account\", payload)\n)\n```\n\n---\n\n## 11. Transaction Consistency (Known Limitation)\n\nThe SDK does not provide transactional guarantees for bulk operations, so check the result explicitly:\n\n```python\n# ⚠️ If bulk operation partially fails, some records may be created\n\ndef create_with_consistency_check(client, table_name, payloads):\n    \"\"\"Create records and verify all succeeded.\"\"\"\n    \n    try:\n        ids = client.create(table_name, payloads)\n        \n        # Verify that one ID came back for every payload\n        count_created = len(ids)\n        \n        if count_created != len(payloads):\n            print(f\"⚠️ Only {count_created}/{len(payloads)} records created\")\n            # Handle partial failure\n    except Exception as e:\n        print(f\"Creation failed: {e}\")\n        # Check what was created\n```\n\n---\n\n## 12. 
Monitoring Performance\n\n### Log Operation Duration\n\n```python\nimport functools\nimport logging\nimport time\n\nlogger = logging.getLogger(\"dataverse\")\n\ndef monitored_operation(operation_name):\n    \"\"\"Decorator to monitor operation performance.\"\"\"\n    def decorator(func):\n        @functools.wraps(func)  # Preserve the wrapped function's name and docstring\n        def wrapper(*args, **kwargs):\n            start = time.perf_counter()  # Monotonic clock, suited to measuring durations\n            try:\n                result = func(*args, **kwargs)\n                duration = time.perf_counter() - start\n                logger.info(f\"{operation_name}: {duration:.2f}s\")\n                return result\n            except Exception as e:\n                duration = time.perf_counter() - start\n                logger.error(f\"{operation_name} failed after {duration:.2f}s: {e}\")\n                raise\n        return wrapper\n    return decorator\n\n@monitored_operation(\"Bulk Create Accounts\")\ndef create_accounts(client, payloads):\n    return client.create(\"account\", payloads)\n```\n\n---\n\n## 13. Performance Checklist\n\n| Item | Status | Notes |\n|------|--------|-------|\n| Reuse client instance | ☐ | Create once, reuse |\n| Use select to limit columns | ☐ | Only retrieve needed data |\n| Filter server-side with OData | ☐ | Don't fetch all and filter |\n| Use pagination with page_size | ☐ | Process incrementally |\n| Batch operations | ☐ | Use create/update for multiple records |\n| Tune batch size by table type | ☐ | OOB=200-300, Simple=≤10 |\n| Handle rate limiting (429) | ☐ | Implement exponential backoff |\n| Use chunked upload for large files | ☐ | SDK handles for \u003e128MB |\n| Monitor operation duration | ☐ | Log timing for analysis |\n| Test with production-like data | ☐ | Performance varies with data volume |\n\n---\n\n## 14. 
See Also\n\n- [Dataverse Web API Performance](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/optimize-performance-create-update)\n- [OData Query Options](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/webapi/query-data-web-api)\n- [SDK Working with Data](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/sdk-python/work-data)\n","description":"Based on official Microsoft Dataverse and Azure SDK performance guidance.","import":{"commit_sha":"541b7819d8c3545c6df122491af4fa1eae415779","imported_at":"2026-05-18T20:05:35Z","license_text":"MIT License\n\nCopyright GitHub, Inc.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.","owner":"github","repo":"github/awesome-copilot","source_url":"https://github.com/github/awesome-copilot/blob/541b7819d8c3545c6df122491af4fa1eae415779/instructions/dataverse-python-performance-optimization.instructions.md"},"manifest":{}},"content_hash":[247,207,254,180,201,129,132,203,120,188,55,8,41,61,162,227,69,170,92,250,76,72,41,158,241,203,164,138,230,228,159,196],"trust_level":"unsigned","yanked":false}
