{"kind":"AgentDefinition","metadata":{"namespace":"community","name":"dataverse-python-pandas-integration","version":"0.1.0"},"spec":{"agents_md":"# Dataverse SDK for Python - Pandas Integration Guide\n\n## Overview\nGuide to integrating the Dataverse SDK for Python with pandas DataFrames for data science and analysis workflows. The SDK's JSON response format maps seamlessly to pandas DataFrames, enabling data scientists to work with Dataverse data using familiar data manipulation tools.\n\n---\n\n## 1. Introduction to PandasODataClient\n\n### What is PandasODataClient?\n`PandasODataClient` is a thin wrapper around the standard `DataverseClient` that returns data in pandas DataFrame format instead of raw JSON dictionaries. This makes it ideal for:\n- Data scientists working with tabular data\n- Analytics and reporting workflows\n- Data exploration and cleaning\n- Integration with machine learning pipelines\n\n### Installation Requirements\n```bash\n# Install core dependencies\npip install PowerPlatform-Dataverse-Client\npip install azure-identity\n\n# Install pandas for data manipulation\npip install pandas\n```\n\n### When to Use PandasODataClient\n✅ **Use when you need:**\n- Data exploration and analysis\n- Working with tabular data\n- Integration with statistical/ML libraries\n- Efficient data manipulation\n\n❌ **Use DataverseClient instead when you need:**\n- Real-time CRUD operations only\n- File upload operations\n- Metadata operations\n- Single record operations\n\n---\n\n## 2. Basic DataFrame Workflow\n\n### Converting Query Results to DataFrame\n```python\nfrom azure.identity import InteractiveBrowserCredential\nfrom PowerPlatform.Dataverse.client import DataverseClient\nimport pandas as pd\n\n# Setup authentication\nbase_url = \"https://\u003cmyorg\u003e.crm.dynamics.com\"\ncredential = InteractiveBrowserCredential()\nclient = DataverseClient(base_url=base_url, credential=credential)\n\n# Query data\npages = client.get(\n    \"account\",\n    select=[\"accountid\", \"name\", \"creditlimit\", \"telephone1\"],\n    filter=\"statecode eq 0\",\n    orderby=[\"name\"]\n)\n\n# Collect all pages into one DataFrame\nall_records = []\nfor page in pages:\n    all_records.extend(page)\n\n# Convert to DataFrame\ndf = pd.DataFrame(all_records)\n\n# Display first few rows\nprint(df.head())\nprint(f\"Total records: {len(df)}\")\n```\n\n### Query Parameters Map to DataFrame\n```python\n# All query parameters return as columns in DataFrame\ndf = pd.DataFrame(\n    client.get(\n        \"account\",\n        select=[\"accountid\", \"name\", \"creditlimit\", \"telephone1\", \"createdon\"],\n        filter=\"creditlimit \u003e 50000\",\n        orderby=[\"creditlimit desc\"]\n    )\n)\n\n# Result is a DataFrame with columns:\n# accountid | name | creditlimit | telephone1 | createdon\n```\n\n---\n\n## 3. Data Exploration with Pandas\n\n### Basic Exploration\n```python\nimport pandas as pd\nfrom azure.identity import InteractiveBrowserCredential\nfrom PowerPlatform.Dataverse.client import DataverseClient\n\nclient = DataverseClient(\"https://\u003cmyorg\u003e.crm.dynamics.com\", InteractiveBrowserCredential())\n\n# Load account data\nrecords = []\nfor page in client.get(\"account\", select=[\"accountid\", \"name\", \"creditlimit\", \"industrycode\"]):\n    records.extend(page)\n\ndf = pd.DataFrame(records)\n\n# Explore the data\nprint(df.shape)           # (1000, 4)\nprint(df.dtypes)          # Data types\nprint(df.describe())      # Statistical summary\nprint(df.info())          # Column info and null counts\nprint(df.head(10))        # First 10 rows\n```\n\n### Filtering and Selecting\n```python\n# Filter rows by condition\nhigh_value = df[df['creditlimit'] \u003e 100000]\n\n# Select specific columns\nnames_limits = df[['name', 'creditlimit']]\n\n# Multiple conditions\nfiltered = df[(df['creditlimit'] \u003e 50000) \u0026 (df['industrycode'] == 1)]\n\n# Value counts\nprint(df['industrycode'].value_counts())\n```\n\n### Sorting and Grouping\n```python\n# Sort by column\nsorted_df = df.sort_values('creditlimit', ascending=False)\n\n# Group by and aggregate\nby_industry = df.groupby('industrycode').agg({\n    'creditlimit': ['mean', 'sum', 'count'],\n    'name': 'count'\n})\n\n# Group statistics\nprint(df.groupby('industrycode')['creditlimit'].describe())\n```\n\n### Data Cleaning\n```python\n# Handle missing values\ndf_clean = df.dropna()                    # Remove rows with NaN\ndf_filled = df.fillna(0)                  # Fill NaN with 0\ndf_ffill = df.fillna(method='ffill')      # Forward fill\n\n# Check for duplicates\nduplicates = df[df.duplicated(['name'])]\ndf_unique = df.drop_duplicates()\n\n# Data type conversion\ndf['creditlimit'] = pd.to_numeric(df['creditlimit'])\ndf['createdon'] = pd.to_datetime(df['createdon'])\n```\n\n---\n\n## 4. Data Analysis Patterns\n\n### Aggregation and Summarization\n```python\n# Create summary report\nsummary = df.groupby('industrycode').agg({\n    'accountid': 'count',\n    'creditlimit': ['mean', 'min', 'max', 'sum'],\n    'name': lambda x: ', '.join(x.head(3))  # Sample names\n}).round(2)\n\nprint(summary)\n```\n\n### Time-Series Analysis\n```python\n# Convert to datetime\ndf['createdon'] = pd.to_datetime(df['createdon'])\n\n# Resample to monthly\nmonthly = df.set_index('createdon').resample('M').size()\n\n# Extract date components\ndf['year'] = df['createdon'].dt.year\ndf['month'] = df['createdon'].dt.month\ndf['day_of_week'] = df['createdon'].dt.day_name()\n```\n\n### Join and Merge Operations\n```python\n# Load two related tables\naccounts = pd.DataFrame(client.get(\"account\", select=[\"accountid\", \"name\"]))\ncontacts = pd.DataFrame(client.get(\"contact\", select=[\"contactid\", \"parentcustomerid\", \"fullname\"]))\n\n# Merge on relationship\nmerged = accounts.merge(\n    contacts,\n    left_on='accountid',\n    right_on='parentcustomerid',\n    how='left'\n)\n\nprint(merged.head())\n```\n\n### Statistical Analysis\n```python\n# Correlation matrix\ncorrelation = df[['creditlimit', 'industrycode']].corr()\n\n# Distribution analysis\nprint(df['creditlimit'].describe())\nprint(df['creditlimit'].skew())\nprint(df['creditlimit'].kurtosis())\n\n# Percentiles\nprint(df['creditlimit'].quantile([0.25, 0.5, 0.75]))\n```\n\n---\n\n## 5. Pivot Tables and Reports\n\n### Creating Pivot Tables\n```python\n# Pivot table by industry and status\npivot = pd.pivot_table(\n    df,\n    values='creditlimit',\n    index='industrycode',\n    columns='statecode',\n    aggfunc=['sum', 'mean', 'count']\n)\n\nprint(pivot)\n```\n\n### Generating Reports\n```python\n# Sales report by industry\nindustry_report = df.groupby('industrycode').agg({\n    'accountid': 'count',\n    'creditlimit': 'sum',\n    'name': 'first'\n}).rename(columns={\n    'accountid': 'Account Count',\n    'creditlimit': 'Total Credit Limit',\n    'name': 'Sample Account'\n})\n\n# Export to CSV\nindustry_report.to_csv('industry_report.csv')\n\n# Export to Excel\nindustry_report.to_excel('industry_report.xlsx')\n```\n\n---\n\n## 6. Data Visualization\n\n### Matplotlib Integration\n```python\nimport matplotlib.pyplot as plt\n\n# Create visualizations\nfig, axes = plt.subplots(2, 2, figsize=(12, 10))\n\n# Histogram\ndf['creditlimit'].hist(bins=30, ax=axes[0, 0])\naxes[0, 0].set_title('Credit Limit Distribution')\n\n# Bar chart\ndf['industrycode'].value_counts().plot(kind='bar', ax=axes[0, 1])\naxes[0, 1].set_title('Accounts by Industry')\n\n# Box plot\ndf.boxplot(column='creditlimit', by='industrycode', ax=axes[1, 0])\naxes[1, 0].set_title('Credit Limit by Industry')\n\n# Scatter plot\ndf.plot.scatter(x='creditlimit', y='industrycode', ax=axes[1, 1])\naxes[1, 1].set_title('Credit Limit vs Industry')\n\nplt.tight_layout()\nplt.show()\n```\n\n### Seaborn Integration\n```python\nimport seaborn as sns\n\n# Correlation heatmap\nplt.figure(figsize=(8, 6))\nsns.heatmap(df[['creditlimit', 'industrycode']].corr(), annot=True)\nplt.title('Correlation Matrix')\nplt.show()\n\n# Distribution plot\nsns.distplot(df['creditlimit'], kde=True)\nplt.title('Credit Limit Distribution')\nplt.show()\n```\n\n---\n\n## 7. Machine Learning Integration\n\n### Preparing Data for ML\n```python\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.model_selection import train_test_split\n\n# Load and prepare data\nrecords = []\nfor page in client.get(\"account\", select=[\"accountid\", \"creditlimit\", \"industrycode\", \"statecode\"]):\n    records.extend(page)\n\ndf = pd.DataFrame(records)\n\n# Feature engineering\ndf['log_creditlimit'] = np.log1p(df['creditlimit'])\ndf['industry_cat'] = pd.Categorical(df['industrycode']).codes\n\n# Split features and target\nX = df[['industrycode', 'log_creditlimit']]\ny = df['statecode']\n\n# Train-test split\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n\nprint(f\"Training set: {len(X_train)}, Test set: {len(X_test)}\")\n```\n\n### Building a Classification Model\n```python\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.metrics import classification_report\n\n# Train model\nmodel = RandomForestClassifier(n_estimators=100)\nmodel.fit(X_train, y_train)\n\n# Evaluate\ny_pred = model.predict(X_test)\nprint(classification_report(y_test, y_pred))\n\n# Feature importance\nimportances = pd.Series(\n    model.feature_importances_,\n    index=X.columns\n).sort_values(ascending=False)\n\nprint(importances)\n```\n\n---\n\n## 8. Advanced DataFrame Operations\n\n### Custom Functions\n```python\n# Apply function to columns\ndf['name_length'] = df['name'].apply(len)\n\n# Apply function to rows\ndf['category'] = df.apply(\n    lambda row: 'High' if row['creditlimit'] \u003e 100000 else 'Low',\n    axis=1\n)\n\n# Conditional operations\ndf['adjusted_limit'] = df['creditlimit'].where(\n    df['statecode'] == 0,\n    df['creditlimit'] * 0.5\n)\n```\n\n### String Operations\n```python\n# String methods\ndf['name_upper'] = df['name'].str.upper()\ndf['name_starts'] = df['name'].str.startswith('A')\ndf['name_contains'] = df['name'].str.contains('Inc')\ndf['name_split'] = df['name'].str.split(',').str[0]\n\n# Replace and substitute\ndf['industry'] = df['industrycode'].map({\n    1: 'Retail',\n    2: 'Manufacturing',\n    3: 'Technology'\n})\n```\n\n### Reshaping Data\n```python\n# Transpose\ntransposed = df.set_index('name').T\n\n# Stack/Unstack\nstacked = df.set_index(['name', 'industrycode'])['creditlimit'].unstack()\n\n# Melt long format\nmelted = pd.melt(df, id_vars=['name'], var_name='metric', value_name='value')\n```\n\n---\n\n## 9. Performance Optimization\n\n### Efficient Data Loading\n```python\n# Load large datasets in chunks\nall_records = []\nchunk_size = 1000\n\nfor page in client.get(\n    \"account\",\n    select=[\"accountid\", \"name\", \"creditlimit\"],\n    top=10000,        # Limit total records\n    page_size=chunk_size\n):\n    all_records.extend(page)\n    if len(all_records) % 5000 == 0:\n        print(f\"Loaded {len(all_records)} records\")\n\ndf = pd.DataFrame(all_records)\nprint(f\"Total: {len(df)} records\")\n```\n\n### Memory Optimization\n```python\n# Reduce memory usage\n# Use categorical for repeated values\ndf['industrycode'] = df['industrycode'].astype('category')\n\n# Use appropriate numeric types\ndf['creditlimit'] = pd.to_numeric(df['creditlimit'], downcast='float')\n\n# Delete columns no longer needed\ndf = df.drop(columns=['unused_col1', 'unused_col2'])\n\n# Check memory usage\nprint(df.memory_usage(deep=True).sum() / 1024**2, \"MB\")\n```\n\n### Query Optimization\n```python\n# Apply filters on server, not client\n# ✅ GOOD: Filter on server\naccounts = client.get(\n    \"account\",\n    filter=\"creditlimit \u003e 50000\",  # Server-side filter\n    select=[\"accountid\", \"name\", \"creditlimit\"]\n)\n\n# ❌ BAD: Load all, filter locally\nall_accounts = client.get(\"account\")  # Loads everything\nfiltered = [a for a in all_accounts if a['creditlimit'] \u003e 50000]  # Client-side\n```\n\n---\n\n## 10. Complete Example: Sales Analytics\n\n```python\nimport pandas as pd\nimport numpy as np\nfrom azure.identity import InteractiveBrowserCredential\nfrom PowerPlatform.Dataverse.client import DataverseClient\n\n# Setup\nclient = DataverseClient(\n    \"https://\u003cmyorg\u003e.crm.dynamics.com\",\n    InteractiveBrowserCredential()\n)\n\n# Load data\nprint(\"Loading account data...\")\nrecords = []\nfor page in client.get(\n    \"account\",\n    select=[\"accountid\", \"name\", \"creditlimit\", \"industrycode\", \"statecode\", \"createdon\"],\n    orderby=[\"createdon\"]\n):\n    records.extend(page)\n\ndf = pd.DataFrame(records)\ndf['createdon'] = pd.to_datetime(df['createdon'])\n\n# Data cleaning\ndf = df.dropna()\n\n# Feature engineering\ndf['year'] = df['createdon'].dt.year\ndf['month'] = df['createdon'].dt.month\ndf['year_month'] = df['createdon'].dt.to_period('M')\n\n# Analysis\nprint(\"\\n=== ACCOUNT OVERVIEW ===\")\nprint(f\"Total accounts: {len(df)}\")\nprint(f\"Total credit limit: ${df['creditlimit'].sum():,.2f}\")\nprint(f\"Average credit limit: ${df['creditlimit'].mean():,.2f}\")\n\nprint(\"\\n=== BY INDUSTRY ===\")\nindustry_summary = df.groupby('industrycode').agg({\n    'accountid': 'count',\n    'creditlimit': ['sum', 'mean']\n}).round(2)\nprint(industry_summary)\n\nprint(\"\\n=== BY STATUS ===\")\nstatus_summary = df.groupby('statecode').agg({\n    'accountid': 'count',\n    'creditlimit': 'sum'\n})\nprint(status_summary)\n\n# Export report\nprint(\"\\n=== EXPORTING REPORT ===\")\nindustry_summary.to_csv('industry_analysis.csv')\nprint(\"Report saved to industry_analysis.csv\")\n```\n\n---\n\n## 11. Known Limitations\n\n- `PandasODataClient` currently requires manual DataFrame creation from query results\n- Very large DataFrames (millions of rows) may experience memory constraints\n- Pandas operations are client-side; server-side aggregation is more efficient for large datasets\n- File operations require standard `DataverseClient`, not pandas wrapper\n\n---\n\n## 12. Related Resources\n\n- [Pandas Documentation](https://pandas.pydata.org/docs/)\n- [Official Example: quickstart_pandas.py](https://github.com/microsoft/PowerPlatform-DataverseClient-Python/blob/main/examples/quickstart_pandas.py)\n- [SDK for Python README](https://github.com/microsoft/PowerPlatform-DataverseClient-Python/blob/main/README.md)\n- [Microsoft Learn: Working with data](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/sdk-python/work-data)\n","description":"Guide to integrating the Dataverse SDK for Python with pandas DataFrames for data science and analysis workflows. The SDK's JSON response format maps seamlessly to pandas DataFrames, enabling data scientists to work with Dataverse data using familiar data manipulation tools.","import":{"commit_sha":"541b7819d8c3545c6df122491af4fa1eae415779","imported_at":"2026-05-18T20:05:35Z","license_text":"MIT License\n\nCopyright GitHub, Inc.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.","owner":"github","repo":"github/awesome-copilot","source_url":"https://github.com/github/awesome-copilot/blob/541b7819d8c3545c6df122491af4fa1eae415779/instructions/dataverse-python-pandas-integration.instructions.md"},"manifest":{}},"content_hash":[13,111,152,232,72,108,156,62,48,12,80,180,210,59,74,140,116,58,7,26,108,112,234,131,170,72,201,100,193,143,187,205],"trust_level":"unsigned","yanked":false}