{"kind":"AgentDefinition","metadata":{"namespace":"community","name":"power-bi-data-modeling-best-practices","version":"0.1.0"},"spec":{"agents_md":"---\ndescription: 'Comprehensive Power BI data modeling best practices based on Microsoft guidance for creating efficient, scalable, and maintainable semantic models using star schema principles.'\napplyTo: '**/*.{pbix,md,json,txt}'\n---\n\n# Power BI Data Modeling Best Practices\n\n## Overview\nThis document provides comprehensive instructions for designing efficient, scalable, and maintainable Power BI semantic models following Microsoft's official guidance and dimensional modeling best practices.\n\n## Star Schema Design Principles\n\n### 1. Fundamental Table Types\n**Dimension Tables** - Store descriptive business entities:\n- Products, customers, geography, time, employees\n- Contain unique key columns (preferably surrogate keys)\n- Relatively small number of rows\n- Used for filtering, grouping, and providing context\n- Support hierarchical drill-down scenarios\n\n**Fact Tables** - Store measurable business events:\n- Sales transactions, website clicks, manufacturing events\n- Contain foreign keys to dimension tables\n- Numeric measures for aggregation\n- Large number of rows (typically growing over time)\n- Represent specific grain/level of detail\n\n```\nExample Star Schema Structure:\n\nDimProduct (Dimension)          FactSales (Fact)              DimCustomer (Dimension)\n├── ProductKey (PK)             ├── SalesKey (PK)             ├── CustomerKey (PK)\n├── ProductName                 ├── ProductKey (FK)           ├── CustomerName\n├── Category                    ├── CustomerKey (FK)          ├── CustomerType  \n├── SubCategory                 ├── DateKey (FK)              ├── Region\n└── UnitPrice                   ├── SalesAmount               └── RegistrationDate\n                               ├── Quantity\nDimDate (Dimension)             └── DiscountAmount\n├── DateKey (PK)\n├── Date\n├── Year\n├── Quarter\n├── Month\n└── DayOfWeek\n```\n\n### 2. Table Design Best Practices\n\n#### Dimension Table Design\n```\n✅ DO:\n- Use surrogate keys (auto-incrementing integers) as primary keys\n- Include business keys for integration purposes\n- Create hierarchical attributes (Category \u003e SubCategory \u003e Product)\n- Use descriptive names and proper data types\n- Include \"Unknown\" records for missing dimension data\n- Keep dimension tables relatively narrow (focused attributes)\n\n❌ DON'T:\n- Use natural business keys as primary keys in large models\n- Mix fact and dimension characteristics in same table\n- Create unnecessarily wide dimension tables\n- Leave missing values without proper handling\n```\n\n#### Fact Table Design\n```\n✅ DO:\n- Store data at the most granular level needed\n- Use foreign keys that match dimension table keys\n- Include only numeric, measurable columns\n- Maintain consistent grain across all fact table rows\n- Use appropriate data types (decimal for currency, integer for counts)\n\n❌ DON'T:\n- Include descriptive text columns (these belong in dimensions)\n- Mix different grains in the same fact table\n- Store calculated values that can be computed at query time\n- Use composite keys when surrogate keys would be simpler\n```\n\n## Relationship Design and Management\n\n### 1. Relationship Types and Best Practices\n\n#### One-to-Many Relationships (Standard Pattern)\n```\nConfiguration:\n- From Dimension (One side) to Fact (Many side)\n- Single direction filtering (Dimension filters Fact)\n- Mark as \"Assume Referential Integrity\" for DirectQuery performance\n\nExample:\nDimProduct (1) ← ProductKey → (*) FactSales\nDimCustomer (1) ← CustomerKey → (*) FactSales\nDimDate (1) ← DateKey → (*) FactSales\n```\n\n#### Many-to-Many Relationships (Use Sparingly)\n```\nWhen to Use:\n✅ Genuine many-to-many business relationships\n✅ When bridging table pattern is not feasible\n✅ For advanced analytical scenarios\n\nBest Practices:\n- Create explicit bridging tables when possible\n- Use low-cardinality relationship columns\n- Monitor performance impact carefully\n- Document business rules clearly\n\nExample with Bridging Table:\nDimCustomer (1) ← CustomerKey → (*) BridgeCustomerAccount (*) ← AccountKey → (1) DimAccount\n```\n\n#### One-to-One Relationships (Rare)\n```\nWhen to Use:\n- Extending dimension tables with additional attributes\n- Degenerate dimension scenarios\n- Separating PII from operational data\n\nImplementation:\n- Consider consolidating into single table if possible\n- Use for security/privacy separation\n- Maintain referential integrity\n```\n\n### 2. Relationship Configuration Guidelines\n```\nFilter Direction:\n✅ Single Direction: Default choice, best performance\n✅ Both Directions: Only when cross-filtering is required for business logic\n❌ Avoid: Circular relationship paths\n\nCross-Filter Direction:\n- Dimension to Fact: Always single direction\n- Fact to Fact: Avoid direct relationships, use shared dimensions\n- Dimension to Dimension: Only when business logic requires it\n\nReferential Integrity:\n✅ Enable for DirectQuery sources when data quality is guaranteed  \n✅ Improves query performance by using INNER JOINs\n❌ Don't enable if source data has orphaned records\n```\n\n## Storage Mode Optimization\n\n### 1. Import Mode Best Practices\n```\nWhen to Use Import Mode:\n✅ Data size fits within capacity limits\n✅ Complex analytical calculations required\n✅ Historical data analysis with stable datasets\n✅ Need for optimal query performance\n\nOptimization Strategies:\n- Remove unnecessary columns and rows\n- Use appropriate data types\n- Pre-aggregate data when possible\n- Implement incremental refresh for large datasets\n- Optimize Power Query transformations\n```\n\n#### Data Reduction Techniques for Import\n```\nVertical Filtering (Column Reduction):\n✅ Remove columns not used in reports or relationships\n✅ Remove calculated columns that can be computed in DAX\n✅ Remove intermediate columns used only in Power Query\n✅ Optimize data types (Integer vs. Decimal, Date vs. DateTime)\n\nHorizontal Filtering (Row Reduction):\n✅ Filter to relevant time periods (e.g., last 3 years of data)\n✅ Filter to relevant business entities (active customers, specific regions)\n✅ Remove test, invalid, or cancelled transactions\n✅ Implement proper data archiving strategies\n\nData Type Optimization:\nText → Numeric: Convert codes to integers when possible\nDateTime → Date: Use Date type when time is not needed\nDecimal → Integer: Use integers for whole number measures\nHigh Precision → Lower Precision: Match business requirements\n```\n\n### 2. DirectQuery Mode Best Practices\n```\nWhen to Use DirectQuery Mode:\n✅ Data exceeds import capacity limits\n✅ Real-time data requirements\n✅ Security/compliance requires data to stay at source\n✅ Integration with operational systems\n\nOptimization Requirements:\n- Optimize source database performance\n- Create appropriate indexes on source tables\n- Minimize complex DAX calculations\n- Use simple measures and aggregations\n- Limit number of visuals per report page\n- Implement query reduction techniques\n```\n\n#### DirectQuery Performance Optimization\n```\nDatabase Optimization:\n✅ Create indexes on frequently filtered columns\n✅ Create indexes on relationship key columns\n✅ Use materialized views for complex joins\n✅ Implement appropriate database maintenance\n✅ Consider columnstore indexes for analytical workloads\n\nModel Design for DirectQuery:\n✅ Keep DAX measures simple\n✅ Avoid calculated columns on large tables\n✅ Use star schema design strictly\n✅ Minimize cross-table operations\n✅ Pre-aggregate data in source when possible\n\nQuery Performance:\n✅ Apply filters early in report design\n✅ Use appropriate visual types\n✅ Limit high-cardinality filtering\n✅ Monitor and optimize slow queries\n```\n\n### 3. Composite Model Design\n```\nWhen to Use Composite Models:\n✅ Combine historical (Import) with real-time (DirectQuery) data\n✅ Extend existing models with additional data sources\n✅ Balance performance with data freshness requirements\n✅ Integrate multiple DirectQuery sources\n\nStorage Mode Selection:\nImport: Small dimension tables, historical aggregated facts\nDirectQuery: Large fact tables, real-time operational data  \nDual: Dimension tables that need to work with both Import and DirectQuery facts\nHybrid: Fact tables combining historical (Import) with recent (DirectQuery) data\n```\n\n#### Dual Storage Mode Strategy\n```\nUse Dual Mode For:\n✅ Dimension tables that relate to both Import and DirectQuery facts\n✅ Small, slowly changing reference tables\n✅ Lookup tables that need flexible querying\n\nConfiguration:\n- Set dimension tables to Dual mode\n- Power BI automatically chooses optimal query path\n- Maintains single copy of dimension data\n- Enables efficient cross-source relationships\n```\n\n## Advanced Modeling Patterns\n\n### 1. Date Table Design\n```\nEssential Date Table Attributes:\n✅ Continuous date range (no gaps)\n✅ Mark as date table in Power BI\n✅ Include standard hierarchy (Year \u003e Quarter \u003e Month \u003e Day)\n✅ Add business-specific columns (FiscalYear, WorkingDay, Holiday)\n✅ Use Date data type for date column\n\nDate Table Implementation:\nDateKey (Integer): 20240315 (YYYYMMDD format)\nDate (Date): 2024-03-15\nYear (Integer): 2024\nQuarter (Text): Q1 2024\nMonth (Text): March 2024  \nMonthNumber (Integer): 3\nDayOfWeek (Text): Friday\nIsWorkingDay (Boolean): TRUE\nFiscalYear (Integer): 2024\nFiscalQuarter (Text): FY2024 Q3\n```\n\n### 2. Slowly Changing Dimensions (SCD)\n```\nType 1 SCD (Overwrite):\n- Update existing records with new values\n- Lose historical context\n- Simple to implement and maintain\n- Use for non-critical attribute changes\n\nType 2 SCD (History Preservation):\n- Create new records for changes\n- Maintain complete history\n- Include effective date ranges\n- Use surrogate keys for unique identification\n\nImplementation Pattern:\nCustomerKey (Surrogate): 1, 2, 3, 4\nCustomerID (Business): 101, 101, 102, 103  \nCustomerName: \"John Doe\", \"John Smith\", \"Jane Doe\", \"Bob Johnson\"\nEffectiveDate: 2023-01-01, 2024-01-01, 2023-01-01, 2023-01-01\nExpirationDate: 2023-12-31, 9999-12-31, 9999-12-31, 9999-12-31\nIsCurrent: FALSE, TRUE, TRUE, TRUE\n```\n\n### 3. Role-Playing Dimensions\n```\nScenario: Date table used for Order Date, Ship Date, Delivery Date\n\nImplementation Options:\n\nOption 1: Multiple Relationships (Recommended)\n- Single Date table with multiple relationships to Fact\n- One active relationship (Order Date)\n- Inactive relationships for Ship Date and Delivery Date\n- Use USERELATIONSHIP in DAX measures\n\nOption 2: Multiple Date Tables\n- Separate tables: OrderDate, ShipDate, DeliveryDate\n- Each with dedicated relationship\n- More intuitive for report authors\n- Larger model size due to duplication\n\nDAX Implementation:\nSales by Order Date = [Total Sales]  // Uses active relationship\nSales by Ship Date = CALCULATE([Total Sales], USERELATIONSHIP(FactSales[ShipDate], DimDate[Date]))\nSales by Delivery Date = CALCULATE([Total Sales], USERELATIONSHIP(FactSales[DeliveryDate], DimDate[Date]))\n```\n\n### 4. Bridge Tables for Many-to-Many\n```\nScenario: Students can be in multiple Courses, Courses can have multiple Students\n\nBridge Table Design:\nDimStudent (1) ← StudentKey → (*) BridgeStudentCourse (*) ← CourseKey → (1) DimCourse\n\nBridge Table Structure:\nStudentCourseKey (PK): Surrogate key\nStudentKey (FK): Reference to DimStudent\nCourseKey (FK): Reference to DimCourse  \nEnrollmentDate: Additional context\nGrade: Additional context\nStatus: Active, Completed, Dropped\n\nRelationship Configuration:\n- DimStudent to BridgeStudentCourse: One-to-Many\n- BridgeStudentCourse to DimCourse: Many-to-One  \n- Set one relationship to bi-directional for filter propagation\n- Hide bridge table from report view\n```\n\n## Performance Optimization Strategies\n\n### 1. Model Size Optimization\n```\nColumn Optimization:\n✅ Remove unused columns completely\n✅ Use smallest appropriate data types\n✅ Convert high-cardinality text to integers with lookup tables\n✅ Remove redundant calculated columns\n\nRow Optimization:  \n✅ Filter to business-relevant time periods\n✅ Remove invalid, test, or cancelled transactions\n✅ Archive historical data appropriately\n✅ Use incremental refresh for growing datasets\n\nAggregation Strategies:\n✅ Pre-calculate common aggregations\n✅ Use summary tables for high-level reporting\n✅ Implement automatic aggregations in Premium\n✅ Consider OLAP cubes for complex analytical requirements\n```\n\n### 2. Relationship Performance\n```\nKey Selection:\n✅ Use integer keys over text keys\n✅ Prefer surrogate keys over natural keys\n✅ Ensure referential integrity in source data\n✅ Create appropriate indexes on key columns\n\nCardinality Optimization:\n✅ Set correct relationship cardinality\n✅ Use \"Assume Referential Integrity\" when appropriate\n✅ Minimize bidirectional relationships\n✅ Avoid many-to-many relationships when possible\n\nCross-Filtering Strategy:\n✅ Use single-direction filtering as default\n✅ Enable bi-directional only when required\n✅ Test performance impact of cross-filtering\n✅ Document business reasons for bi-directional relationships\n```\n\n### 3. Query Performance Patterns\n```\nEfficient Model Patterns:\n✅ Proper star schema implementation\n✅ Normalized dimension tables\n✅ Denormalized fact tables\n✅ Consistent grain across related tables\n✅ Appropriate use of calculated tables and columns\n\nQuery Optimization:\n✅ Pre-filter large datasets\n✅ Use appropriate visual types for data\n✅ Minimize complex DAX in reports\n✅ Leverage model relationships effectively\n✅ Consider DirectQuery for large, real-time datasets\n```\n\n## Security and Governance\n\n### 1. Row-Level Security (RLS)\n```\nImplementation Patterns:\n\nUser-Based Security:\n[UserEmail] = USERPRINCIPALNAME()\n\nRole-Based Security:  \nVAR UserRole = \n    LOOKUPVALUE(\n        UserRoles[Role],\n        UserRoles[Email],\n        USERPRINCIPALNAME()\n    )\nRETURN\n    Customers[Region] = UserRole\n\nDynamic Security:\nLOOKUPVALUE(\n    UserRegions[Region],\n    UserRegions[Email], \n    USERPRINCIPALNAME()\n) = Customers[Region]\n\nBest Practices:\n✅ Test with different user accounts\n✅ Keep security logic simple and performant\n✅ Document security requirements clearly\n✅ Use security roles, not individual user filters\n✅ Consider performance impact of complex RLS\n```\n\n### 2. Data Governance\n```\nDocumentation Requirements:\n✅ Business definitions for all measures\n✅ Data lineage and source system mapping\n✅ Refresh schedules and dependencies\n✅ Security and access control documentation\n✅ Change management procedures\n\nData Quality:\n✅ Implement data validation rules\n✅ Monitor for data completeness\n✅ Handle missing values appropriately\n✅ Validate business rule implementation\n✅ Regular data quality assessments\n\nVersion Control:\n✅ Source control for Power BI files\n✅ Environment promotion procedures\n✅ Change tracking and approval processes\n✅ Backup and recovery procedures\n```\n\n## Testing and Validation Framework\n\n### 1. Model Testing Checklist\n```\nFunctional Testing:\n□ All relationships function correctly\n□ Measures calculate expected values\n□ Filters propagate appropriately\n□ Security rules work as designed\n□ Data refresh completes successfully\n\nPerformance Testing:\n□ Model loads within acceptable time\n□ Queries execute within SLA requirements\n□ Visual interactions are responsive\n□ Memory usage is within capacity limits\n□ Concurrent user load testing completed\n\nData Quality Testing:\n□ No missing foreign key relationships\n□ Measure totals match source system\n□ Date ranges are complete and continuous\n□ Security filtering produces correct results\n□ Business rules are correctly implemented\n```\n\n### 2. Validation Procedures\n```\nBusiness Validation:\n✅ Compare report totals with source systems\n✅ Validate complex calculations with business users\n✅ Test edge cases and boundary conditions\n✅ Confirm business logic implementation\n✅ Verify report accuracy across different filters\n\nTechnical Validation:\n✅ Performance testing with realistic data volumes\n✅ Concurrent user testing\n✅ Security testing with different user roles\n✅ Data refresh testing and monitoring\n✅ Disaster recovery testing\n```\n\n## Common Anti-Patterns to Avoid\n\n### 1. Schema Anti-Patterns\n```\n❌ Snowflake Schema (Unless Necessary):\n- Multiple normalized dimension tables\n- Complex relationship chains\n- Reduced query performance\n- More complex for business users\n\n❌ Single Large Table:\n- Mixing facts and dimensions\n- Denormalized to extreme\n- Difficult to maintain and extend\n- Poor performance for analytical queries\n\n❌ Multiple Fact Tables with Direct Relationships:\n- Many-to-many between facts\n- Complex filter propagation\n- Difficult to maintain consistency\n- Better to use shared dimensions\n```\n\n### 2. Relationship Anti-Patterns  \n```\n❌ Bidirectional Relationships Everywhere:\n- Performance impact\n- Unpredictable filter behavior\n- Maintenance complexity\n- Should be exception, not rule\n\n❌ Many-to-Many Without Business Justification:\n- Often indicates missing dimension\n- Can hide data quality issues\n- Complex debugging and maintenance\n- Bridge tables usually better solution\n\n❌ Circular Relationships:\n- Ambiguous filter paths\n- Unpredictable results\n- Difficult debugging\n- Always avoid through proper design\n```\n\n## Advanced Data Modeling Patterns\n\n### 1. Slowly Changing Dimensions Implementation\n```powerquery\n// Type 1 SCD: Power Query implementation for hash-based change detection\nlet\n    Source = Source,\n\n    #\"Added custom\" = Table.TransformColumnTypes(\n        Table.AddColumn(Source, \"Hash\", each Binary.ToText( \n            Text.ToBinary( \n                Text.Combine(\n                    List.Transform({[FirstName],[LastName],[Region]}, each if _ = null then \"\" else _),\n                \"|\")),\n            BinaryEncoding.Hex)\n        ),\n        {{\"Hash\", type text}}\n    ),\n\n    #\"Marked key columns\" = Table.AddKey(#\"Added custom\", {\"Hash\"}, false),\n\n    #\"Merged queries\" = Table.NestedJoin(\n        #\"Marked key columns\",\n        {\"Hash\"},\n        ExistingDimRecords,\n        {\"Hash\"},\n        \"ExistingDimRecords\",\n        JoinKind.LeftOuter\n    ),\n\n    #\"Expanded ExistingDimRecords\" = Table.ExpandTableColumn(\n        #\"Merged queries\",\n        \"ExistingDimRecords\",\n        {\"Count\"},\n        {\"Count\"}\n    ),\n\n    #\"Filtered rows\" = Table.SelectRows(#\"Expanded ExistingDimRecords\", each ([Count] = null)),\n\n    #\"Removed columns\" = Table.RemoveColumns(#\"Filtered rows\", {\"Count\"})\nin\n    #\"Removed columns\"\n```\n\n### 2. Incremental Refresh with Query Folding\n```powerquery\n// Optimized incremental refresh pattern\nlet\n  Source = Sql.Database(\"server\",\"database\"),\n  Data  = Source{[Schema=\"dbo\",Item=\"FactInternetSales\"]}[Data],\n  FilteredByStart = Table.SelectRows(Data, each [OrderDateKey] \u003e= Int32.From(DateTime.ToText(RangeStart,[Format=\"yyyyMMdd\"]))),\n  FilteredByEnd = Table.SelectRows(FilteredByStart, each [OrderDateKey] \u003c Int32.From(DateTime.ToText(RangeEnd,[Format=\"yyyyMMdd\"])))\nin\n  FilteredByEnd\n```\n\n### 3. Semantic Link Integration\n```python\n# Working with Power BI semantic models in Python\nimport sempy.fabric as fabric\nfrom sempy.relationships import plot_relationship_metadata\n\nrelationships = fabric.list_relationships(\"my_dataset\")\nplot_relationship_metadata(relationships)\n```\n\n### 4. Advanced Partition Strategies\n```json\n// TMSL partition with time-based filtering\n\"partition\": {\n      \"name\": \"Sales2019\",\n      \"mode\": \"import\",\n      \"source\": {\n        \"type\": \"m\",\n        \"expression\": [\n          \"let\",\n          \"    Source = SqlDatabase,\",\n          \"    dbo_Sales = Source{[Schema=\\\"dbo\\\",Item=\\\"Sales\\\"]}[Data],\",\n          \"    FilteredRows = Table.SelectRows(dbo_Sales, each [OrderDateKey] \u003e= 20190101 and [OrderDateKey] \u003c= 20191231)\",\n          \"in\",\n          \"    FilteredRows\"\n        ]\n      }\n}\n```\n\nRemember: Always validate your model design with business users and test with realistic data volumes and usage patterns. Use Power BI's built-in tools like Performance Analyzer and DAX Studio for optimization and debugging.","description":"Comprehensive Power BI data modeling best practices based on Microsoft guidance for creating efficient, scalable, and maintainable semantic models using star schema principles.","import":{"commit_sha":"541b7819d8c3545c6df122491af4fa1eae415779","imported_at":"2026-05-18T20:05:35Z","license_text":"MIT License\n\nCopyright GitHub, Inc.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.","owner":"github","repo":"github/awesome-copilot","source_url":"https://github.com/github/awesome-copilot/blob/541b7819d8c3545c6df122491af4fa1eae415779/instructions/power-bi-data-modeling-best-practices.instructions.md"},"manifest":{}},"content_hash":[133,122,217,124,144,27,200,150,230,86,144,117,153,237,73,230,24,116,142,191,46,249,157,223,0,122,234,229,249,234,249,42],"trust_level":"unsigned","yanked":false}
