dartframe 0.8.4
DartFrame is a Dart library inspired by Pandas that simplifies handling of structured data (tables, CSV, JSON) with tools for filtering, transforming, and analyzing it.
0.8.4
- [MAJOR FEATURE] Window Functions - Exponentially Weighted Moving (EWM) Operations
  - NEW: `DataFrame.ewm()` - Create an exponentially weighted window with `span`, `alpha`, `halflife`, or `com` parameters
  - NEW: `ewm().mean()` - Exponentially weighted moving average for smoothing time series data
  - NEW: `ewm().std()` - Exponentially weighted moving standard deviation for volatility analysis
  - NEW: `ewm().var_()` - Exponentially weighted moving variance for risk measurement
  - NEW: `ewm().corr()` - Exponentially weighted moving correlation (pairwise and with another DataFrame)
  - NEW: `ewm().cov()` - Exponentially weighted moving covariance (pairwise and with another DataFrame)
  - ENHANCEMENT: Support for `adjustWeights` and `ignoreNA` parameters for flexible weighting schemes
  - COMPATIBILITY: Pandas-like API for familiar exponential smoothing workflows
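The `span`/`alpha` parameterization maps onto a simple recurrence. Below is a minimal pure-Dart sketch of the adjusted exponentially weighted mean (pandas-style semantics assumed; this is illustrative, not the dartframe implementation):

```dart
/// Converts a span into a smoothing factor: alpha = 2 / (span + 1).
double alphaFromSpan(num span) => 2 / (span + 1);

/// Adjusted EWM mean: numerator and denominator both decay by (1 - alpha)
/// at each step, so older values receive progressively smaller weight.
List<double> ewmMean(List<num> values, {required num span}) {
  final alpha = alphaFromSpan(span);
  final result = <double>[];
  double numer = 0, denom = 0;
  for (final v in values) {
    numer = numer * (1 - alpha) + v;
    denom = denom * (1 - alpha) + 1;
    result.add(numer / denom);
  }
  return result;
}
```

For example, `ewmMean([1, 2, 3], span: 3)` yields 1.0, 5/3, 17/7, which matches pandas' `ewm(span=3, adjust=True).mean()`.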
- [MAJOR FEATURE] Window Functions - Expanding Window Operations
  - NEW: `DataFrame.expanding()` - Create an expanding window with a `minPeriods` parameter
  - NEW: `expanding().mean()` - Expanding mean (cumulative average) for running statistics
  - NEW: `expanding().sum()` - Expanding sum (cumulative sum) for accumulation analysis
  - NEW: `expanding().std()` - Expanding standard deviation for growing-window volatility
  - NEW: `expanding().min()` - Expanding minimum (running minimum) for tracking lowest values
  - NEW: `expanding().max()` - Expanding maximum (running maximum) for tracking highest values
  - ENHANCEMENT: All expanding operations support the `minPeriods` parameter for minimum observation requirements
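Expanding statistics are cumulative: the window always starts at the first row and grows by one row at a time. A sketch of the expanding mean with the `minPeriods` guard described above (assumed semantics, not the library code):

```dart
/// Running mean over an ever-growing window; positions seen before the
/// window holds at least [minPeriods] observations are null.
List<double?> expandingMean(List<num> values, {int minPeriods = 1}) {
  final out = <double?>[];
  num sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
    out.add(i + 1 >= minPeriods ? sum / (i + 1) : null);
  }
  return out;
}
```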
- [FEATURE] DataFrame Statistical Methods - Data Manipulation Operations
  - NEW: `DataFrame.clip()` - Trim values at input thresholds with lower/upper bounds for outlier control
  - NEW: `DataFrame.abs()` - Compute absolute values for all numeric columns
  - NEW: `DataFrame.pctChange()` - Calculate percentage change between consecutive rows for growth analysis
  - NEW: `DataFrame.diff()` - Calculate the first discrete difference between consecutive rows
  - NEW: `DataFrame.idxmax()` - Return index labels of maximum values for each column
  - NEW: `DataFrame.idxmin()` - Return index labels of minimum values for each column
  - NEW: `DataFrame.qcut()` - Quantile-based discretization of specified columns into equal-sized bins
  - ENHANCEMENT: Enhanced `DataFrame.round()` with parameter validation and error handling
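`pctChange()` and `diff()` are both row-over-row operations; the first row has no predecessor, so it comes back null. A single-column sketch of the assumed semantics (not the dartframe source):

```dart
/// Percentage change between consecutive values: (v[i] - v[i-1]) / v[i-1].
List<double?> pctChange(List<num> v) {
  if (v.isEmpty) return [];
  return [
    null,
    for (var i = 1; i < v.length; i++) (v[i] - v[i - 1]) / v[i - 1],
  ];
}

/// First discrete difference between consecutive values: v[i] - v[i-1].
List<num?> diff(List<num> v) {
  if (v.isEmpty) return [];
  return [
    null,
    for (var i = 1; i < v.length; i++) v[i] - v[i - 1],
  ];
}
```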
- [FEATURE] Series Statistical Methods
  - NEW: `Series.clip()` - Trim values at input thresholds with lower/upper bounds
  - FIX: Resolved a duplicate `abs()` method that caused ambiguity errors in Series extensions
  - FIX: Series extension methods now work on `Series<dynamic>` through proper type handling
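Clipping is a per-value operation: anything below `lower` becomes `lower`, anything above `upper` becomes `upper`. The parameter names below follow the changelog; the exact signature is an assumption:

```dart
/// Clamps a value to the optional [lower] and [upper] bounds,
/// mirroring what clip() does element-wise.
num clipValue(num v, {num? lower, num? upper}) {
  if (lower != null && v < lower) return lower;
  if (upper != null && v > upper) return upper;
  return v;
}
```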
- [MAJOR FEATURE] GroupBy Enhancements - Advanced Aggregation and Operations
  - NEW: `DataFrame.groupBy2()` - Create a GroupBy object for advanced operations with a chainable API
  - NEW: `GroupBy.transform()` - Transform values within groups while maintaining the original DataFrame shape
  - NEW: `GroupBy.filter()` - Filter entire groups based on conditions for group-level selection
  - NEW: `GroupBy.pipe()` - Apply chainable functions for method chaining and custom operations
  - NEW: `GroupBy.nth()` - Get the nth row from each group (supports negative indexing)
  - NEW: `GroupBy.head()` / `GroupBy.tail()` - Get the first/last n rows from each group
  - NEW: Cumulative operations within groups:
    - `GroupBy.cumsum()` - Cumulative sum within each group for running totals
    - `GroupBy.cumprod()` - Cumulative product within each group
    - `GroupBy.cummax()` - Cumulative maximum within each group
    - `GroupBy.cummin()` - Cumulative minimum within each group
  - NEW: Flexible aggregation with enhanced `GroupBy.agg()`:
    - Single function mode: `agg('sum')` for simple aggregations
    - Multiple functions mode: `agg(['sum', 'mean', 'count'])` for multiple statistics
    - Column-specific mode: `agg({'col1': 'sum', 'col2': ['mean', 'max']})` for targeted aggregations
    - Named aggregations mode: `agg({'total': NamedAgg('amount', 'sum')})` for custom column names
  - NEW: `NamedAgg` class for custom aggregation column names and multiple function support
  - NEW: Convenience aggregation methods: `sum()`, `mean()`, `count()`, `min()`, `max()`, `std()`, `var_()`, `first()`, `last()`
  - NEW: Utility methods: `ngroups` property, `size()` method, `groups` property for group inspection
  - FIX: Fixed a list equality issue in `groupBy()` when using multiple columns - keys now use a string representation for proper map key equality
  - ARCHITECTURE: Lazy evaluation in GroupBy operations for memory efficiency
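The multi-column `groupBy()` fix deserves a note: Dart `List`s compare by identity, so `['a', 1] == ['a', 1]` is false and every row would land in its own group when lists are used as map keys. Joining the key values into a string restores value equality. A sketch of that workaround (the separator and helper are illustrative, not dartframe's internals):

```dart
/// Groups values by multi-column keys and sums them. Each composite key
/// is joined into a string so the map compares keys by value, which is
/// the fix the changelog describes for multi-column groupBy().
Map<String, num> groupSum(List<List<Object>> keys, List<num> values) {
  final sums = <String, num>{};
  for (var i = 0; i < keys.length; i++) {
    final key = keys[i].join('\u0001'); // string key gives value equality
    sums[key] = (sums[key] ?? 0) + values[i];
  }
  return sums;
}
```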
- [FEATURE] Advanced Slicing Methods (Previously Implemented)
  - NEW: `DataFrame.slice()` - Slice with a step parameter (forward and reverse slicing)
  - NEW: `DataFrame.sliceByLabel()` - Label-based range slicing (inclusive endpoints)
  - NEW: `DataFrame.sliceByPosition()` - Combined position slicing with a step parameter
  - NEW: `DataFrame.sliceByLabelWithStep()` - Label + step combination for flexible slicing
  - NEW: `DataFrame.everyNthRow()` / `DataFrame.everyNthColumn()` - Convenience sampling methods
  - NEW: `DataFrame.reverseRows()` / `DataFrame.reverseColumns()` - Simple reversal operations
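Step-based slicing generalizes a plain range: a positive step strides forward, a negative step walks backward. A generic sketch of the assumed semantics over row positions (parameter names are illustrative):

```dart
/// Returns every [step]-th element from [start] up to (exclusive) [end].
/// A negative step walks backward; supply an explicit [start] for that case.
List<T> sliceWithStep<T>(List<T> rows, {int start = 0, int? end, int step = 1}) {
  final stop = end ?? (step > 0 ? rows.length : -1);
  final out = <T>[];
  if (step > 0) {
    for (var i = start; i < stop; i += step) {
      out.add(rows[i]);
    }
  } else {
    for (var i = start; i > stop; i += step) {
      out.add(rows[i]);
    }
  }
  return out;
}
```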
- [ENHANCEMENT] Method Chaining Support
  - All new operations return DataFrames/Series for seamless method chaining
  - Example: `df.clip(lower: 0, upper: 100).abs().round(2)` for fluent API usage
  - Consistent API design across all statistical and manipulation methods
- [ENHANCEMENT] Null Value Handling
  - Consistent null value handling across all new operations
  - Null values are preserved appropriately in all transformations
  - Graceful handling of edge cases (empty DataFrames, single rows, mixed types)
- [ENHANCEMENT] Performance Optimizations
  - Efficient O(n) and O(n*m) implementations for all new methods
  - Lazy evaluation in GroupBy operations for a reduced memory footprint
  - Memory-efficient implementations suitable for large datasets (1000+ rows tested)
  - Performance targets: < 1 second for typical operations
- [ENHANCEMENT] Error Handling
  - Comprehensive parameter validation with descriptive error messages
  - Proper type checking and conversion for mixed data types
  - Clear error messages for invalid operations and edge cases
- [MAJOR FEATURE] Data Type System - Nullable Types and Type Management
  - NEW: Comprehensive DType system with nullable integer, boolean, and string types
    - `Int8DType`, `Int16DType`, `Int32DType`, `Int64DType` - Nullable integer types with range validation (-128 to 127, -32768 to 32767, etc.)
    - `BooleanDType` - Nullable boolean with flexible string parsing ('true', 'yes', '1', 'false', 'no', '0')
    - `StringDType` - Nullable string with optional max length constraints
    - `Float32DType`, `Float64DType` - Nullable float types with NaN handling
    - `DateTimeDType` - Nullable datetime with string and timestamp parsing
    - `ObjectDType` - Generic object type for mixed data
  - NEW: `DTypeRegistry` - Custom data type registration system
    - `register(name, constructor)` - Register custom types with string names
    - `get(name)` - Retrieve registered types by name
    - Built-in type lookup with fallback to custom types
    - Type validation and management
  - NEW: `DTypes` convenience class for easy type creation
    - `DTypes.int8()`, `DTypes.int16()`, `DTypes.int32()`, `DTypes.int64()` - Integer type constructors
    - `DTypes.float32()`, `DTypes.float64()` - Float type constructors
    - `DTypes.boolean()`, `DTypes.string()`, `DTypes.datetime()` - Other type constructors
    - All types support nullable/non-nullable variants via the `nullable` parameter
  - NEW: `DataFrame.dtypesDetailed` - Automatic type inference
    - Detects optimal types based on data content and value ranges
    - Chooses the smallest integer type that fits the data range (Int8 for [-128, 127], etc.)
    - Handles nullable vs non-nullable type detection
    - Smart string parsing: infers numeric types from parsable strings (e.g., '123' → Int8DType)
  - NEW: `DataFrame.astype()` - Enhanced type conversion with categorical support
    - Convert columns to specific DType objects: `df.astype({'col': DTypes.int32()})`
    - Support for `Map<String, DType>` and `Map<String, String>` formats
    - Error handling modes: 'raise', 'ignore', 'coerce' for flexible conversion
    - CATEGORICAL SUPPORT: `df.astype({'col': 'category'})` delegates to the existing categorical system
    - Automatic fallback for categorical and other existing types
    - Full compatibility with existing `astype()` behavior
  - NEW: `DataFrame.inferDTypes()` - Automatic type optimization
    - Infers and converts to optimal types automatically
    - Downcast options: 'integer', 'float', 'all' for memory optimization
    - Smart string-to-number inference for data cleaning
    - Reduces memory usage by selecting the smallest appropriate types
  - NEW: `DataFrame.memoryUsageByDType()` - Memory usage analysis
    - Calculates memory usage per column based on dtype
    - Fixed-size type calculations (Int8 = 1 byte, Int16 = 2 bytes, etc.)
    - Variable-size type estimation for strings and objects
    - Returns a `Map<String, int>` of column names to bytes
  - NEW: `Series.dtypeInfo` - Series type information
    - Gets DType information for Series data
    - Automatic type inference with range-based integer selection
    - Avoids conflict with the existing `dtype` property
  - NEW: `Series.astype()` - Series type conversion with categorical support
    - Convert a Series to a specific DType: `s.astype(DTypes.int8())`
    - CATEGORICAL SUPPORT: `s.astype('category', categories: [...], ordered: true)`
    - In-place conversion for the categorical type (modifies the original Series)
    - Returns a new Series for other type conversions
    - Support for 'int', 'float', 'string', 'object' string names
    - Error handling: 'raise', 'ignore', 'coerce' modes
    - Full compatibility with the existing categorical system
  - NEW: `Series.memoryUsageByDType()` - Series memory analysis
    - Calculates memory usage based on the inferred dtype
    - Range-based integer type detection for accurate estimates
  - NEW: Series public methods for dtype management
    - `toCategorical(categories, ordered)` - Convert to categorical type
    - `setDType(dtype)` - Set the dtype string identifier
    - `clearCategorical()` - Clear categorical data
  - ENHANCEMENT: Smart Type Inference
    - Parses string content to infer numeric types: `['1', '2', '3']` → Int8DType
    - Range-based integer type selection: values [1, 2, 3] → Int8, [1000, 2000] → Int16
    - Automatic detection of parsable integers and floats in string data
  - ENHANCEMENT: Robust Error Handling
    - All numeric and datetime types throw `FormatException` for unparsable strings
    - Proper exception propagation with the 'raise' error mode
    - Null coercion with the 'coerce' error mode
    - Value preservation with the 'ignore' error mode
  - COMPATIBILITY: Works alongside the existing type system
    - Categorical conversion delegates to the existing `_Categorical` implementation
    - No breaking changes to existing DataFrame/Series behavior
    - Extension-based implementation for clean separation of concerns
  - PERFORMANCE: Memory-efficient type storage and conversion
    - Optimal type selection reduces memory footprint
    - Efficient conversion algorithms
    - Lazy evaluation where appropriate
  - VALIDATION: Comprehensive type validation and range checking
    - Integer range validation (Int8: -128 to 127, Int16: -32768 to 32767, etc.)
    - Type compatibility checks
    - Null handling for nullable types
  - TESTING: 52 comprehensive tests (22 dtype + 30 categorical)
    - Full integration testing with DataFrame and Series
    - Categorical compatibility testing
    - Error handling and edge case coverage
    - Memory usage calculation verification
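Range-based integer type selection boils down to checking the min/max of the data against each type's range and picking the narrowest fit. A sketch of that rule (the selection logic is assumed; only the type names come from the changelog):

```dart
/// Picks the smallest signed integer type whose range covers all values,
/// mirroring the inference rule described above (Int8 for [-128, 127], etc.).
String inferIntDTypeName(List<int> values) {
  final lo = values.reduce((a, b) => a < b ? a : b);
  final hi = values.reduce((a, b) => a > b ? a : b);
  if (lo >= -128 && hi <= 127) return 'Int8';
  if (lo >= -32768 && hi <= 32767) return 'Int16';
  if (lo >= -2147483648 && hi <= 2147483647) return 'Int32';
  return 'Int64';
}
```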
- [MAJOR FEATURE] Database Support - SQL Database Integration
  - NEW: `DatabaseConnection` abstract interface for database operations
    - `query()` - Execute SQL queries and return a DataFrame
    - `execute()` - Execute SQL commands (INSERT, UPDATE, DELETE) and return affected rows
    - `executeBatch()` - Execute multiple SQL commands efficiently
    - `beginTransaction()` - Start database transactions for ACID compliance
    - `close()` - Close the database connection
    - `isConnected()` - Check connection status
    - `databaseType` - Get the database type identifier
  - NEW: `DatabaseTransaction` interface for transaction management
    - `query()` - Execute queries within a transaction
    - `execute()` - Execute commands within a transaction
    - `commit()` - Commit transaction changes
    - `rollback()` - Roll back a transaction on errors
  - NEW: `ConnectionPool` class for efficient connection management
    - `getConnection()` - Get a connection from the pool
    - `releaseConnection()` - Return a connection to the pool
    - `close()` - Close all pooled connections
    - `activeConnectionCount` / `availableConnectionCount` - Monitor pool usage
    - Configurable max connections (default: 5)
    - Automatic connection lifecycle management
  - NEW: Database-specific implementations
    - `SQLiteConnection` - SQLite database support
    - `PostgreSQLConnection` - PostgreSQL database support
    - `MySQLConnection` - MySQL database support
    - Each with transaction support and batch operations
  - NEW: `DatabaseReader` utility class
    - `readSqlQuery()` - Read SQL query results into a DataFrame (pandas-like `read_sql_query`)
    - `readSqlTable()` - Read an entire SQL table into a DataFrame (pandas-like `read_sql_table`)
    - `createConnection()` - Factory method for creating database connections
    - Support for WHERE clauses, LIMIT, OFFSET, and column selection
  - NEW: `DataFrame.toSql()` extension method (pandas-like `to_sql`)
    - Write a DataFrame to SQL database tables
    - `ifExists` modes: 'fail', 'replace', 'append' for table handling
    - Automatic table creation with type inference
    - Custom data type mapping via the `dtype` parameter
    - Chunked inserts for large datasets (configurable `chunkSize`)
    - Index column support with custom labels
    - Automatic SQL type inference from Dart types
  - FEATURE: Parameterized Queries
    - All query methods support a `parameters` argument
    - SQL injection prevention through parameter binding
    - Type-safe parameter handling
  - FEATURE: Batch Operations
    - `executeBatch()` for bulk inserts/updates
    - Optimized for high-performance bulk operations
    - Reduces database round-trips
  - FEATURE: Transaction Support
    - Full ACID compliance with begin/commit/rollback
    - Automatic rollback on errors
    - Nested transaction prevention
    - Transaction state management
  - ARCHITECTURE: Production-ready structure
    - Abstract interfaces for easy extension
    - Mock implementations for testing
    - Ready for real database driver integration (sqflite, postgres, mysql1)
    - Separation of concerns with transaction classes
  - ERROR HANDLING: Comprehensive exception types
    - `DatabaseConnectionError` - Connection failures
    - `DatabaseQueryError` - Query execution failures
    - `DatabaseTransactionError` - Transaction failures
    - `UnsupportedDatabaseError` - Unsupported database types
  - EXAMPLES: Complete example files
    - `example/database_example.dart` - Mock examples with 10 scenarios
    - `example/database_real_example.dart` - Real database examples (SQLite Northwind, PostgreSQL PostGIS)
    - `DATABASE_SETUP.md` - Comprehensive setup guide
  - COMPATIBILITY: Pandas-like API for familiar usage patterns
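The checkout/return mechanics of a connection pool are worth seeing in miniature. The toy class below mirrors the member names the changelog lists (`getConnection`, `releaseConnection`, `activeConnectionCount`, `availableConnectionCount`, max of 5 by default) but the logic is an assumed sketch, not dartframe's `ConnectionPool`:

```dart
/// A minimal generic pool: hands out idle objects first, creates new ones
/// up to [maxConnections], and refuses checkouts beyond that.
class ToyPool<T> {
  ToyPool(this._create, {this.maxConnections = 5});

  final T Function() _create;
  final int maxConnections;
  final _available = <T>[];
  int _active = 0;

  int get activeConnectionCount => _active;
  int get availableConnectionCount => _available.length;

  T getConnection() {
    if (_available.isNotEmpty) {
      _active++;
      return _available.removeLast();
    }
    if (_active >= maxConnections) {
      throw StateError('pool exhausted');
    }
    _active++;
    return _create();
  }

  void releaseConnection(T conn) {
    _active--;
    _available.add(conn);
  }
}
```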
- [DOCUMENTATION] Comprehensive Documentation Added
  - `WINDOW_FUNCTIONS_SUMMARY.md` - Complete window functions documentation with examples
  - `EWM_CORR_COV_SUMMARY.md` - EWM correlation and covariance documentation
  - `GROUPBY_ENHANCEMENTS_SUMMARY.md` - GroupBy operations documentation with use cases
  - `ADVANCED_SLICING_SUMMARY.md` - Advanced slicing documentation
  - `CLIP_ABS_SUMMARY.md` - DataFrame operations documentation
  - `SERIES_CLIP_SUMMARY.md` - Series clip documentation
  - `DATAFRAME_COMPLETE_METHODS_SUMMARY.md` - Complete statistical methods documentation
  - `example/window_functions_example.dart` - Window functions usage examples
  - `example/dataframe_operations_example.dart` - DataFrame operations examples
  - `example/series_clip_example.dart` - Series clip usage examples
  - `example/dtype_example.dart` - Comprehensive DType system examples with 10 usage scenarios
- [TESTING] Comprehensive Test Coverage - 221+ New Tests Added
  - 47 tests for window functions (`test/window_functions_test.dart`)
  - 47 tests for GroupBy operations (`test/groupby_enhanced_test.dart`)
  - 44 tests for advanced slicing (`test/advanced_slicing_test.dart`)
  - 36 tests for DataFrame operations (`test/dataframe_operations_test.dart`)
  - 28 tests for missing DataFrame methods (`test/dataframe_missing_methods_test.dart`)
  - 19 tests for Series clip (`test/series_clip_test.dart`)
  - Edge case coverage: empty DataFrames/Series, single row/column, mixed types, null values
  - Large dataset performance validation (1000+ rows)
- [FEATURE] Export Formats - Multiple Output Formats
  - NEW: `toLatex()` - Export DataFrame to LaTeX table format
    - Support for captions, labels, and position specifiers
    - Automatic escaping of special LaTeX characters
    - Longtable environment for multi-page tables
    - Custom column format strings
    - Bold headers and configurable styling
  - NEW: `toMarkdown()` - Export DataFrame to Markdown table format
    - GitHub-flavored markdown (pipe format)
    - Grid and simple table formats
    - Column alignment options (left, center, right)
    - Float formatting for numeric precision
    - Maximum column width for truncation
  - NEW: `toStringFormatted()` - Enhanced formatted string representation
    - Intelligent truncation for large DataFrames
    - Configurable max rows and columns
    - Float formatting support
    - Shape information footer
    - Pandas-like display with ellipsis
  - NEW: `toRecords()` - Convert DataFrame to a list of record maps
    - Optional index inclusion
    - Custom index column naming
    - Well suited to JSON serialization
    - Row-by-row iteration support
  - COMPATIBILITY: Pandas-like API for familiar export workflows
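A pipe-format Markdown export boils down to three parts: a header row, a separator row, and one row per record. A bare-bones sketch of the format `toMarkdown()` targets (without the alignment and float-formatting options listed above):

```dart
/// Renders columns and rows as a GitHub-flavored (pipe) Markdown table:
/// header, `---` separator, then data rows.
String toMarkdownTable(List<String> columns, List<List<Object?>> rows) {
  final header = '| ${columns.join(' | ')} |';
  final sep = '| ${columns.map((_) => '---').join(' | ')} |';
  final body = rows.map((r) => '| ${r.map((c) => '$c').join(' | ')} |');
  return [header, sep, ...body].join('\n');
}
```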
- [FEATURE] Web & API - HTML and XML Support
  - NEW: `toHtml()` - Export DataFrame to HTML table format
    - CSS classes and table ID support
    - Notebook styling for Jupyter-like display
    - Configurable borders and alignment
    - Automatic HTML entity escaping
    - Truncation for large DataFrames
    - Dimension display footer
  - NEW: `toXml()` - Export DataFrame to XML format
    - Custom root and row element names
    - Attribute and element column modes
    - XML entity escaping
    - Pretty print with indentation
    - Index inclusion control
  - NEW: `DataFrame.readHtml()` - Read HTML tables from a string
    - Automatic table detection and parsing
    - Header row specification
    - Numeric value parsing
    - HTML entity decoding
    - Multiple table support
  - NEW: `DataFrame.readXml()` - Read XML data into a DataFrame
    - Custom row element selection
    - Attribute extraction with prefix
    - Numeric value parsing
    - XML entity decoding
    - Flexible column detection
  - COMPATIBILITY: Round-trip support for HTML and XML formats
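Entity escaping is what keeps cell values from breaking the generated markup. A minimal sketch of the kind of escaping `toHtml()` applies, covering the five standard entities (escape `&` first so already-escaped output is not double-escaped):

```dart
/// Escapes the five standard HTML entities in a cell value.
String escapeHtml(String s) => s
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;')
    .replaceAll("'", '&#39;');
```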
- [API COMPLETENESS] Feature Parity Achievement
  - DataFrame and Series now have complete statistical method coverage
  - All 11 statistical methods implemented for both DataFrame and Series
  - Pandas-like API with consistent method signatures across all operations
  - Full support for method chaining and functional programming patterns
  - Complete export format support for documentation and reporting

0.8.3
- [MAJOR FEATURE] Comprehensive String Operations Extension
  - NEW: Pattern extraction methods - `str.extract()`, `str.extractall()`, `str.findall()` for regex-based text processing
  - NEW: String padding and justification - `str.pad()`, `str.center()`, `str.ljust()`, `str.rjust()`, `str.zfill()` for text alignment
  - NEW: String slicing and manipulation - `str.slice()`, `str.get()` for advanced substring operations
  - NEW: String concatenation and repetition - `str.cat()`, `str.repeat()` for text composition
  - NEW: String type checking methods - `str.isalnum()`, `str.isalpha()`, `str.isdigit()`, `str.isspace()`, `str.islower()`, `str.isupper()`, `str.istitle()`, `str.isnumeric()`, `str.isdecimal()` for character validation
  - COMPATIBILITY: Pandas-like string accessor API for familiar text operations
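Of the padding helpers, `zfill` has the one subtlety: zeros go after a leading sign, not before it. A sketch of the assumed pandas-style semantics:

```dart
/// Left-pads [s] with '0' to [width] characters, keeping any leading
/// '+' or '-' in front of the padding, as pandas' str.zfill does.
String zfill(String s, int width) {
  final sign = s.startsWith('-') || s.startsWith('+') ? s[0] : '';
  final body = sign.isEmpty ? s : s.substring(1);
  return sign + body.padLeft(width - sign.length, '0');
}
```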
- [MAJOR FEATURE] Enhanced Categorical Data Operations
  - NEW: `cat.reorderCategories()` - Reorder category levels with ordering control
  - NEW: `cat.addCategories()` - Add new categories to existing categorical data
  - NEW: `cat.removeCategories()` - Remove unused categories with validation
  - NEW: `cat.renameCategories()` - Rename categories using mapping dictionaries
  - NEW: `cat.setCategories()` - Set categories with recode and rename modes
  - NEW: `cat.asOrdered()` / `cat.asUnordered()` - Convert between ordered and unordered categorical types
  - NEW: `cat.min()` / `cat.max()` - Min/max operations for ordered categories
  - NEW: `cat.memoryUsage()` - Memory usage analysis and optimization metrics for categorical storage
  - ENHANCEMENT: All categorical operations integrated with the CategoricalAccessor interface
- [FEATURE] DataFrame Duplicate Handling and Selection Methods
  - NEW: `duplicated()` - Identify duplicate rows with configurable subset and keep options
  - NEW: `dropDuplicates()` - Remove duplicate rows from a DataFrame
  - NEW: `nlargest()` - Select the N rows with the largest values in a specified column
  - NEW: `nsmallest()` - Select the N rows with the smallest values in a specified column
- [FEATURE] Functional Programming Extensions
  - NEW: `apply()` - Apply a function along an axis (rows or columns) with flexible operation support
  - NEW: `applymap()` - Element-wise function application across the entire DataFrame
  - NEW: `agg()` - Aggregate with multiple functions simultaneously for complex aggregations
  - NEW: `transform()` - Transform values while preserving DataFrame structure
  - NEW: `pipe()` - Apply chainable functions for method composition
- [MAJOR FEATURE] GroupBy Enhancements with Advanced Operations
  - NEW: `GroupBy` class providing a chainable API for grouped operations
  - NEW: `groupBy2()` - Returns a GroupBy object for method chaining and advanced groupby workflows
  - NEW: Transform operations - `transform()`, `transformMean()`, `transformSum()` for group-wise transformations
  - NEW: Filter operations - `filter()` method for group-wise filtering based on conditions
  - NEW: Cumulative operations - `cumsum()`, `cumprod()`, `cummax()`, `cummin()` for cumulative calculations within groups
  - NEW: Row selection - `nth()`, `head()`, `tail()` for selecting specific rows within groups
  - NEW: `NamedAgg` class for named aggregations with multiple function support
  - NEW: `pipe()` for method chaining and custom group operations
  - ARCHITECTURE: Seamless integration with existing groupBy functionality
- [MAJOR FEATURE] Advanced Time Series Operations (12 new methods)
  - NEW: Shift operations - `shift()`, `lag()`, `lead()` for time series data alignment
  - NEW: Time index operations - `tshift()` for shifting by time period, `asfreq()` for frequency conversion
  - NEW: Time-based filtering - `atTime()`, `betweenTime()` for time window selection; `first()`, `last()` for period endpoints
  - NEW: Timezone operations - `tzLocalize()` for adding timezone info, `tzConvert()` for timezone conversion, `tzNaive()` for removing timezone
  - COMPATIBILITY: Pandas-like API for seamless time series workflows
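Shift/lag/lead all reduce to the same move: displace values by n positions and fill the vacated slots with null (lead is just a negative shift). A single-column sketch of the assumed pandas-style semantics:

```dart
/// Shifts values by [n] positions: positive n moves values later (lag),
/// negative n moves them earlier (lead); vacated slots become null.
List<T?> shiftList<T>(List<T> v, int n) {
  return [
    for (var i = 0; i < v.length; i++)
      (i - n >= 0 && i - n < v.length) ? v[i - n] : null,
  ];
}
```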
- [FEATURE] Enhanced Resampling Operations
  - NEW: `resampleOHLC()` - Open, High, Low, Close resampling for OHLC data aggregation
  - NEW: `resampleNunique()` - Count unique values per resampling period
  - NEW: `resampleWithOffset()` - Resampling with custom time offset support
  - ENHANCEMENT: Advanced time series data transformations with period-based aggregation
- [FEATURE] Advanced Data Slicing Methods (6 new methods)
  - NEW: `slice()` - Flexible slicing with step parameter support
  - NEW: `sliceByLabel()` - Label-based range slicing for index-based selection
  - NEW: `sliceByPosition()` - Combined position and range slicing operations
  - NEW: `sliceByLabelWithStep()` - Label-based slicing with step increments
  - NEW: `everyNthRow()` / `everyNthColumn()` - Convenience methods for sampling every nth element
  - NEW: `reverseRows()` / `reverseColumns()` - Row and column reversal operations
- [FEATURE] Expression Evaluation and Querying
  - NEW: `eval()` - Evaluate string expressions for computed columns and values
  - NEW: `query()` - Query a DataFrame using intuitive string expressions with variable binding
  - ENHANCEMENT: Chainable expression evaluation for complex data transformations
- [FEATURE] MultiIndex and Advanced Indexing Support
  - NEW: `MultiIndex` - Hierarchical indexing for multi-level row/column structures
  - NEW: `DatetimeIndex` - Timezone-aware datetime indexing with frequency support
  - NEW: `TimedeltaIndex` - Time difference indexing for duration-based operations
  - NEW: `PeriodIndex` - Time period indexing for period-based time series
  - ARCHITECTURE: Native support for multi-dimensional hierarchical data structures
0.8.2
- [FIX] Code fixes
- [FIX] Fixed doc strings
- [IMPROVEMENT] Improved dart format
0.8.1
- [FIX] Code fixes
0.8.0
- [MAJOR FEATURE] Enhanced File I/O Support with Web Compatibility
  - NEW: Full CSV support using the `csv` package - read/write with custom delimiters, headers, and encoding
  - NEW: Full Excel support using the `excel` package - read/write .xlsx/.xls files with multi-sheet operations
  - NEW: Multi-sheet Excel operations - `readAllExcelSheets()` and `writeExcelSheets()` for working with entire workbooks
  - NEW: Platform-agnostic FileIO abstraction - works on desktop, mobile, and web without code changes
  - NEW: Binary file support - `readBytesFromFile()` and `writeBytesToFile()` for Excel and other binary formats
  - NEW: `deleteFile()` method added to the FileIO interface for temporary file cleanup
  - ENHANCEMENT: All file readers/writers now use FileIO for cross-platform compatibility
  - ENHANCEMENT: DataFrame I/O methods now support both file paths and string content
    - `DataFrame.fromCSV()` supports both `path` and `csv` parameters with full DataFrame options (`formatData`, `missingDataIndicator`, `replaceMissingValueWith`, `allowFlexibleColumns`)
    - `DataFrame.fromJson()` supports both `path` and `jsonString` parameters with all orientations and DataFrame options
    - Automatic temporary file handling for string-based input with proper cleanup
  - ENHANCEMENT: Unified `toJSON()` method combines in-memory conversion and file writing
    - Returns the JSON structure when `path` is null (in-memory mode)
    - Writes to a file when `path` is provided (file mode)
    - Supports all orientations: 'records', 'index', 'columns', 'values'
  - WEB: Full web browser support - upload files for processing, download results
  - Comprehensive documentation with examples for all file formats
- [MAJOR FEATURE] HDF5 File Support - Pure Dart Implementation
  - NEW: Read HDF5 datasets with `FileReader.readHDF5()` - compatible with Python h5py, MATLAB v7.3, and R
  - NEW: All major datatypes supported (numeric, strings, compounds, arrays, enums, variable-length, timestamps)
  - NEW: Multi-dimensional datasets (3D+) with automatic flattening and shape preservation
  - NEW: Compression support (gzip, lzf, shuffle filter) and chunked storage with B-tree indexing
  - NEW: Group navigation, attributes, metadata, and dataset slicing
  - NEW: Cross-platform compatible (Windows, macOS, Linux, Web, iOS, Android) - no FFI dependencies
- [BREAKING CHANGE] All functions with the parameter name `inputFilePath` have been renamed to use `path`.
0.7.0
- [BREAKING CHANGE] Removed all geospatial features (`GeoDataFrame`, `GeoSeries`)
  - REMOVAL: The `GeoDataFrame` and `GeoSeries` classes, along with all related spatial analysis methods, have been completely removed from the `dartframe` package.
  - REASON: This change streamlines the core library, reduces its size, and separates concerns. Geospatial functionality is now housed in a dedicated, specialized package.
  - MIGRATION: All geospatial features have been migrated to the new `geoengine` package. To continue using `GeoDataFrame` and `GeoSeries`, add `geoengine` to your `pubspec.yaml` dependencies.
  - You can find the new package here: geoengine on pub.dev.
  - This move allows for more focused development on both the core data manipulation features in `dartframe` and the geospatial capabilities in `geoengine`.
0.6.3
- [IMPROVEMENT] Improved dart format
0.6.2
- [IMPROVEMENT] Improved dart format
0.6.1
- [FIX] Fixed doc strings
- [IMPROVEMENT] Improved dart format
0.6.0
- [FEATURE] Added comprehensive time series enhancements
  - NEW: `TimeSeriesIndex` class for time-based indexing and operations
    - Support for timestamps with frequency information (Daily, Hourly, Monthly, Yearly)
    - Factory constructor `TimeSeriesIndex.dateRange()` for creating time series ranges
    - Automatic frequency detection with the `detectFrequency()` method
    - Utility methods: `slice()`, `contains()`, `indexOf()`, and `asFreq()` for frequency conversion
    - Support for empty time series with proper error handling
  - NEW: `FrequencyUtils` class with time series utilities
    - Frequency normalization and validation (`normalizeFrequency()`, `isValidFrequency()`)
    - Human-readable frequency descriptions (`frequencyDescription()`)
    - Duration calculations for supported frequencies (`getFrequencyDuration()`)
    - Support for frequency aliases (daily, hourly, monthly, yearly, annual)
  - NEW: `DataFrameTimeSeries` extension for DataFrame time series operations
    - `resample()` method for changing time series frequency with aggregation functions
    - Support for multiple aggregation functions: mean, sum, min, max, count, first, last
    - `upsample()` method for increasing frequency with fill methods (pad/ffill, backfill/bfill, nearest)
    - `downsample()` method for decreasing frequency with aggregation
    - Automatic date column detection in DataFrames
    - Comprehensive error handling for edge cases and invalid inputs
  - NEW: Comprehensive test coverage with 43 passing tests
    - Tests for the `TimeSeriesIndex` constructor, properties, and utility methods
    - Tests for the `dateRange()` factory constructor with various frequencies (D, H, M, Y)
    - Tests for frequency detection and validation functionality
    - Tests for DataFrame resampling, upsampling, and downsampling operations
    - Edge case testing for empty DataFrames, null dates, and mixed data types
    - Error handling validation for unsupported frequencies and methods
  - ARCHITECTURE: Seamless integration with existing DataFrame and Series classes
  - COMPATIBILITY: Pandas-like API for familiar time series operations
  - PERFORMANCE: Efficient time-based operations with proper indexing
  - All time series functionality follows pandas conventions for easy migration
- [FEATURE] Added comprehensive categorical data support integrated directly into Series
  - NEW: `Series.astype('category')` method for converting a Series to categorical dtype (pandas-compatible)
  - NEW: `CategoricalAccessor` (`.cat`) providing a pandas-like categorical operations interface
  - NEW: Efficient memory storage using integer codes with a category-labels mapping
  - NEW: Support for both ordered and unordered categorical data types
  - NEW: Category management operations:
    - `series.cat.addCategories()` - Add new categories to existing categorical data
    - `series.cat.removeCategories()` - Remove unused categories with validation
    - `series.cat.renameCategories()` - Rename categories using a mapping dictionary
    - `series.cat.reorderCategories()` - Reorder categories and set the ordered flag
  - NEW: Categorical properties and methods:
    - `series.cat.categories` - Access category labels
    - `series.cat.codes` - Access integer codes
    - `series.cat.ordered` - Check if categories are ordered
    - `series.cat.nCategories` - Get the number of categories
    - `series.cat.unique()` - Get unique categories present in the data
    - `series.cat.contains()` - Check if the categorical contains a specific value
  - NEW: `series.isCategorical` property to check if a Series is categorical
  - NEW: `series.isCategoricalLike()` method to detect categorical-suitable data
  - NEW: `series.seriesDtype` property for pandas-like dtype information
  - NEW: Enhanced `series.dtype` getter to handle categorical data types
  - NEW: Seamless DataFrame integration - categorical Series work in all DataFrame operations
  - NEW: Automatic data synchronization between categorical codes and Series values
  - NEW: Type conversion support: 'category', 'object', 'int', 'float', 'string' dtypes
  - ARCHITECTURE: Integrated approach - no separate CategoricalSeries class needed
  - ARCHITECTURE: Internal `_Categorical` class for efficient categorical storage
  - PERFORMANCE: Memory optimization - categorical encoding applied only when beneficial
  - COMPATIBILITY: Full pandas API compatibility for categorical operations
  - All existing Series methods (length, nunique, valueCounts, etc.) work seamlessly with categorical data
  - Maintains backward compatibility - no breaking changes to existing Series functionality
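A brief usage sketch of the categorical workflow above. The method and property names come from this changelog; the `Series` constructor shape and the `name:` parameter are assumptions and may differ from the released API:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // A low-cardinality string Series - a good candidate for categorical encoding.
  // NOTE: the positional-list constructor and `name:` parameter are assumed.
  final colors = Series(['red', 'blue', 'red', 'green', 'red'], name: 'color');

  // Convert to categorical dtype: values become integer codes plus a label map.
  final cat = colors.astype('category');
  print(cat.isCategorical);   // whether the conversion succeeded
  print(cat.cat.categories);  // the distinct category labels
  print(cat.cat.codes);       // one integer code per row

  // Category management through the .cat accessor.
  cat.cat.addCategories(['yellow']);
  print(cat.cat.nCategories); // now includes the unused 'yellow' category
}
```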
- [IMPROVEMENT] General performance optimizations
- [FEATURE] Added comprehensive functions and test coverage for missing data operations
  - NEW: Complete test coverage for Series interpolation methods (linear, polynomial, spline)
  - NEW: Extensive testing of enhanced fill operations (ffill, bfill) with limit parameters
  - NEW: Missing data analysis accuracy validation using isna() and notna() methods
  - NEW: Integration tests combining interpolation, fill operations, and missing data detection
  - NEW: Edge case testing for invalid parameters, insufficient data, and mixed data types
  - NEW: Custom missing value handling across different scenarios (-999, 'NA', etc.)
  - NEW: DataFrame-level missing data operations testing for multi-column scenarios
  - Validates error handling, data preservation, and method parameter combinations
  - Ensures robust behavior for complex missing data workflows and edge cases
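A sketch of how the interpolation and fill operations exercised by these tests might be combined. The method names follow the bullets above; parameter names such as `method:` and `limit:` are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // Sensor readings with interior gaps.
  final s = Series([1.0, null, null, 4.0, null], name: 'readings');

  print(s.isna());   // flags the missing positions
  print(s.notna());  // the complement

  // Linear interpolation estimates interior gaps from neighbouring values.
  print(s.interpolate(method: 'linear'));

  // Forward-fill, capped at one consecutive filled value per gap.
  print(s.ffill(limit: 1));
}
```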
- [FEATURE] Added Shape class with multi-dimensional data structure support
  - Supports both named access (`shape.rows`, `shape.columns`) and indexed access (`shape[0]`, `shape[1]`)
  - Future-proofed for 3D+ data structures (tensors)
  - Added utility methods: `addDimension()`, `removeDimension()`, `transpose()`, `size`, `isEmpty`, etc.
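A minimal sketch of the two access styles; the `Shape` constructor shown is an assumption:

```dart
void main() {
  // Hypothetical construction from a dimension list.
  final shape = Shape([100, 5]);

  print(shape.rows);    // named access to the first dimension
  print(shape.columns); // named access to the second dimension
  print(shape[0]);      // indexed access, equivalent to shape.rows
  print(shape.size);    // total element count across all dimensions
}
```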
- [FEATURE] Implemented comprehensive rolling window operations
  - NEW: `rollingWindow()` method with pandas-like API applied to all columns simultaneously
  - Basic operations: `mean()`, `sum()`, `std()`, `variance()`, `min()`, `max()`
  - Advanced operations: `median()`, `quantile()`, `skew()`, `kurt()`
  - Correlation operations: `corr()`, `cov()` with other DataFrames
  - Custom functions: `apply()` method for user-defined operations
  - Support for centered windows, minimum periods, and flexible parameters
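A sketch of the `rollingWindow()` API described above. The window size as first argument, the `center:`/`minPeriods:` parameter names, and the `DataFrame.fromMap` constructor are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // NOTE: fromMap is assumed as a convenience constructor.
  final df = DataFrame.fromMap({
    'price': [10, 12, 11, 13, 15, 14],
  });

  // 3-period rolling mean, computed for every column at once.
  print(df.rollingWindow(3).mean());

  // Centered window with a relaxed minimum-observation requirement.
  print(df.rollingWindow(3, center: true, minPeriods: 1).max());

  // Custom per-window aggregation via apply().
  print(df.rollingWindow(3).apply((window) => window.last - window.first));
}
```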
- [FEATURE] Implemented comprehensive statistical methods and consolidated statistical functions
  - CONSOLIDATION: Moved all basic statistical functions from `functions.dart` to `statistics.dart` for better organization
    - Migrated `count()`, `mean()`, `min()`, `max()`, `sum()`, `describe()` with enhanced APIs
    - All functions now have a consistent `skipna` parameter (defaults to `true`)
    - Improved error handling with `ArgumentError` instead of generic exceptions
    - Better missing value handling using the internal `_isMissing()` method
    - Enhanced return types: `dynamic` for min/max/sum to handle missing values properly
  - NEW: 19 additional statistical functions added to the SeriesStatistics extension
  - NEW: Basic statistics functions:
    - `cumsum()` - Cumulative sum over the Series with skipna support
    - `nunique()` - Count of unique values with a dropna parameter
    - `value_counts()` - Frequency count with normalize, sort, ascending, and dropna options
  - NEW: Percentile functions:
    - `percentile()` - Alternative to quantile using a 0-100 scale for easier interpretation
    - `iqr()` - Interquartile range (Q3 - Q1) for measuring statistical dispersion
  - NEW: Advanced statistics functions:
    - `sem()` - Standard error of the mean with configurable degrees of freedom
    - `mad()` - Mean absolute deviation for robust central tendency measurement
    - `range()` - Range (max - min) for measuring data spread
  - NEW: Correlation and covariance functions:
    - `corr()` - Pearson correlation coefficient with another Series
    - `cov()` - Covariance with another Series and configurable degrees of freedom
    - `autocorr()` - Autocorrelation with configurable lag periods for time series analysis
  - NEW: Rank and order statistics functions:
    - `rank()` - Rank values with 5 tie-breaking methods ('average', 'min', 'max', 'first', 'dense')
    - `pct_change()` - Percentage change between consecutive values for time series analysis
    - `diff()` - Difference between consecutive values with configurable periods
  - NEW: Robust statistics functions:
    - `trimmed_mean()` - Mean after removing outliers from both tails (configurable proportion)
    - `winsorized_mean()` - Mean after capping outliers at boundary values
  - NEW: Distribution functions:
    - `entropy()` - Shannon entropy with configurable logarithm base for information theory
    - `geometric_mean()` - Geometric mean for positive values (nth root of the product)
    - `harmonic_mean()` - Harmonic mean for positive values (reciprocal of the arithmetic mean of reciprocals)
  - ENHANCEMENT: All statistical functions feature:
    - Consistent API design with `skipna` parameters
    - Comprehensive documentation with mathematical explanations and examples
    - Proper edge case handling (empty data, insufficient samples, non-numeric data)
    - Return `double.nan` or `_missingRepresentation` for invalid cases instead of throwing exceptions
    - Full integration with existing Series functionality and DataFrame operations
  - PERFORMANCE: Optimized mathematical operations using dart:math library functions
  - COMPATIBILITY: Pandas-like API for familiar data science workflows
  - ARCHITECTURE: Over 40 statistical functions now available in the Series class
  - Fixed broken `describe()` function that was calling undefined `std()` and `quantile()` methods
  - Removed duplicate `cumsum()` function from `functions.dart` to avoid conflicts
  - All existing tests pass with enhanced statistical functionality
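A few of the new Series statistics in use. Function names match the list above; the argument shapes (e.g. `percentile(90)`, `trimmed_mean(0.1)`, `rank(method:)`) are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  final s = Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], name: 'x');

  print(s.mean());                 // arithmetic mean
  print(s.sem());                  // standard error of the mean
  print(s.iqr());                  // interquartile range, Q3 - Q1
  print(s.percentile(90));         // 90th percentile on the 0-100 scale
  print(s.trimmed_mean(0.1));      // mean after trimming 10% from each tail
  print(s.rank(method: 'dense'));  // dense ranking with tie handling
  print(s.pct_change());           // change relative to the previous value
}
```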
- [DEPRECATION] Deprecated the single-column `rolling()` method in favor of `rollingWindow()`
  - Added migration guide and compatibility documentation
  - Maintained backward compatibility for existing code
  - Enhanced performance for multiple rolling operations
- [FEATURE] Added comprehensive data reshaping and manipulation extensions
  - NEW: `DataFrameReshaping` extension with advanced reshaping operations
    - `stack()` and `unstack()` methods for hierarchical indexing and multi-level data structures
    - Enhanced `meltEnhanced()` method with additional pandas-like parameters (ignoreIndex, colLevel)
    - `widen()` method as an intuitive inverse of melt for long-to-wide format conversion
    - Corrected `transpose()` method with proper matrix transposition logic
  - NEW: Enhanced pivot table functionality
    - `pivotTableEnhanced()` with support for multiple index and column levels
    - `crosstab()` method for cross-tabulation analysis with margins and normalization
    - Advanced aggregation options and margin calculations
  - NEW: `DataFrameMerging` extension with enhanced join operations
    - `merge()` method with pandas-like parameters and validation options
    - `concat()` method for concatenating DataFrames with flexible options
    - `joinEnhanced()` method with additional join strategies
    - Merge validation (one-to-one, one-to-many, many-to-one, many-to-many)
    - Framework for `mergeAsof()` (as-of merges on nearest-key matches)
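A merge/concat sketch based on the bullets above. The `fromMap` constructor and the exact parameter names (`on:`, `how:`, `validate:`) are assumptions patterned on pandas:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  final left = DataFrame.fromMap({
    'id': [1, 2, 3],
    'name': ['a', 'b', 'c'],
  });
  final right = DataFrame.fromMap({
    'id': [2, 3, 4],
    'score': [90, 85, 70],
  });

  // Inner join on 'id', asserting a one-to-one key relationship.
  print(left.merge(right, on: 'id', how: 'inner', validate: 'one_to_one'));

  // Row-wise concatenation of two frames.
  print(left.concat([right]));
}
```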
- [ARCHITECTURE] Maintained a dual approach for reshaping methods
  - `functions.dart`: Contains standard, fully functional methods (`melt`, `pivotTable`, `join`)
  - `reshaping.dart`: Contains enhanced versions with "Enhanced" suffixes (`meltEnhanced`, `pivotTableEnhanced`, `joinEnhanced`)
  - Users can choose between standard functionality or enhanced features
  - No breaking changes - both approaches coexist for maximum compatibility
- [FIX] Corrected transpose method implementation
  - Fixed incorrect column structure that was using index values as column names
  - Implemented proper matrix transposition where columns become rows and vice versa
  - Added proper copy logic with a `_copyValue()` helper for different data types
  - Enhanced documentation with clear examples of transpose behavior
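The corrected rule is that element (i, j) of the result is element (j, i) of the input. A plain-Dart illustration of that rule (not DartFrame's internal implementation):

```dart
// Transpose an m-by-n row-major matrix into an n-by-m one.
List<List<T>> transpose<T>(List<List<T>> m) => List.generate(
      m.first.length,
      (i) => List.generate(m.length, (j) => m[j][i]),
    );

void main() {
  print(transpose([
    [1, 2, 3],
    [4, 5, 6],
  ])); // [[1, 4], [2, 5], [3, 6]]
}
```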
0.5.5 #
- [FEATURE] Changed to MIT License
0.5.4 #
- [IMPROVEMENT] Cleaned code
0.5.3 #
- [IMPROVEMENT] Cleaned code
0.5.2 #
- [FIX] Fixed FileIO not exported
0.5.1 #
- [IMPROVEMENT] Cleaned code
- [IMPROVEMENT] Added documentation strings to functions and classes
- [FIX] Moved all experiments to new branch
- [IMPROVEMENT] Updated README and improved documentation
- [FEATURE] Add quantile calculation with tests for edge cases
0.5.0 #
- [IMPROVEMENT] Reorganize codebase into discrete libraries and update dependencies
- [IMPROVEMENT] Restructure into separate library with parts
- [IMPROVEMENT] Cleaned code base
- [IMPROVEMENT] Refactor(utils): make functions public and remove main test
- [IMPROVEMENT] Chore(dependencies): update intl and geoxml versions
0.4.5 #
- [Fix] Fixed formatting and indentation in multiple files
0.4.4 #
- [Fix] Refactor(geo_series): rename snake_case methods to camelCase for consistency
0.4.3 #
- [Fix] Formatted files.
- [IMPROVEMENT] Reorganize test files into dataframe_series directory
- [IMPROVEMENT] fix(geo_series): Renamed geoJSONToGEOS to GeoJSONToGEOS for consistency
- [IMPROVEMENT] chore(dartframe): Specified the Series return type for the isEmpty getter
- [IMPROVEMENT] fix(series): Added dart:ffi import and included src/utils/lists.dart
- [IMPROVEMENT] refactor(string_accessor): Changed the default errors parameter to 'ignore' in toNumeric
- [IMPROVEMENT] Improve split pattern handling with n parameter
0.4.2 #
- [Fix] Formatted files.
0.4.1 #
- [Fix] Fixed readme.
0.4.0 #
- [FEATURE] Added main library file lib/dartframe.dart exporting library parts and initializing fileIO.
- [FEATURE] Implemented DataFrameILocAccessor and DataFrameLocAccessor in lib/src/dart_frame/accessors.dart for integer and label-based data selection in DataFrames.
- [FEATURE] Implemented core DataFrame class in lib/src/dart_frame/dart_frame.dart with constructors, data cleaning, accessors, and string representation.
- [FEATURE] Added extensive DataFrame manipulation functions in lib/src/dart_frame/functions.dart (selection, filtering, sorting, stats, transformations, I/O, grouping).
- [FEATURE] Implemented DataFrame operator overloads ([], []=) in lib/src/dart_frame/operations.dart.
- [FEATURE] Added file I/O abstraction (FileIOBase) with platform-specific implementations in lib/src/file_helper/.
- [FEATURE] Implemented core Series class in lib/src/series/ for 1D arrays, with constructors, operators, statistical functions, string accessor, and date/time conversions.
- [DOCS] Added markdown documentation for DataFrame, GeoDataFrame, GeoSeries, and Series classes in docs/.
- [DOCS] Added example/example.dart demonstrating DataFrame and Series usage.
- [DOCS] Added example/geodataframe_example.dart demonstrating GeoDataFrame usage.
- [DOCS] Added example/geoseries.dart demonstrating GeoSeries usage.
- [MISC] Added output.geojson example output file.
- [TEST] Added a suite of unit tests in test/ covering DataFrame and Series functionalities.
- [DOCS] Added bin/dartframe.dart as a simple executable example.
0.3.4 #
- [Fixed] Doc Strings.
- [IMPROVEMENT] Improved dart format.
0.3.3 #
- [Fixed] Doc Strings.
0.3.2 #
- [IMPROVEMENT] Migrated to WASM.
0.3.1 #
- [Fix] Fixed readme.
0.3.0 #
- [Fix] Fixed readme.
- [FEATURE] Added more properties
- [FIX] Fixed DataFrame constructor to create modifiable column lists, allowing column addition after initialization
- [FIX] Updated Series toString() method to properly display custom indices
- [FEATURE] Added GeoSeries class for spatial data analysis
- [FEATURE] Added GeoSeries.fromXY() factory constructor to create point geometries from x, y coordinates
- [FEATURE] Added GeoSeries.fromWKT() factory constructor to create geometries from WKT strings
- [FEATURE] Added GeoSeries.fromFeatureCollection() factory constructor to create geometries from GeoJSON
- [FEATURE] Added spatial analysis methods to GeoSeries:
- getCoordinates() - extracts coordinates as a DataFrame
- countCoordinates - counts coordinate pairs in each geometry
- countGeometries - counts geometries in multi-part geometries
- countInteriorRings - counts interior rings in polygonal geometries
- isClosed - checks if LineStrings are closed
- isEmpty - checks if geometries are empty
- isRing - checks if features are rings
- isValid - validates geometry structures
- hasZ - checks for 3D coordinates
- bounds - gets bounding boxes for geometries
- totalBounds - gets overall bounds of all geometries
- centroid - calculates centroids of geometries
- type - gets geometry types
- area - calculates areas of polygonal geometries
- lengths - calculates lengths of linear geometries
- isCCW - checks if rings are counterclockwise
- contains - checks spatial containment relationships
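A short GeoSeries sketch using the factory constructors and properties listed above; the argument shapes for `fromXY` and `fromWKT` are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // Point geometries from parallel coordinate lists.
  final points = GeoSeries.fromXY([0.0, 1.0, 2.0], [0.0, 1.0, 4.0]);
  print(points.type);        // geometry type of each feature
  print(points.bounds);      // per-geometry bounding boxes
  print(points.totalBounds); // overall bounds of all geometries

  // Geometries parsed from WKT strings.
  final ring = GeoSeries.fromWKT(['LINESTRING (0 0, 1 0, 1 1, 0 0)']);
  print(ring.isClosed);      // whether each LineString is closed
  print(ring.lengths);       // length of each linear geometry
}
```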
- [IMPROVEMENT] Enhanced Series class to support custom indices similar to DataFrame
- [IMPROVEMENT] Renamed DataFrame's rowHeader to index for consistency with pandas API
- [IMPROVEMENT] Updated DataFrame constructor to accept index parameter
0.2.2 #
- [Fix] Fixed readme.
- [FEATURE] Added topics to the package/library
0.2.1 #
- [Fix] Fixed readme.
0.2.0 #
- [FEATURE] Added `isEmpty` and `isNotEmpty` properties to check if DataFrame has rows
- [FEATURE] Added `copy()` method to create deep copies of DataFrames
- [FEATURE] Added dimension properties: `rowCount`, `columnCount`, and `shape`
- [FEATURE] Added `dtypes` property to get column data types
- [FEATURE] Added `hasColumn()` method to check for column existence
- [FEATURE] Added `unique()` method to get DataFrame with only unique rows
- [FEATURE] Added `unique()` method to Series to get unique values
- [FEATURE] Added `resetIndex()` method for reindexing after filtering
- [FEATURE] Added conversion methods: `toListOfMaps()` and `toMap()`
- [FEATURE] Added `sample()` method for randomly sampling rows
- [FEATURE] Added `applyToColumn()` method for applying functions to column elements
- [FEATURE] Added `applyToRows()` method for applying functions to each row
- [FEATURE] Added `corr()` method for computing correlation coefficients
- [FEATURE] Added `bin()` method for creating bins from continuous data
- [FEATURE] Added `toCsv()` method for converting DataFrame to CSV string
- [FEATURE] Added `pivot()` method for creating pivot tables
- [FEATURE] Added `melt()` method for reshaping data from wide to long format
- [FEATURE] Added `join()` method for combining DataFrames
- [IMPROVEMENT] Enhanced `fillna()` method with strategies (mean, median, mode, forward, backward)
- [FEATURE] Added `dropna()` method to remove rows or columns with missing values
- [IMPROVEMENT] Improved `replace()` method with regex support and column targeting
- [FEATURE] Added `replaceInPlace()` method for in-place value replacement
- [FEATURE] Added `astype()` method to convert column data types
- [FEATURE] Added `round()` method to round numeric values to specified precision
- [FEATURE] Added `rolling()` method for computing rolling window calculations
- [FEATURE] Added `cumulative()` method for cumulative calculations (sum, product, min, max)
- [FEATURE] Added `quantile()` method to compute quantiles over a column
- [FEATURE] Added `rank()` method to compute numerical rank along a column
- [FEATURE] Added `abs()` method to Series for calculating absolute values
- [FEATURE] Added `copy()` method to Series for creating copies
- [FEATURE] Added `cummax()` method to Series for cumulative maximum calculations
- [FEATURE] Added `cummin()` method to Series for cumulative minimum calculations
- [FEATURE] Added `cumprod()` method to Series for cumulative product calculations
- [IMPROVEMENT] Enhanced `cumsum()` method in Series with skipna parameter
- [FEATURE] Added GeoDataFrame class for handling geospatial data
- [Fix] Fixed the ability to modify individual elements in DataFrame using `df['column'][index] = value` syntax
- [FIX] Improved row header display in `toString()` method to properly handle headers of varying lengths
0.1.3 #
- [IMPROVEMENT] Fixed Readme not showing the right status
- [FEATURE] Added unit tests
- [FEATURE] Added row header names/index
0.1.2 #
- Fixed description to match dart packaging
0.1.1 #
- Fixed description.
0.1.0 #
- Initial version.