dartframe 0.8.4
DartFrame is a Dart library inspired by Pandas that simplifies handling of structured data (tables, CSV, JSON) with tools for filtering, transforming, and analyzing it.
0.8.4
- [MAJOR FEATURE] Window Functions - Exponentially Weighted Moving (EWM) Operations
  - NEW: `DataFrame.ewm()` - Create an exponentially weighted window with `span`, `alpha`, `halflife`, or `com` parameters
  - NEW: `ewm().mean()` - Exponentially weighted moving average for smoothing time series data
  - NEW: `ewm().std()` - Exponentially weighted moving standard deviation for volatility analysis
  - NEW: `ewm().var_()` - Exponentially weighted moving variance for risk measurement
  - NEW: `ewm().corr()` - Exponentially weighted moving correlation (pairwise and with another DataFrame)
  - NEW: `ewm().cov()` - Exponentially weighted moving covariance (pairwise and with another DataFrame)
  - ENHANCEMENT: Support for `adjustWeights` and `ignoreNA` parameters for flexible weighting schemes
  - COMPATIBILITY: Pandas-like API for familiar exponential smoothing workflows
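The `span`/`alpha` parameterization maps onto a simple recurrence. Below is a minimal pure-Dart sketch of the adjusted exponentially weighted mean (pandas-style semantics assumed; this is illustrative, not the dartframe implementation):

```dart
/// Converts a span into a smoothing factor: alpha = 2 / (span + 1).
double alphaFromSpan(num span) => 2 / (span + 1);

/// Adjusted EWM mean: numerator and denominator both decay by (1 - alpha)
/// at each step, so older values receive progressively smaller weight.
List<double> ewmMean(List<num> values, {required num span}) {
  final alpha = alphaFromSpan(span);
  final result = <double>[];
  double numer = 0, denom = 0;
  for (final v in values) {
    numer = numer * (1 - alpha) + v;
    denom = denom * (1 - alpha) + 1;
    result.add(numer / denom);
  }
  return result;
}
```

For example, `ewmMean([1, 2, 3], span: 3)` yields 1.0, 5/3, 17/7, which matches pandas' `ewm(span=3, adjust=True).mean()`.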
- [MAJOR FEATURE] Window Functions - Expanding Window Operations
  - NEW: `DataFrame.expanding()` - Create an expanding window with a `minPeriods` parameter
  - NEW: `expanding().mean()` - Expanding mean (cumulative average) for running statistics
  - NEW: `expanding().sum()` - Expanding sum (cumulative sum) for accumulation analysis
  - NEW: `expanding().std()` - Expanding standard deviation for growing-window volatility
  - NEW: `expanding().min()` - Expanding minimum (running minimum) for tracking lowest values
  - NEW: `expanding().max()` - Expanding maximum (running maximum) for tracking highest values
  - ENHANCEMENT: All expanding operations support the `minPeriods` parameter for minimum observation requirements
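Expanding statistics are cumulative: the window always starts at the first row and grows by one row at a time. A sketch of the expanding mean with the `minPeriods` guard described above (assumed semantics, not the library code):

```dart
/// Running mean over an ever-growing window; positions seen before the
/// window holds at least [minPeriods] observations are null.
List<double?> expandingMean(List<num> values, {int minPeriods = 1}) {
  final out = <double?>[];
  num sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
    out.add(i + 1 >= minPeriods ? sum / (i + 1) : null);
  }
  return out;
}
```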
- [FEATURE] DataFrame Statistical Methods - Data Manipulation Operations
  - NEW: `DataFrame.clip()` - Trim values at input thresholds with lower/upper bounds for outlier control
  - NEW: `DataFrame.abs()` - Compute absolute values for all numeric columns
  - NEW: `DataFrame.pctChange()` - Calculate percentage change between consecutive rows for growth analysis
  - NEW: `DataFrame.diff()` - Calculate the first discrete difference between consecutive rows
  - NEW: `DataFrame.idxmax()` - Return index labels of maximum values for each column
  - NEW: `DataFrame.idxmin()` - Return index labels of minimum values for each column
  - NEW: `DataFrame.qcut()` - Quantile-based discretization of specified columns into equal-sized bins
  - ENHANCEMENT: Enhanced `DataFrame.round()` with parameter validation and error handling
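`pctChange()` and `diff()` are both row-over-row operations; the first row has no predecessor, so it comes back null. A single-column sketch of the assumed semantics (not the dartframe source):

```dart
/// Percentage change between consecutive values: (v[i] - v[i-1]) / v[i-1].
List<double?> pctChange(List<num> v) {
  if (v.isEmpty) return [];
  return [
    null,
    for (var i = 1; i < v.length; i++) (v[i] - v[i - 1]) / v[i - 1],
  ];
}

/// First discrete difference between consecutive values: v[i] - v[i-1].
List<num?> diff(List<num> v) {
  if (v.isEmpty) return [];
  return [
    null,
    for (var i = 1; i < v.length; i++) v[i] - v[i - 1],
  ];
}
```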
- [FEATURE] Series Statistical Methods
  - NEW: `Series.clip()` - Trim values at input thresholds with lower/upper bounds
  - FIX: Resolved a duplicate `abs()` method that caused ambiguity errors in Series extensions
  - FIX: Series extension methods now work on `Series<dynamic>` through proper type handling
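Clipping is a per-value operation: anything below `lower` becomes `lower`, anything above `upper` becomes `upper`. The parameter names below follow the changelog; the exact signature is an assumption:

```dart
/// Clamps a value to the optional [lower] and [upper] bounds,
/// mirroring what clip() does element-wise.
num clipValue(num v, {num? lower, num? upper}) {
  if (lower != null && v < lower) return lower;
  if (upper != null && v > upper) return upper;
  return v;
}
```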
- [MAJOR FEATURE] GroupBy Enhancements - Advanced Aggregation and Operations
  - NEW: `DataFrame.groupBy2()` - Create a GroupBy object for advanced operations with a chainable API
  - NEW: `GroupBy.transform()` - Transform values within groups while maintaining the original DataFrame shape
  - NEW: `GroupBy.filter()` - Filter entire groups based on conditions for group-level selection
  - NEW: `GroupBy.pipe()` - Apply chainable functions for method chaining and custom operations
  - NEW: `GroupBy.nth()` - Get the nth row from each group (supports negative indexing)
  - NEW: `GroupBy.head()` / `GroupBy.tail()` - Get the first/last n rows from each group
  - NEW: Cumulative operations within groups:
    - `GroupBy.cumsum()` - Cumulative sum within each group for running totals
    - `GroupBy.cumprod()` - Cumulative product within each group
    - `GroupBy.cummax()` - Cumulative maximum within each group
    - `GroupBy.cummin()` - Cumulative minimum within each group
  - NEW: Flexible aggregation with enhanced `GroupBy.agg()`:
    - Single function mode: `agg('sum')` for simple aggregations
    - Multiple functions mode: `agg(['sum', 'mean', 'count'])` for multiple statistics
    - Column-specific mode: `agg({'col1': 'sum', 'col2': ['mean', 'max']})` for targeted aggregations
    - Named aggregations mode: `agg({'total': NamedAgg('amount', 'sum')})` for custom column names
  - NEW: `NamedAgg` class for custom aggregation column names and multiple function support
  - NEW: Convenience aggregation methods: `sum()`, `mean()`, `count()`, `min()`, `max()`, `std()`, `var_()`, `first()`, `last()`
  - NEW: Utility methods: `ngroups` property, `size()` method, `groups` property for group inspection
  - FIX: Fixed a list equality issue in `groupBy()` when using multiple columns - keys now use a string representation for proper map key equality
  - ARCHITECTURE: Lazy evaluation in GroupBy operations for memory efficiency
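The multi-column `groupBy()` fix deserves a note: Dart `List`s compare by identity, so `['a', 1] == ['a', 1]` is false and every row would land in its own group when lists are used as map keys. Joining the key values into a string restores value equality. A sketch of that workaround (the separator and helper are illustrative, not dartframe's internals):

```dart
/// Groups values by multi-column keys and sums them. Each composite key
/// is joined into a string so the map compares keys by value, which is
/// the fix the changelog describes for multi-column groupBy().
Map<String, num> groupSum(List<List<Object>> keys, List<num> values) {
  final sums = <String, num>{};
  for (var i = 0; i < keys.length; i++) {
    final key = keys[i].join('\u0001'); // string key gives value equality
    sums[key] = (sums[key] ?? 0) + values[i];
  }
  return sums;
}
```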
- [FEATURE] Advanced Slicing Methods (Previously Implemented)
  - NEW: `DataFrame.slice()` - Slice with a step parameter (forward and reverse slicing)
  - NEW: `DataFrame.sliceByLabel()` - Label-based range slicing (inclusive endpoints)
  - NEW: `DataFrame.sliceByPosition()` - Combined position slicing with a step parameter
  - NEW: `DataFrame.sliceByLabelWithStep()` - Label + step combination for flexible slicing
  - NEW: `DataFrame.everyNthRow()` / `DataFrame.everyNthColumn()` - Convenience sampling methods
  - NEW: `DataFrame.reverseRows()` / `DataFrame.reverseColumns()` - Simple reversal operations
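Step-based slicing generalizes a plain range: a positive step strides forward, a negative step walks backward. A generic sketch of the assumed semantics over row positions (parameter names are illustrative):

```dart
/// Returns every [step]-th element from [start] up to (exclusive) [end].
/// A negative step walks backward; supply an explicit [start] for that case.
List<T> sliceWithStep<T>(List<T> rows, {int start = 0, int? end, int step = 1}) {
  final stop = end ?? (step > 0 ? rows.length : -1);
  final out = <T>[];
  if (step > 0) {
    for (var i = start; i < stop; i += step) {
      out.add(rows[i]);
    }
  } else {
    for (var i = start; i > stop; i += step) {
      out.add(rows[i]);
    }
  }
  return out;
}
```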
- [ENHANCEMENT] Method Chaining Support
  - All new operations return DataFrames/Series for seamless method chaining
  - Example: `df.clip(lower: 0, upper: 100).abs().round(2)` for fluent API usage
  - Consistent API design across all statistical and manipulation methods
- [ENHANCEMENT] Null Value Handling
  - Consistent null value handling across all new operations
  - Null values are preserved appropriately in all transformations
  - Graceful handling of edge cases (empty DataFrames, single rows, mixed types)
- [ENHANCEMENT] Performance Optimizations
  - Efficient O(n) and O(n*m) implementations for all new methods
  - Lazy evaluation in GroupBy operations for a reduced memory footprint
  - Memory-efficient implementations suitable for large datasets (1000+ rows tested)
  - Performance targets: < 1 second for typical operations
- [ENHANCEMENT] Error Handling
  - Comprehensive parameter validation with descriptive error messages
  - Proper type checking and conversion for mixed data types
  - Clear error messages for invalid operations and edge cases
- [MAJOR FEATURE] Data Type System - Nullable Types and Type Management
  - NEW: Comprehensive DType system with nullable integer, boolean, and string types
    - `Int8DType`, `Int16DType`, `Int32DType`, `Int64DType` - Nullable integer types with range validation (-128 to 127, -32768 to 32767, etc.)
    - `BooleanDType` - Nullable boolean with flexible string parsing ('true', 'yes', '1', 'false', 'no', '0')
    - `StringDType` - Nullable string with optional max length constraints
    - `Float32DType`, `Float64DType` - Nullable float types with NaN handling
    - `DateTimeDType` - Nullable datetime with string and timestamp parsing
    - `ObjectDType` - Generic object type for mixed data
  - NEW: `DTypeRegistry` - Custom data type registration system
    - `register(name, constructor)` - Register custom types with string names
    - `get(name)` - Retrieve registered types by name
    - Built-in type lookup with fallback to custom types
    - Type validation and management
  - NEW: `DTypes` convenience class for easy type creation
    - `DTypes.int8()`, `DTypes.int16()`, `DTypes.int32()`, `DTypes.int64()` - Integer type constructors
    - `DTypes.float32()`, `DTypes.float64()` - Float type constructors
    - `DTypes.boolean()`, `DTypes.string()`, `DTypes.datetime()` - Other type constructors
    - All types support nullable/non-nullable variants via the `nullable` parameter
  - NEW: `DataFrame.dtypesDetailed` - Automatic type inference
    - Detects optimal types based on data content and value ranges
    - Chooses the smallest integer type that fits the data range (Int8 for [-128, 127], etc.)
    - Handles nullable vs non-nullable type detection
    - Smart string parsing: infers numeric types from parsable strings (e.g., '123' → Int8DType)
  - NEW: `DataFrame.astype()` - Enhanced type conversion with categorical support
    - Convert columns to specific DType objects: `df.astype({'col': DTypes.int32()})`
    - Support for `Map<String, DType>` and `Map<String, String>` formats
    - Error handling modes: 'raise', 'ignore', 'coerce' for flexible conversion
    - CATEGORICAL SUPPORT: `df.astype({'col': 'category'})` delegates to the existing categorical system
    - Automatic fallback for categorical and other existing types
    - Full compatibility with existing `astype()` behavior
  - NEW: `DataFrame.inferDTypes()` - Automatic type optimization
    - Infers and converts to optimal types automatically
    - Downcast options: 'integer', 'float', 'all' for memory optimization
    - Smart string-to-number inference for data cleaning
    - Reduces memory usage by selecting the smallest appropriate types
  - NEW: `DataFrame.memoryUsageByDType()` - Memory usage analysis
    - Calculates memory usage per column based on dtype
    - Fixed-size type calculations (Int8 = 1 byte, Int16 = 2 bytes, etc.)
    - Variable-size type estimation for strings and objects
    - Returns a `Map<String, int>` of column names to bytes
  - NEW: `Series.dtypeInfo` - Series type information
    - Gets DType information for Series data
    - Automatic type inference with range-based integer selection
    - Avoids conflict with the existing `dtype` property
  - NEW: `Series.astype()` - Series type conversion with categorical support
    - Convert a Series to a specific DType: `s.astype(DTypes.int8())`
    - CATEGORICAL SUPPORT: `s.astype('category', categories: [...], ordered: true)`
    - In-place conversion for the categorical type (modifies the original Series)
    - Returns a new Series for other type conversions
    - Support for 'int', 'float', 'string', 'object' string names
    - Error handling: 'raise', 'ignore', 'coerce' modes
    - Full compatibility with the existing categorical system
  - NEW: `Series.memoryUsageByDType()` - Series memory analysis
    - Calculates memory usage based on the inferred dtype
    - Range-based integer type detection for accurate estimates
  - NEW: Series public methods for dtype management
    - `toCategorical(categories, ordered)` - Convert to categorical type
    - `setDType(dtype)` - Set the dtype string identifier
    - `clearCategorical()` - Clear categorical data
  - ENHANCEMENT: Smart Type Inference
    - Parses string content to infer numeric types: `['1', '2', '3']` → Int8DType
    - Range-based integer type selection: values [1, 2, 3] → Int8, [1000, 2000] → Int16
    - Automatic detection of parsable integers and floats in string data
  - ENHANCEMENT: Robust Error Handling
    - All numeric and datetime types throw `FormatException` for unparsable strings
    - Proper exception propagation with the 'raise' error mode
    - Null coercion with the 'coerce' error mode
    - Value preservation with the 'ignore' error mode
  - COMPATIBILITY: Works alongside the existing type system
    - Categorical conversion delegates to the existing `_Categorical` implementation
    - No breaking changes to existing DataFrame/Series behavior
    - Extension-based implementation for clean separation of concerns
  - PERFORMANCE: Memory-efficient type storage and conversion
    - Optimal type selection reduces memory footprint
    - Efficient conversion algorithms
    - Lazy evaluation where appropriate
  - VALIDATION: Comprehensive type validation and range checking
    - Integer range validation (Int8: -128 to 127, Int16: -32768 to 32767, etc.)
    - Type compatibility checks
    - Null handling for nullable types
  - TESTING: 52 comprehensive tests (22 dtype + 30 categorical)
    - Full integration testing with DataFrame and Series
    - Categorical compatibility testing
    - Error handling and edge case coverage
    - Memory usage calculation verification
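Range-based integer type selection boils down to checking the min/max of the data against each type's range and picking the narrowest fit. A sketch of that rule (the selection logic is assumed; only the type names come from the changelog):

```dart
/// Picks the smallest signed integer type whose range covers all values,
/// mirroring the inference rule described above (Int8 for [-128, 127], etc.).
String inferIntDTypeName(List<int> values) {
  final lo = values.reduce((a, b) => a < b ? a : b);
  final hi = values.reduce((a, b) => a > b ? a : b);
  if (lo >= -128 && hi <= 127) return 'Int8';
  if (lo >= -32768 && hi <= 32767) return 'Int16';
  if (lo >= -2147483648 && hi <= 2147483647) return 'Int32';
  return 'Int64';
}
```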
- [MAJOR FEATURE] Database Support - SQL Database Integration
  - NEW: `DatabaseConnection` abstract interface for database operations
    - `query()` - Execute SQL queries and return a DataFrame
    - `execute()` - Execute SQL commands (INSERT, UPDATE, DELETE) and return affected rows
    - `executeBatch()` - Execute multiple SQL commands efficiently
    - `beginTransaction()` - Start database transactions for ACID compliance
    - `close()` - Close the database connection
    - `isConnected()` - Check connection status
    - `databaseType` - Get the database type identifier
  - NEW: `DatabaseTransaction` interface for transaction management
    - `query()` - Execute queries within a transaction
    - `execute()` - Execute commands within a transaction
    - `commit()` - Commit transaction changes
    - `rollback()` - Roll back a transaction on errors
  - NEW: `ConnectionPool` class for efficient connection management
    - `getConnection()` - Get a connection from the pool
    - `releaseConnection()` - Return a connection to the pool
    - `close()` - Close all pooled connections
    - `activeConnectionCount` / `availableConnectionCount` - Monitor pool usage
    - Configurable max connections (default: 5)
    - Automatic connection lifecycle management
  - NEW: Database-specific implementations
    - `SQLiteConnection` - SQLite database support
    - `PostgreSQLConnection` - PostgreSQL database support
    - `MySQLConnection` - MySQL database support
    - Each with transaction support and batch operations
  - NEW: `DatabaseReader` utility class
    - `readSqlQuery()` - Read SQL query results into a DataFrame (pandas-like `read_sql_query`)
    - `readSqlTable()` - Read an entire SQL table into a DataFrame (pandas-like `read_sql_table`)
    - `createConnection()` - Factory method for creating database connections
    - Support for WHERE clauses, LIMIT, OFFSET, and column selection
  - NEW: `DataFrame.toSql()` extension method (pandas-like `to_sql`)
    - Write a DataFrame to SQL database tables
    - `ifExists` modes: 'fail', 'replace', 'append' for table handling
    - Automatic table creation with type inference
    - Custom data type mapping via the `dtype` parameter
    - Chunked inserts for large datasets (configurable `chunkSize`)
    - Index column support with custom labels
    - Automatic SQL type inference from Dart types
  - FEATURE: Parameterized Queries
    - All query methods support a `parameters` argument
    - SQL injection prevention through parameter binding
    - Type-safe parameter handling
  - FEATURE: Batch Operations
    - `executeBatch()` for bulk inserts/updates
    - Optimized for high-performance bulk operations
    - Reduces database round-trips
  - FEATURE: Transaction Support
    - Full ACID compliance with begin/commit/rollback
    - Automatic rollback on errors
    - Nested transaction prevention
    - Transaction state management
  - ARCHITECTURE: Production-ready structure
    - Abstract interfaces for easy extension
    - Mock implementations for testing
    - Ready for real database driver integration (sqflite, postgres, mysql1)
    - Separation of concerns with transaction classes
  - ERROR HANDLING: Comprehensive exception types
    - `DatabaseConnectionError` - Connection failures
    - `DatabaseQueryError` - Query execution failures
    - `DatabaseTransactionError` - Transaction failures
    - `UnsupportedDatabaseError` - Unsupported database types
  - EXAMPLES: Complete example files
    - `example/database_example.dart` - Mock examples with 10 scenarios
    - `example/database_real_example.dart` - Real database examples (SQLite Northwind, PostgreSQL PostGIS)
    - `DATABASE_SETUP.md` - Comprehensive setup guide
  - COMPATIBILITY: Pandas-like API for familiar usage patterns
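The checkout/return mechanics of a connection pool are worth seeing in miniature. The toy class below mirrors the member names the changelog lists (`getConnection`, `releaseConnection`, `activeConnectionCount`, `availableConnectionCount`, max of 5 by default) but the logic is an assumed sketch, not dartframe's `ConnectionPool`:

```dart
/// A minimal generic pool: hands out idle objects first, creates new ones
/// up to [maxConnections], and refuses checkouts beyond that.
class ToyPool<T> {
  ToyPool(this._create, {this.maxConnections = 5});

  final T Function() _create;
  final int maxConnections;
  final _available = <T>[];
  int _active = 0;

  int get activeConnectionCount => _active;
  int get availableConnectionCount => _available.length;

  T getConnection() {
    if (_available.isNotEmpty) {
      _active++;
      return _available.removeLast();
    }
    if (_active >= maxConnections) {
      throw StateError('pool exhausted');
    }
    _active++;
    return _create();
  }

  void releaseConnection(T conn) {
    _active--;
    _available.add(conn);
  }
}
```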
- [DOCUMENTATION] Comprehensive Documentation Added
  - `WINDOW_FUNCTIONS_SUMMARY.md` - Complete window functions documentation with examples
  - `EWM_CORR_COV_SUMMARY.md` - EWM correlation and covariance documentation
  - `GROUPBY_ENHANCEMENTS_SUMMARY.md` - GroupBy operations documentation with use cases
  - `ADVANCED_SLICING_SUMMARY.md` - Advanced slicing documentation
  - `CLIP_ABS_SUMMARY.md` - DataFrame operations documentation
  - `SERIES_CLIP_SUMMARY.md` - Series clip documentation
  - `DATAFRAME_COMPLETE_METHODS_SUMMARY.md` - Complete statistical methods documentation
  - `example/window_functions_example.dart` - Window functions usage examples
  - `example/dataframe_operations_example.dart` - DataFrame operations examples
  - `example/series_clip_example.dart` - Series clip usage examples
  - `example/dtype_example.dart` - Comprehensive DType system examples with 10 usage scenarios
- [TESTING] Comprehensive Test Coverage - 221+ New Tests Added
  - 47 tests for window functions (`test/window_functions_test.dart`)
  - 47 tests for GroupBy operations (`test/groupby_enhanced_test.dart`)
  - 44 tests for advanced slicing (`test/advanced_slicing_test.dart`)
  - 36 tests for DataFrame operations (`test/dataframe_operations_test.dart`)
  - 28 tests for missing DataFrame methods (`test/dataframe_missing_methods_test.dart`)
  - 19 tests for Series clip (`test/series_clip_test.dart`)
  - Edge case coverage: empty DataFrames/Series, single row/column, mixed types, null values
  - Large dataset performance validation (1000+ rows)
- [FEATURE] Export Formats - Multiple Output Formats
  - NEW: `toLatex()` - Export DataFrame to LaTeX table format
    - Support for captions, labels, and position specifiers
    - Automatic escaping of special LaTeX characters
    - Longtable environment for multi-page tables
    - Custom column format strings
    - Bold headers and configurable styling
  - NEW: `toMarkdown()` - Export DataFrame to Markdown table format
    - GitHub-flavored markdown (pipe format)
    - Grid and simple table formats
    - Column alignment options (left, center, right)
    - Float formatting for numeric precision
    - Maximum column width for truncation
  - NEW: `toStringFormatted()` - Enhanced formatted string representation
    - Intelligent truncation for large DataFrames
    - Configurable max rows and columns
    - Float formatting support
    - Shape information footer
    - Pandas-like display with ellipsis
  - NEW: `toRecords()` - Convert DataFrame to a list of record maps
    - Optional index inclusion
    - Custom index column naming
    - Well suited to JSON serialization
    - Row-by-row iteration support
  - COMPATIBILITY: Pandas-like API for familiar export workflows
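A pipe-format Markdown export boils down to three parts: a header row, a separator row, and one row per record. A bare-bones sketch of the format `toMarkdown()` targets (without the alignment and float-formatting options listed above):

```dart
/// Renders columns and rows as a GitHub-flavored (pipe) Markdown table:
/// header, `---` separator, then data rows.
String toMarkdownTable(List<String> columns, List<List<Object?>> rows) {
  final header = '| ${columns.join(' | ')} |';
  final sep = '| ${columns.map((_) => '---').join(' | ')} |';
  final body = rows.map((r) => '| ${r.map((c) => '$c').join(' | ')} |');
  return [header, sep, ...body].join('\n');
}
```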
- [FEATURE] Web & API - HTML and XML Support
  - NEW: `toHtml()` - Export DataFrame to HTML table format
    - CSS classes and table ID support
    - Notebook styling for Jupyter-like display
    - Configurable borders and alignment
    - Automatic HTML entity escaping
    - Truncation for large DataFrames
    - Dimension display footer
  - NEW: `toXml()` - Export DataFrame to XML format
    - Custom root and row element names
    - Attribute and element column modes
    - XML entity escaping
    - Pretty print with indentation
    - Index inclusion control
  - NEW: `DataFrame.readHtml()` - Read HTML tables from a string
    - Automatic table detection and parsing
    - Header row specification
    - Numeric value parsing
    - HTML entity decoding
    - Multiple table support
  - NEW: `DataFrame.readXml()` - Read XML data into a DataFrame
    - Custom row element selection
    - Attribute extraction with prefix
    - Numeric value parsing
    - XML entity decoding
    - Flexible column detection
  - COMPATIBILITY: Round-trip support for HTML and XML formats
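Entity escaping is what keeps cell values from breaking the generated markup. A minimal sketch of the kind of escaping `toHtml()` applies, covering the five standard entities (escape `&` first so already-escaped output is not double-escaped):

```dart
/// Escapes the five standard HTML entities in a cell value.
String escapeHtml(String s) => s
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;')
    .replaceAll("'", '&#39;');
```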
- [API COMPLETENESS] Feature Parity Achievement
  - DataFrame and Series now have complete statistical method coverage
  - All 11 statistical methods implemented for both DataFrame and Series
  - Pandas-like API with consistent method signatures across all operations
  - Full support for method chaining and functional programming patterns
  - Complete export format support for documentation and reporting

0.8.3
- [MAJOR FEATURE] Comprehensive String Operations Extension
  - NEW: Pattern extraction methods - `str.extract()`, `str.extractall()`, `str.findall()` for regex-based text processing
  - NEW: String padding and justification - `str.pad()`, `str.center()`, `str.ljust()`, `str.rjust()`, `str.zfill()` for text alignment
  - NEW: String slicing and manipulation - `str.slice()`, `str.get()` for advanced substring operations
  - NEW: String concatenation and repetition - `str.cat()`, `str.repeat()` for text composition
  - NEW: String type checking methods - `str.isalnum()`, `str.isalpha()`, `str.isdigit()`, `str.isspace()`, `str.islower()`, `str.isupper()`, `str.istitle()`, `str.isnumeric()`, `str.isdecimal()` for character validation
  - COMPATIBILITY: Pandas-like string accessor API for familiar text operations
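Of the padding helpers, `zfill` has the one subtlety: zeros go after a leading sign, not before it. A sketch of the assumed pandas-style semantics:

```dart
/// Left-pads [s] with '0' to [width] characters, keeping any leading
/// '+' or '-' in front of the padding, as pandas' str.zfill does.
String zfill(String s, int width) {
  final sign = s.startsWith('-') || s.startsWith('+') ? s[0] : '';
  final body = sign.isEmpty ? s : s.substring(1);
  return sign + body.padLeft(width - sign.length, '0');
}
```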
- [MAJOR FEATURE] Enhanced Categorical Data Operations
  - NEW: `cat.reorderCategories()` - Reorder category levels with ordering control
  - NEW: `cat.addCategories()` - Add new categories to existing categorical data
  - NEW: `cat.removeCategories()` - Remove unused categories with validation
  - NEW: `cat.renameCategories()` - Rename categories using mapping dictionaries
  - NEW: `cat.setCategories()` - Set categories with recode and rename modes
  - NEW: `cat.asOrdered()` / `cat.asUnordered()` - Convert between ordered and unordered categorical types
  - NEW: `cat.min()` / `cat.max()` - Min/max operations for ordered categories
  - NEW: `cat.memoryUsage()` - Memory usage analysis and optimization metrics for categorical storage
  - ENHANCEMENT: All categorical operations integrated with the CategoricalAccessor interface
- [FEATURE] DataFrame Duplicate Handling and Selection Methods
  - NEW: `duplicated()` - Identify duplicate rows with configurable subset and keep options
  - NEW: `dropDuplicates()` - Remove duplicate rows from a DataFrame
  - NEW: `nlargest()` - Select the N rows with the largest values in a specified column
  - NEW: `nsmallest()` - Select the N rows with the smallest values in a specified column
- [FEATURE] Functional Programming Extensions
  - NEW: `apply()` - Apply a function along an axis (rows or columns) with flexible operation support
  - NEW: `applymap()` - Element-wise function application across the entire DataFrame
  - NEW: `agg()` - Aggregate with multiple functions simultaneously for complex aggregations
  - NEW: `transform()` - Transform values while preserving DataFrame structure
  - NEW: `pipe()` - Apply chainable functions for method composition
- [MAJOR FEATURE] GroupBy Enhancements with Advanced Operations
  - NEW: `GroupBy` class providing a chainable API for grouped operations
  - NEW: `groupBy2()` - Returns a GroupBy object for method chaining and advanced groupby workflows
  - NEW: Transform operations - `transform()`, `transformMean()`, `transformSum()` for group-wise transformations
  - NEW: Filter operations - `filter()` method for group-wise filtering based on conditions
  - NEW: Cumulative operations - `cumsum()`, `cumprod()`, `cummax()`, `cummin()` for cumulative calculations within groups
  - NEW: Row selection - `nth()`, `head()`, `tail()` for selecting specific rows within groups
  - NEW: `NamedAgg` class for named aggregations with multiple function support
  - NEW: `pipe()` for method chaining and custom group operations
  - ARCHITECTURE: Seamless integration with existing groupBy functionality
- [MAJOR FEATURE] Advanced Time Series Operations (12 new methods)
  - NEW: Shift operations - `shift()`, `lag()`, `lead()` for time series data alignment
  - NEW: Time index operations - `tshift()` for shifting by time period, `asfreq()` for frequency conversion
  - NEW: Time-based filtering - `atTime()`, `betweenTime()` for time window selection; `first()`, `last()` for period endpoints
  - NEW: Timezone operations - `tzLocalize()` for adding timezone info, `tzConvert()` for timezone conversion, `tzNaive()` for removing timezone
  - COMPATIBILITY: Pandas-like API for seamless time series workflows
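Shift/lag/lead all reduce to the same move: displace values by n positions and fill the vacated slots with null (lead is just a negative shift). A single-column sketch of the assumed pandas-style semantics:

```dart
/// Shifts values by [n] positions: positive n moves values later (lag),
/// negative n moves them earlier (lead); vacated slots become null.
List<T?> shiftList<T>(List<T> v, int n) {
  return [
    for (var i = 0; i < v.length; i++)
      (i - n >= 0 && i - n < v.length) ? v[i - n] : null,
  ];
}
```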
- [FEATURE] Enhanced Resampling Operations
  - NEW: `resampleOHLC()` - Open, High, Low, Close resampling for OHLC data aggregation
  - NEW: `resampleNunique()` - Count unique values per resampling period
  - NEW: `resampleWithOffset()` - Resampling with custom time offset support
  - ENHANCEMENT: Advanced time series data transformations with period-based aggregation
- [FEATURE] Advanced Data Slicing Methods (6 new methods)
  - NEW: `slice()` - Flexible slicing with step parameter support
  - NEW: `sliceByLabel()` - Label-based range slicing for index-based selection
  - NEW: `sliceByPosition()` - Combined position and range slicing operations
  - NEW: `sliceByLabelWithStep()` - Label-based slicing with step increments
  - NEW: `everyNthRow()` / `everyNthColumn()` - Convenience methods for sampling every nth element
  - NEW: `reverseRows()` / `reverseColumns()` - Row and column reversal operations
- [FEATURE] Expression Evaluation and Querying
  - NEW: `eval()` - Evaluate string expressions for computed columns and values
  - NEW: `query()` - Query a DataFrame using intuitive string expressions with variable binding
  - ENHANCEMENT: Chainable expression evaluation for complex data transformations
- [FEATURE] MultiIndex and Advanced Indexing Support
  - NEW: `MultiIndex` - Hierarchical indexing for multi-level row/column structures
  - NEW: `DatetimeIndex` - Timezone-aware datetime indexing with frequency support
  - NEW: `TimedeltaIndex` - Time difference indexing for duration-based operations
  - NEW: `PeriodIndex` - Time period indexing for period-based time series
  - ARCHITECTURE: Native support for multi-dimensional hierarchical data structures
0.8.2
- [FIX] Code fixes
- [FIX] Fixed doc strings
- [IMPROVEMENT] Improved dart format
0.8.1
- [FIX] Code fixes
0.8.0
- [MAJOR FEATURE] Enhanced File I/O Support with Web Compatibility
  - NEW: Full CSV support using the `csv` package - read/write with custom delimiters, headers, and encoding
  - NEW: Full Excel support using the `excel` package - read/write .xlsx/.xls files with multi-sheet operations
  - NEW: Multi-sheet Excel operations - `readAllExcelSheets()` and `writeExcelSheets()` for working with entire workbooks
  - NEW: Platform-agnostic FileIO abstraction - works on desktop, mobile, and web without code changes
  - NEW: Binary file support - `readBytesFromFile()` and `writeBytesToFile()` for Excel and other binary formats
  - NEW: `deleteFile()` method added to the FileIO interface for temporary file cleanup
  - ENHANCEMENT: All file readers/writers now use FileIO for cross-platform compatibility
  - ENHANCEMENT: DataFrame I/O methods now support both file paths and string content
    - `DataFrame.fromCSV()` supports both `path` and `csv` parameters with full DataFrame options (`formatData`, `missingDataIndicator`, `replaceMissingValueWith`, `allowFlexibleColumns`)
    - `DataFrame.fromJson()` supports both `path` and `jsonString` parameters with all orientations and DataFrame options
    - Automatic temporary file handling for string-based input with proper cleanup
  - ENHANCEMENT: Unified `toJSON()` method combines in-memory conversion and file writing
    - Returns the JSON structure when `path` is null (in-memory mode)
    - Writes to a file when `path` is provided (file mode)
    - Supports all orientations: 'records', 'index', 'columns', 'values'
  - WEB: Full web browser support - upload files for processing, download results
  - Comprehensive documentation with examples for all file formats
- [MAJOR FEATURE] HDF5 File Support - Pure Dart Implementation
  - NEW: Read HDF5 datasets with `FileReader.readHDF5()` - compatible with Python h5py, MATLAB v7.3, and R
  - NEW: All major datatypes supported (numeric, strings, compounds, arrays, enums, variable-length, timestamps)
  - NEW: Multi-dimensional datasets (3D+) with automatic flattening and shape preservation
  - NEW: Compression support (gzip, lzf, shuffle filter) and chunked storage with B-tree indexing
  - NEW: Group navigation, attributes, metadata, and dataset slicing
  - NEW: Cross-platform compatible (Windows, macOS, Linux, Web, iOS, Android) - no FFI dependencies
- [BREAKING CHANGE] All functions with the parameter name `inputFilePath` have been renamed to use `path`.
0.7.0
- [BREAKING CHANGE] Removed all geospatial features (`GeoDataFrame`, `GeoSeries`)
  - REMOVAL: The `GeoDataFrame` and `GeoSeries` classes, along with all related spatial analysis methods, have been completely removed from the `dartframe` package.
  - REASON: This change streamlines the core library, reduces its size, and separates concerns. Geospatial functionality is now housed in a dedicated, specialized package.
  - MIGRATION: All geospatial features have been migrated to the new `geoengine` package. To continue using `GeoDataFrame` and `GeoSeries`, add `geoengine` to your `pubspec.yaml` dependencies.
  - You can find the new package here: geoengine on pub.dev.
  - This move allows for more focused development on both the core data manipulation features in `dartframe` and the geospatial capabilities in `geoengine`.
0.6.3
- [IMPROVEMENT] Improved dart format
0.6.2
- [IMPROVEMENT] Improved dart format
0.6.1
- [FIX] Fixed doc strings
- [IMPROVEMENT] Improved dart format
0.6.0
- [FEATURE] Added comprehensive time series enhancements
  - NEW: `TimeSeriesIndex` class for time-based indexing and operations
    - Support for timestamps with frequency information (Daily, Hourly, Monthly, Yearly)
    - Factory constructor `TimeSeriesIndex.dateRange()` for creating time series ranges
    - Automatic frequency detection with the `detectFrequency()` method
    - Utility methods: `slice()`, `contains()`, `indexOf()`, and `asFreq()` for frequency conversion
    - Support for empty time series with proper error handling
  - NEW: `FrequencyUtils` class with time series utilities
    - Frequency normalization and validation (`normalizeFrequency()`, `isValidFrequency()`)
    - Human-readable frequency descriptions (`frequencyDescription()`)
    - Duration calculations for supported frequencies (`getFrequencyDuration()`)
    - Support for frequency aliases (daily, hourly, monthly, yearly, annual)
  - NEW: `DataFrameTimeSeries` extension for DataFrame time series operations
    - `resample()` method for changing time series frequency with aggregation functions
    - Support for multiple aggregation functions: mean, sum, min, max, count, first, last
    - `upsample()` method for increasing frequency with fill methods (pad/ffill, backfill/bfill, nearest)
    - `downsample()` method for decreasing frequency with aggregation
    - Automatic date column detection in DataFrames
    - Comprehensive error handling for edge cases and invalid inputs
  - NEW: Comprehensive test coverage with 43 passing tests
    - Tests for the `TimeSeriesIndex` constructor, properties, and utility methods
    - Tests for the `dateRange()` factory constructor with various frequencies (D, H, M, Y)
    - Tests for frequency detection and validation functionality
    - Tests for DataFrame resampling, upsampling, and downsampling operations
    - Edge case testing for empty DataFrames, null dates, and mixed data types
    - Error handling validation for unsupported frequencies and methods
  - ARCHITECTURE: Seamless integration with existing DataFrame and Series classes
  - COMPATIBILITY: Pandas-like API for familiar time series operations
  - PERFORMANCE: Efficient time-based operations with proper indexing
  - All time series functionality follows pandas conventions for easy migration
- [FEATURE] Added comprehensive categorical data support integrated directly into Series
  - NEW: `Series.astype('category')` method for converting a Series to categorical dtype (pandas-compatible)
  - NEW: `CategoricalAccessor` (`.cat`) providing a pandas-like categorical operations interface
  - NEW: Efficient memory storage using integer codes with a category-labels mapping
  - NEW: Support for both ordered and unordered categorical data types
  - NEW: Category management operations:
    - `series.cat.addCategories()` - Add new categories to existing categorical data
    - `series.cat.removeCategories()` - Remove unused categories with validation
    - `series.cat.renameCategories()` - Rename categories using a mapping dictionary
    - `series.cat.reorderCategories()` - Reorder categories and set the ordered flag
  - NEW: Categorical properties and methods:
    - `series.cat.categories` - Access category labels
    - `series.cat.codes` - Access integer codes
    - `series.cat.ordered` - Check if categories are ordered
    - `series.cat.nCategories` - Get the number of categories
    - `series.cat.unique()` - Get unique categories present in the data
    - `series.cat.contains()` - Check if the categorical contains a specific value
  - NEW: `series.isCategorical` property to check if a Series is categorical
  - NEW: `series.isCategoricalLike()` method to detect categorical-suitable data
  - NEW: `series.seriesDtype` property for pandas-like dtype information
  - NEW: Enhanced `series.dtype` getter to handle categorical data types
  - NEW: Seamless DataFrame integration - categorical Series work in all DataFrame operations
  - NEW: Automatic data synchronization between categorical codes and Series values
  - NEW: Type conversion support: 'category', 'object', 'int', 'float', 'string' dtypes
  - ARCHITECTURE: Integrated approach - no separate CategoricalSeries class needed
  - ARCHITECTURE: Internal `_Categorical` class for efficient categorical storage
  - PERFORMANCE: Memory optimization - categorical encoding applied only when beneficial
  - COMPATIBILITY: Full pandas API compatibility for categorical operations
  - All existing Series methods (length, nunique, valueCounts, etc.) work seamlessly with categorical data
  - Maintains backward compatibility - no breaking changes to existing Series functionality
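A brief usage sketch of the categorical workflow above. The method and property names come from this changelog; the `Series` constructor shape and the `name:` parameter are assumptions and may differ from the released API:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // A low-cardinality string Series - a good candidate for categorical encoding.
  // NOTE: the positional-list constructor and `name:` parameter are assumed.
  final colors = Series(['red', 'blue', 'red', 'green', 'red'], name: 'color');

  // Convert to categorical dtype: values become integer codes plus a label map.
  final cat = colors.astype('category');
  print(cat.isCategorical);   // whether the conversion succeeded
  print(cat.cat.categories);  // the distinct category labels
  print(cat.cat.codes);       // one integer code per row

  // Category management through the .cat accessor.
  cat.cat.addCategories(['yellow']);
  print(cat.cat.nCategories); // now includes the unused 'yellow' category
}
```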
- [IMPROVEMENT] General performance optimizations
- [FEATURE] Added comprehensive functions and test coverage for missing data operations
  - NEW: Complete test coverage for Series interpolation methods (linear, polynomial, spline)
  - NEW: Extensive testing of enhanced fill operations (ffill, bfill) with limit parameters
  - NEW: Missing data analysis accuracy validation using isna() and notna() methods
  - NEW: Integration tests combining interpolation, fill operations, and missing data detection
  - NEW: Edge case testing for invalid parameters, insufficient data, and mixed data types
  - NEW: Custom missing value handling across different scenarios (-999, 'NA', etc.)
  - NEW: DataFrame-level missing data operations testing for multi-column scenarios
  - Validates error handling, data preservation, and method parameter combinations
  - Ensures robust behavior for complex missing data workflows and edge cases
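A sketch of how the interpolation and fill operations exercised by these tests might be combined. The method names follow the bullets above; parameter names such as `method:` and `limit:` are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // Sensor readings with interior gaps.
  final s = Series([1.0, null, null, 4.0, null], name: 'readings');

  print(s.isna());   // flags the missing positions
  print(s.notna());  // the complement

  // Linear interpolation estimates interior gaps from neighbouring values.
  print(s.interpolate(method: 'linear'));

  // Forward-fill, capped at one consecutive filled value per gap.
  print(s.ffill(limit: 1));
}
```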
- [FEATURE] Added Shape class with multi-dimensional data structure support
  - Supports both named access (`shape.rows`, `shape.columns`) and indexed access (`shape[0]`, `shape[1]`)
  - Future-proofed for 3D+ data structures (tensors)
  - Added utility methods: `addDimension()`, `removeDimension()`, `transpose()`, `size`, `isEmpty`, etc.
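A minimal sketch of the two access styles; the `Shape` constructor shown is an assumption:

```dart
void main() {
  // Hypothetical construction from a dimension list.
  final shape = Shape([100, 5]);

  print(shape.rows);    // named access to the first dimension
  print(shape.columns); // named access to the second dimension
  print(shape[0]);      // indexed access, equivalent to shape.rows
  print(shape.size);    // total element count across all dimensions
}
```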
- [FEATURE] Implemented comprehensive rolling window operations
  - NEW: `rollingWindow()` method with pandas-like API applied to all columns simultaneously
  - Basic operations: `mean()`, `sum()`, `std()`, `variance()`, `min()`, `max()`
  - Advanced operations: `median()`, `quantile()`, `skew()`, `kurt()`
  - Correlation operations: `corr()`, `cov()` with other DataFrames
  - Custom functions: `apply()` method for user-defined operations
  - Support for centered windows, minimum periods, and flexible parameters
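A sketch of the `rollingWindow()` API described above. The window size as first argument, the `center:`/`minPeriods:` parameter names, and the `DataFrame.fromMap` constructor are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // NOTE: fromMap is assumed as a convenience constructor.
  final df = DataFrame.fromMap({
    'price': [10, 12, 11, 13, 15, 14],
  });

  // 3-period rolling mean, computed for every column at once.
  print(df.rollingWindow(3).mean());

  // Centered window with a relaxed minimum-observation requirement.
  print(df.rollingWindow(3, center: true, minPeriods: 1).max());

  // Custom per-window aggregation via apply().
  print(df.rollingWindow(3).apply((window) => window.last - window.first));
}
```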
- [FEATURE] Implemented comprehensive statistical methods and consolidated statistical functions
  - CONSOLIDATION: Moved all basic statistical functions from `functions.dart` to `statistics.dart` for better organization
    - Migrated `count()`, `mean()`, `min()`, `max()`, `sum()`, `describe()` with enhanced APIs
    - All functions now have a consistent `skipna` parameter (defaults to `true`)
    - Improved error handling with `ArgumentError` instead of generic exceptions
    - Better missing value handling using the internal `_isMissing()` method
    - Enhanced return types: `dynamic` for min/max/sum to handle missing values properly
  - NEW: 19 additional statistical functions added to the SeriesStatistics extension
  - NEW: Basic statistics functions:
    - `cumsum()` - Cumulative sum over the Series with skipna support
    - `nunique()` - Count of unique values with a dropna parameter
    - `value_counts()` - Frequency count with normalize, sort, ascending, and dropna options
  - NEW: Percentile functions:
    - `percentile()` - Alternative to quantile using a 0-100 scale for easier interpretation
    - `iqr()` - Interquartile range (Q3 - Q1) for measuring statistical dispersion
  - NEW: Advanced statistics functions:
    - `sem()` - Standard error of the mean with configurable degrees of freedom
    - `mad()` - Mean absolute deviation for robust central tendency measurement
    - `range()` - Range (max - min) for measuring data spread
  - NEW: Correlation and covariance functions:
    - `corr()` - Pearson correlation coefficient with another Series
    - `cov()` - Covariance with another Series and configurable degrees of freedom
    - `autocorr()` - Autocorrelation with configurable lag periods for time series analysis
  - NEW: Rank and order statistics functions:
    - `rank()` - Rank values with 5 tie-breaking methods ('average', 'min', 'max', 'first', 'dense')
    - `pct_change()` - Percentage change between consecutive values for time series analysis
    - `diff()` - Difference between consecutive values with configurable periods
  - NEW: Robust statistics functions:
    - `trimmed_mean()` - Mean after removing outliers from both tails (configurable proportion)
    - `winsorized_mean()` - Mean after capping outliers at boundary values
  - NEW: Distribution functions:
    - `entropy()` - Shannon entropy with configurable logarithm base for information theory
    - `geometric_mean()` - Geometric mean for positive values (nth root of the product)
    - `harmonic_mean()` - Harmonic mean for positive values (reciprocal of the arithmetic mean of reciprocals)
  - ENHANCEMENT: All statistical functions feature:
    - Consistent API design with `skipna` parameters
    - Comprehensive documentation with mathematical explanations and examples
    - Proper edge case handling (empty data, insufficient samples, non-numeric data)
    - Return `double.nan` or `_missingRepresentation` for invalid cases instead of throwing exceptions
    - Full integration with existing Series functionality and DataFrame operations
  - PERFORMANCE: Optimized mathematical operations using dart:math library functions
  - COMPATIBILITY: Pandas-like API for familiar data science workflows
  - ARCHITECTURE: Over 40 statistical functions now available in the Series class
  - Fixed broken `describe()` function that was calling undefined `std()` and `quantile()` methods
  - Removed duplicate `cumsum()` function from `functions.dart` to avoid conflicts
  - All existing tests pass with enhanced statistical functionality
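A few of the new Series statistics in use. Function names match the list above; the argument shapes (e.g. `percentile(90)`, `trimmed_mean(0.1)`, `rank(method:)`) are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  final s = Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], name: 'x');

  print(s.mean());                 // arithmetic mean
  print(s.sem());                  // standard error of the mean
  print(s.iqr());                  // interquartile range, Q3 - Q1
  print(s.percentile(90));         // 90th percentile on the 0-100 scale
  print(s.trimmed_mean(0.1));      // mean after trimming 10% from each tail
  print(s.rank(method: 'dense'));  // dense ranking with tie handling
  print(s.pct_change());           // change relative to the previous value
}
```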
- [DEPRECATION] Deprecated the single-column `rolling()` method in favor of `rollingWindow()`
  - Added migration guide and compatibility documentation
  - Maintained backward compatibility for existing code
  - Enhanced performance for multiple rolling operations
- [FEATURE] Added comprehensive data reshaping and manipulation extensions
  - NEW: `DataFrameReshaping` extension with advanced reshaping operations
    - `stack()` and `unstack()` methods for hierarchical indexing and multi-level data structures
    - Enhanced `meltEnhanced()` method with additional pandas-like parameters (ignoreIndex, colLevel)
    - `widen()` method as an intuitive inverse of melt for long-to-wide format conversion
    - Corrected `transpose()` method with proper matrix transposition logic
  - NEW: Enhanced pivot table functionality
    - `pivotTableEnhanced()` with support for multiple index and column levels
    - `crosstab()` method for cross-tabulation analysis with margins and normalization
    - Advanced aggregation options and margin calculations
  - NEW: `DataFrameMerging` extension with enhanced join operations
    - `merge()` method with pandas-like parameters and validation options
    - `concat()` method for concatenating DataFrames with flexible options
    - `joinEnhanced()` method with additional join strategies
    - Merge validation (one-to-one, one-to-many, many-to-one, many-to-many)
    - Framework for `mergeAsof()` (as-of merges on nearest-key matches)
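A merge/concat sketch based on the bullets above. The `fromMap` constructor and the exact parameter names (`on:`, `how:`, `validate:`) are assumptions patterned on pandas:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  final left = DataFrame.fromMap({
    'id': [1, 2, 3],
    'name': ['a', 'b', 'c'],
  });
  final right = DataFrame.fromMap({
    'id': [2, 3, 4],
    'score': [90, 85, 70],
  });

  // Inner join on 'id', asserting a one-to-one key relationship.
  print(left.merge(right, on: 'id', how: 'inner', validate: 'one_to_one'));

  // Row-wise concatenation of two frames.
  print(left.concat([right]));
}
```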
- [ARCHITECTURE] Maintained a dual approach for reshaping methods
  - `functions.dart`: Contains standard, fully functional methods (`melt`, `pivotTable`, `join`)
  - `reshaping.dart`: Contains enhanced versions with "Enhanced" suffixes (`meltEnhanced`, `pivotTableEnhanced`, `joinEnhanced`)
  - Users can choose between standard functionality or enhanced features
  - No breaking changes - both approaches coexist for maximum compatibility
- [FIX] Corrected transpose method implementation
  - Fixed incorrect column structure that was using index values as column names
  - Implemented proper matrix transposition where columns become rows and vice versa
  - Added proper copy logic with a `_copyValue()` helper for different data types
  - Enhanced documentation with clear examples of transpose behavior
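The corrected rule is that element (i, j) of the result is element (j, i) of the input. A plain-Dart illustration of that rule (not DartFrame's internal implementation):

```dart
// Transpose an m-by-n row-major matrix into an n-by-m one.
List<List<T>> transpose<T>(List<List<T>> m) => List.generate(
      m.first.length,
      (i) => List.generate(m.length, (j) => m[j][i]),
    );

void main() {
  print(transpose([
    [1, 2, 3],
    [4, 5, 6],
  ])); // [[1, 4], [2, 5], [3, 6]]
}
```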
0.5.5 #
- [FEATURE] Changed to MIT License
0.5.4 #
- [IMPROVEMENT] Cleaned code
0.5.3 #
- [IMPROVEMENT] Cleaned code
0.5.2 #
- [FIX] Fixed FileIO not exported
0.5.1 #
- [IMPROVEMENT] Cleaned code
- [IMPROVEMENT] Added documentation strings to functions and classes
- [FIX] Moved all experiments to new branch
- [IMPROVEMENT] Updated README and improved documentation
- [FEATURE] Add quantile calculation with tests for edge cases
0.5.0 #
- [IMPROVEMENT] Reorganize codebase into discrete libraries and update dependencies
- [IMPROVEMENT] Restructure into separate library with parts
- [IMPROVEMENT] Cleaned code base
- [IMPROVEMENT] Refactor(utils): make functions public and remove main test
- [IMPROVEMENT] Chore(dependencies): update intl and geoxml versions
0.4.5 #
- [Fix] Fixed formatting and indentation in multiple files
0.4.4 #
- [Fix] Refactor(geo_series): rename snake_case methods to camelCase for consistency
0.4.3 #
- [Fix] Formatted files.
- [IMPROVEMENT] Reorganize test files into dataframe_series directory
- [IMPROVEMENT] fix(geo_series): Renamed geoJSONToGEOS to GeoJSONToGEOS for consistency
- [IMPROVEMENT] chore(dartframe): Specified the Series return type for the isEmpty getter
- [IMPROVEMENT] fix(series): Added dart:ffi import and included src/utils/lists.dart
- [IMPROVEMENT] refactor(string_accessor): Changed the default errors parameter to 'ignore' in toNumeric
- [IMPROVEMENT] Improve split pattern handling with n parameter
0.4.2 #
- [Fix] Formatted files.
0.4.1 #
- [Fix] Fixed readme.
0.4.0 #
- [FEATURE] Added main library file lib/dartframe.dart exporting library parts and initializing fileIO.
- [FEATURE] Implemented DataFrameILocAccessor and DataFrameLocAccessor in lib/src/dart_frame/accessors.dart for integer and label-based data selection in DataFrames.
- [FEATURE] Implemented core DataFrame class in lib/src/dart_frame/dart_frame.dart with constructors, data cleaning, accessors, and string representation.
- [FEATURE] Added extensive DataFrame manipulation functions in lib/src/dart_frame/functions.dart (selection, filtering, sorting, stats, transformations, I/O, grouping).
- [FEATURE] Implemented DataFrame operator overloads ([], []=) in lib/src/dart_frame/operations.dart.
- [FEATURE] Added file I/O abstraction (FileIOBase) with platform-specific implementations in lib/src/file_helper/.
- [FEATURE] Implemented core Series class in lib/src/series/ for 1D arrays, with constructors, operators, statistical functions, string accessor, and date/time conversions.
- [DOCS] Added markdown documentation for DataFrame, GeoDataFrame, GeoSeries, and Series classes in docs/.
- [DOCS] Added example/example.dart demonstrating DataFrame and Series usage.
- [DOCS] Added example/geodataframe_example.dart demonstrating GeoDataFrame usage.
- [DOCS] Added example/geoseries.dart demonstrating GeoSeries usage.
- [MISC] Added output.geojson example output file.
- [TEST] Added a suite of unit tests in test/ covering DataFrame and Series functionalities.
- [DOCS] Added bin/dartframe.dart as a simple executable example.
0.3.4 #
- [Fixed] Doc Strings.
- [IMPROVEMENT] Improved dart format.
0.3.3 #
- [Fixed] Doc Strings.
0.3.2 #
- [IMPROVEMENT] Migrated to WASM.
0.3.1 #
- [Fix] Fixed readme.
0.3.0 #
- [Fix] Fixed readme.
- [FEATURE] Added more properties
- [FIX] Fixed DataFrame constructor to create modifiable column lists, allowing column addition after initialization
- [FIX] Updated Series toString() method to properly display custom indices
- [FEATURE] Added GeoSeries class for spatial data analysis
- [FEATURE] Added GeoSeries.fromXY() factory constructor to create point geometries from x, y coordinates
- [FEATURE] Added GeoSeries.fromWKT() factory constructor to create geometries from WKT strings
- [FEATURE] Added GeoSeries.fromFeatureCollection() factory constructor to create geometries from GeoJSON
- [FEATURE] Added spatial analysis methods to GeoSeries:
- getCoordinates() - extracts coordinates as a DataFrame
- countCoordinates - counts coordinate pairs in each geometry
- countGeometries - counts geometries in multi-part geometries
- countInteriorRings - counts interior rings in polygonal geometries
- isClosed - checks if LineStrings are closed
- isEmpty - checks if geometries are empty
- isRing - checks if features are rings
- isValid - validates geometry structures
- hasZ - checks for 3D coordinates
- bounds - gets bounding boxes for geometries
- totalBounds - gets overall bounds of all geometries
- centroid - calculates centroids of geometries
- type - gets geometry types
- area - calculates areas of polygonal geometries
- lengths - calculates lengths of linear geometries
- isCCW - checks if rings are counterclockwise
- contains - checks spatial containment relationships
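A short GeoSeries sketch using the factory constructors and properties listed above; the argument shapes for `fromXY` and `fromWKT` are assumptions:

```dart
import 'package:dartframe/dartframe.dart';

void main() {
  // Point geometries from parallel coordinate lists.
  final points = GeoSeries.fromXY([0.0, 1.0, 2.0], [0.0, 1.0, 4.0]);
  print(points.type);        // geometry type of each feature
  print(points.bounds);      // per-geometry bounding boxes
  print(points.totalBounds); // overall bounds of all geometries

  // Geometries parsed from WKT strings.
  final ring = GeoSeries.fromWKT(['LINESTRING (0 0, 1 0, 1 1, 0 0)']);
  print(ring.isClosed);      // whether each LineString is closed
  print(ring.lengths);       // length of each linear geometry
}
```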
- [IMPROVEMENT] Enhanced Series class to support custom indices similar to DataFrame
- [IMPROVEMENT] Renamed DataFrame's rowHeader to index for consistency with pandas API
- [IMPROVEMENT] Updated DataFrame constructor to accept index parameter
0.2.2 #
- [Fix] Fixed readme.
- [FEATURE] Added topics to the package/library
0.2.1 #
- [Fix] Fixed readme.
0.2.0 #
- [FEATURE] Added `isEmpty` and `isNotEmpty` properties to check if DataFrame has rows
- [FEATURE] Added `copy()` method to create deep copies of DataFrames
- [FEATURE] Added dimension properties: `rowCount`, `columnCount`, and `shape`
- [FEATURE] Added `dtypes` property to get column data types
- [FEATURE] Added `hasColumn()` method to check for column existence
- [FEATURE] Added `unique()` method to get DataFrame with only unique rows
- [FEATURE] Added `unique()` method to Series to get unique values
- [FEATURE] Added `resetIndex()` method for reindexing after filtering
- [FEATURE] Added conversion methods: `toListOfMaps()` and `toMap()`
- [FEATURE] Added `sample()` method for randomly sampling rows
- [FEATURE] Added `applyToColumn()` method for applying functions to column elements
- [FEATURE] Added `applyToRows()` method for applying functions to each row
- [FEATURE] Added `corr()` method for computing correlation coefficients
- [FEATURE] Added `bin()` method for creating bins from continuous data
- [FEATURE] Added `toCsv()` method for converting DataFrame to CSV string
- [FEATURE] Added `pivot()` method for creating pivot tables
- [FEATURE] Added `melt()` method for reshaping data from wide to long format
- [FEATURE] Added `join()` method for combining DataFrames
- [IMPROVEMENT] Enhanced `fillna()` method with strategies (mean, median, mode, forward, backward)
- [FEATURE] Added `dropna()` method to remove rows or columns with missing values
- [IMPROVEMENT] Improved `replace()` method with regex support and column targeting
- [FEATURE] Added `replaceInPlace()` method for in-place value replacement
- [FEATURE] Added `astype()` method to convert column data types
- [FEATURE] Added `round()` method to round numeric values to specified precision
- [FEATURE] Added `rolling()` method for computing rolling window calculations
- [FEATURE] Added `cumulative()` method for cumulative calculations (sum, product, min, max)
- [FEATURE] Added `quantile()` method to compute quantiles over a column
- [FEATURE] Added `rank()` method to compute numerical rank along a column
- [FEATURE] Added `abs()` method to Series for calculating absolute values
- [FEATURE] Added `copy()` method to Series for creating copies
- [FEATURE] Added `cummax()` method to Series for cumulative maximum calculations
- [FEATURE] Added `cummin()` method to Series for cumulative minimum calculations
- [FEATURE] Added `cumprod()` method to Series for cumulative product calculations
- [IMPROVEMENT] Enhanced `cumsum()` method in Series with skipna parameter
- [FEATURE] Added GeoDataFrame class for handling geospatial data
- [Fix] Fixed the ability to modify individual elements in DataFrame using `df['column'][index] = value` syntax
- [FIX] Improved row header display in `toString()` method to properly handle headers of varying lengths
0.1.3 #
- [IMPROVEMENT] Fixed Readme not showing the right status
- [FEATURE] Added unit tests
- [FEATURE] Added row header names/index
0.1.2 #
- Fixed description to match dart packaging
0.1.1 #
- Fixed description.
0.1.0 #
- Initial version.