dart_rl #

A Dart package implementing reinforcement learning algorithms (SARSA, Q-Learning, Expected-SARSA) for both Dart and Flutter applications.

Features #

  • Q-Learning: Off-policy temporal difference learning algorithm
  • SARSA: On-policy temporal difference learning algorithm
  • Expected-SARSA: On-policy algorithm using expected Q-values
  • Clean, extensible API for implementing custom environments
  • Epsilon-greedy exploration strategy
  • Support for both discrete state and action spaces

Installation #

Add dart_rl to your pubspec.yaml:

dependencies:
  dart_rl: ^0.1.0-alpha.1

Then run:

dart pub get

Usage #

Basic Example #

import 'package:dart_rl/dart_rl.dart';

// Create an environment (you need to implement the Environment interface)
final environment = GridWorld();

// Create a Q-Learning agent
final agent = QLearningAgent(
  learningRate: 0.1,      // α (alpha)
  discountFactor: 0.9,    // γ (gamma)
  epsilon: 0.1,           // ε (epsilon) for exploration
);

// Train the agent
agent.train(environment, episodes: 1000);

// Use the learned policy
environment.reset();
final action = agent.selectAction(environment, environment.currentState);
final result = environment.step(action);

Implementing a Custom Environment #

To use dart_rl with your own environment, implement the Environment interface:

class MyEnvironment implements Environment {
  State _current = State(initialValue);

  @override
  State reset() {
    // Reset to the initial state
    _current = State(initialValue);
    return _current;
  }

  @override
  State get currentState => _current;

  @override
  List<Action> getActionsForState(State state) {
    // Return the actions available in the given state
    return [Action('action1'), Action('action2')];
  }

  @override
  StepResult step(Action action) {
    // Execute the action, advance the current state, and return the result
    _current = State(newValue);
    return StepResult(
      nextState: _current,
      reward: rewardValue,
      isDone: isTerminal,
    );
  }

  // ... implement other required methods
}
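
Once the interface is implemented, the environment plugs into any agent exactly as in the basic example above:

final env = MyEnvironment();
final agent = QLearningAgent(
  learningRate: 0.1,
  discountFactor: 0.9,
  epsilon: 0.1,
);

agent.train(env, episodes: 1000);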

Available Algorithms #

Q-Learning

final agent = QLearningAgent(
  learningRate: 0.1,
  discountFactor: 0.9,
  epsilon: 0.1,
);

agent.train(environment, episodes: 1000);

SARSA

final agent = SarsaAgent(
  learningRate: 0.1,
  discountFactor: 0.9,
  epsilon: 0.1,
);

agent.train(environment, episodes: 1000);

Expected-SARSA

final agent = ExpectedSarsaAgent(
  learningRate: 0.1,
  discountFactor: 0.9,
  epsilon: 0.1,
);

agent.train(environment, episodes: 1000);

Adjusting Exploration #

You can control the exploration rate:

// Set epsilon to a specific value
agent.setEpsilon(0.05);

// Decay epsilon over time
agent.decayEpsilon(0.99); // Multiply epsilon by 0.99

// Use greedy policy (no exploration)
agent.setEpsilon(0.0);

Accessing Q-Values #

// Get Q-value for a specific state-action pair
final qValue = agent.getQValue(state, action);

// Get all Q-values for a state
final qValues = agent.getQValuesForState(state);
for (final entry in qValues.entries) {
  print('${entry.key}: ${entry.value}');
}

Examples #

See the example/ directory for complete examples:

  • grid_world_example.dart: Simple grid world navigation
  • frozen_lake_example.dart: Frozen lake environment with hazards

Run examples:

dart run example/grid_world_example.dart
dart run example/frozen_lake_example.dart

Algorithm Details #

Q-Learning #

  • Type: Off-policy
  • Update Rule: Q(s,a) ← Q(s,a) + α[r + γ * max(Q(s',a')) - Q(s,a)] (sketched below)
  • Learns the optimal policy regardless of the policy being followed
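
The package applies this update internally during train. As a minimal sketch of the tabular rule, assuming a Map-based Q-table (qLearningUpdate and its parameters are illustrative, not part of the dart_rl API):

// Illustrative only: one tabular Q-Learning update over a Map-based Q-table.
void qLearningUpdate(
  Map<String, double> qTable, // keyed by 'state|action'
  String state,
  String action,
  double reward,
  double bestNextQ, { // max over a' of Q(s', a')
  double alpha = 0.1,
  double gamma = 0.9,
}) {
  final key = '$state|$action';
  final oldQ = qTable[key] ?? 0.0;
  qTable[key] = oldQ + alpha * (reward + gamma * bestNextQ - oldQ);
}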

SARSA #

  • Type: On-policy
  • Update Rule: Q(s,a) ← Q(s,a) + α[r + γ * Q(s',a') - Q(s,a)] (sketched below)
  • Learns the value of the policy being followed
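
The only difference from Q-Learning is the bootstrap term: SARSA uses Q(s',a') for the action the epsilon-greedy policy actually selected in s', not the max over actions. A sketch under the same illustrative Q-table assumptions as above:

// Illustrative only: one tabular SARSA update. chosenNextQ is Q(s', a')
// for the action the epsilon-greedy policy actually took in s'.
void sarsaUpdate(
  Map<String, double> qTable,
  String state,
  String action,
  double reward,
  double chosenNextQ, {
  double alpha = 0.1,
  double gamma = 0.9,
}) {
  final key = '$state|$action';
  final oldQ = qTable[key] ?? 0.0;
  qTable[key] = oldQ + alpha * (reward + gamma * chosenNextQ - oldQ);
}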

Expected-SARSA #

  • Type: On-policy
  • Update Rule: Q(s,a) ← Q(s,a) + α[r + γ * E[Q(s',a')] - Q(s,a)] (sketched below)
  • Uses expected Q-value over next actions, reducing variance compared to SARSA
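
The expectation is taken under the agent's own epsilon-greedy policy: each of the n actions carries ε/n probability, and the greedy action receives the remaining 1 - ε mass. A sketch of that target term (expectedNextQ is illustrative, not part of the dart_rl API):

// Illustrative only: E[Q(s', a')] under an epsilon-greedy policy.
// Each of the n actions is picked with probability epsilon / n; the
// greedy action additionally receives the remaining (1 - epsilon) mass.
double expectedNextQ(List<double> nextQs, double epsilon) {
  final maxQ = nextQs.reduce((a, b) => a > b ? a : b);
  final mean = nextQs.fold<double>(0.0, (sum, q) => sum + q) / nextQs.length;
  return epsilon * mean + (1 - epsilon) * maxQ;
}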

Parameters #

  • learningRate (α): How strongly each update moves Q(s,a) toward its new target (0.0 to 1.0); 0.0 ignores new information, 1.0 overwrites the old estimate (see the worked example below)
  • discountFactor (γ): How much future rewards count relative to immediate ones (0.0 to 1.0)
  • epsilon (ε): Probability of taking a random exploratory action instead of the current greedy one (0.0 to 1.0)
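
As a feel for scale (hypothetical numbers): with learningRate 0.1, discountFactor 0.9, a reward of 1.0, a current Q(s,a) of 0.2, and a best next-state value of 0.5, a single Q-Learning update moves Q(s,a) from 0.2 to 0.325:

void main() {
  // Hypothetical numbers showing the scale of one Q-Learning update.
  const alpha = 0.1, gamma = 0.9;
  const reward = 1.0, oldQ = 0.2, bestNextQ = 0.5;
  final target = reward + gamma * bestNextQ;   // 1.0 + 0.45 = 1.45
  final newQ = oldQ + alpha * (target - oldQ); // 0.2 + 0.1 * 1.25
  print(newQ.toStringAsFixed(3));              // 0.325
}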

License #

MIT

Contributing #

Contributions are welcome! Please feel free to submit a Pull Request.
