tensortrade.env.environment module

class tensortrade.env.environment.TradingEnv(portfolio: Portfolio, feed: DataFeed, action_scheme: AbstractActionScheme, reward_scheme: AbstractRewardScheme, observer: AbstractObserver, *, stopper: AbstractStopper | None = None, informer: AbstractInformer | None = None, renderer: AbstractRenderer | None = None, render_mode: str | None = None, plotter: AbstractPlotter | None | List[AbstractPlotter] = None, random_start_pct: float = 0.0)[source]

Bases: Env, TimeIndexed

A trading environment made for use with Gym-compatible reinforcement learning algorithms.

Parameters:
  • action_scheme (AbstractActionScheme) – A component for generating an action to perform at each step of the environment.

  • reward_scheme (AbstractRewardScheme) – A component for computing the reward after each step of the environment.

  • observer (AbstractObserver) – A component for generating observations after each step of the environment.

  • informer (AbstractInformer) – A component for providing information after each step of the environment.

  • renderer (AbstractRenderer) – A component for rendering the environment.

  • render_mode (str) – The chosen render mode, e.g. ‘human’.

  • plotter (AbstractPlotter) – A component for plotting the environment. A single plotter or a list of plotters may be given.
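
A minimal construction sketch based only on the signature above; it assumes the portfolio, feed, action_scheme, reward_scheme and observer components have already been built with the library’s own component classes (their construction is not shown here):

    # Sketch only: the component objects are assumed to exist already.
    from tensortrade.env.environment import TradingEnv

    env = TradingEnv(
        portfolio=portfolio,          # Portfolio holding the wallets to trade with
        feed=feed,                    # DataFeed providing the market/feature data
        action_scheme=action_scheme,  # AbstractActionScheme turning agent actions into orders
        reward_scheme=reward_scheme,  # AbstractRewardScheme scoring each step
        observer=observer,            # AbstractObserver building observations
        render_mode="human",          # optional render mode, e.g. 'human'
    )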

property broker: Broker
property clock: Clock

Gets the clock associated with this object.

Returns:

Clock – The clock associated with this object.

close() None[source]

Closes the environment.

property components: Dict[str, Component]

The components of the environment. (Dict[str, Component], read-only)

property feed: FeedController
property last_state: ObsState
plot(**kwargs) None[source]

Plots the environment using the configured plotter(s).

property portfolio: Portfolio
render() RenderFrame | List[RenderFrame] | None[source]

Renders the environment according to gymnasium.Env specifications.

Returns:

A RenderFrame or a list of RenderFrame instances.

Return type:

Optional[Union[RenderFrame, List[RenderFrame]]]
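
A one-line usage sketch, assuming a render_mode (e.g. ‘human’) was chosen when the environment was constructed:

    frame = env.render()  # returns a RenderFrame, a list of RenderFrames, or None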

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[ObsType, Dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info.

This method resets all components of the environment to their initial state and begins a new episode. The seed parameter is used to reset the PRNG of the environment. reset() should always be called to initialize the environment, and again after an episode has terminated or truncated. It then returns the first observation and info produced by the configured components.

Note

The tuple returned contains the data according to gymnasium.Env specifications:
  • observation (ObsType): The first observation of the environment, as in step().

  • info (Dict[str, Any]): The info dict, as in step().

Returns:

The initial observation and an info dict, as specified by gymnasium.Env.

Return type:

Tuple[ObsType, Dict[str, Any]]
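
A short usage sketch of reset(), following the gymnasium.Env convention (the seed value is arbitrary):

    # Initialize the environment (and its PRNG) before the first episode.
    observation, info = env.reset(seed=42)

    # After an episode ends (terminated or truncated), reset again for the next one.
    observation, info = env.reset()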

step(action: ActType) Tuple[ObsType, SupportsFloat, bool, bool, Dict[str, Any]][source]

Run one timestep of the environment’s dynamics using the agent actions.

When the end of an episode is reached (terminated or truncated), it is necessary to call reset() to reset this environment’s state for the next episode.

Note

The tuple returned contains the data according to gymnasium.Env specifications:
  • observation (ObsType): An element of the environment’s observation_space as the next observation due to the agent actions. This could be a numpy array with the observed features at that time.

  • reward (SupportsFloat): The reward as a result of taking the action.

  • terminated (bool): Whether the agent has reached a terminal state, which can be positive or negative. This happens when there is no more training data or when the criterion defined by the AbstractStopper is met.

  • truncated (bool): Whether a truncation condition outside the scope of the MDP is satisfied. This is not used by TensorTrade-NG.

  • info (Dict[str, Any]): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging). It’s controlled by the AbstractInformer.

Note

Because the internals of this method may look a bit unusual, here is a short explanation:
  1. First, the action defined by the AbstractActionScheme is executed. Executing it may place and fill orders, so new data must be fetched afterwards.

  2. self.feed.next() is then used to fetch the newest data, including the changes (such as orders) that the action has made to the environment. This begins a new state.

  3. The new state is then rewarded using the AbstractRewardScheme.

  4. After rewarding the agent, a new observation and info are obtained from this new state.

  5. Finally, it is checked whether it is time to terminate the episode. This happens either because the AbstractStopper decides it or because there is no more data to begin a new state. A sketch of this flow is shown below.
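
The following is an illustrative, non-authoritative sketch of that flow. The method names used on the components (perform, reward, observe, info, stop, has_next) are hypothetical stand-ins, not the documented TensorTrade-NG interfaces:

    # Illustrative sketch only -- component method names are hypothetical.
    def step(self, action):
        # 1. Execute the agent's action; this may place and fill orders.
        self.action_scheme.perform(self, action)

        # 2. Advance the feed so the new state reflects those changes.
        self.feed.next()

        # 3. Reward the new state.
        reward = self.reward_scheme.reward(self)

        # 4. Build the next observation and the info dict.
        observation = self.observer.observe(self)
        info = self.informer.info(self)

        # 5. Terminate if the stopper decides so or the feed has no more data.
        terminated = self.stopper.stop(self) or not self.feed.has_next()
        truncated = False  # not used by TensorTrade-NG

        return observation, reward, terminated, truncated, info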

Parameters:

action (ActType) – An action provided by the agent to update the environment state.

Returns:

The observation, reward, terminated, truncated, and info values resulting from the step, as specified by gymnasium.Env.

Return type:

Tuple[ObsType, SupportsFloat, bool, bool, Dict[str, Any]]
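
A short interaction sketch putting reset() and step() together; the action is sampled from the gymnasium action space as a stand-in for an agent’s policy, and the step budget is arbitrary:

    # Typical interaction loop, assuming `env` is a constructed TradingEnv.
    observation, info = env.reset(seed=0)

    for _ in range(1000):  # arbitrary step budget
        action = env.action_space.sample()  # placeholder for an agent's policy
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated or truncated:
            observation, info = env.reset()

    env.close()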