Note: The commands provided in this tutorial assume you have installed the full library and the requirements for the gym_plugin. The latter can be installed separately.

In this tutorial, we:

Show an example of continuous control with an arbitrary action space covering 2 policies for one of the gym tasks.

The task#

For this tutorial, we'll focus on one of the continuous-control environments under the Box2D group of gym environments: LunarLanderContinuous-v2. In this task, the goal is to smoothly land a lunar module on a landing pad.


To achieve this goal, we need to provide continuous control for a main engine and a directional one (2 real values). The task is considered solved when the reward reaches at least 200 points. The controls for the main and directional engines are both in the range [-1.0, 1.0], and the observation space is composed of 8 scalars indicating x and y positions, x and y velocities, lander angle and angular velocity, and left and right ground contact. Note that these 8 scalars provide a full observation of the state.
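To make the layout of the observation and action vectors concrete, here is a minimal sketch in plain Python. The names `LanderObservation`, `unpack_observation` and `clip_action` are hypothetical labels chosen for illustration; they are not part of gym or the gym_plugin, and the field order simply follows the description above.

```python
from typing import NamedTuple, Sequence, Tuple


class LanderObservation(NamedTuple):
    """Hypothetical labels for the 8 observation scalars described above."""

    x: float
    y: float
    x_velocity: float
    y_velocity: float
    angle: float
    angular_velocity: float
    left_contact: float
    right_contact: float


def unpack_observation(obs: Sequence[float]) -> LanderObservation:
    """Attach names to a raw 8-element observation vector."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 scalars, got {len(obs)}")
    return LanderObservation(*(float(v) for v in obs))


def clip_action(action: Sequence[float]) -> Tuple[float, float]:
    """Clamp the (main, directional) engine commands to [-1.0, 1.0]."""
    if len(action) != 2:
        raise ValueError(f"expected 2 real values, got {len(action)}")
    main, directional = (max(-1.0, min(1.0, float(a))) for a in action)
    return main, directional
```

Naming the fields this way is only a reading aid; the model in this tutorial consumes the 8 scalars as a flat vector.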


For this tutorial, we'll use the readily available gym_plugin, which includes a wrapper for gym environments, a task sampler and task definition, a sensor to wrap the observations provided by the gym environment, and a simple model.

The experiment config, similar to the one used for the Navigation in MiniGrid tutorial, is defined as follows:

from typing import Dict, Optional, List, Any, cast

import gym
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import LambdaLR

from allenact.algorithms.onpolicy_sync.losses.ppo import PPO
from allenact.base_abstractions.experiment_config import ExperimentConfig, TaskSampler
from allenact.base_abstractions.sensor import SensorSuite
from allenact_plugins.gym_plugin.gym_models import MemorylessActorCritic
from allenact_plugins.gym_plugin.gym_sensors import GymBox2DSensor
from allenact_plugins.gym_plugin.gym_tasks import GymTaskSampler
from allenact.utils.experiment_utils import (
    TrainingPipeline,
    Builder,
    PipelineStage,
    LinearDecay,
)
from allenact.utils.viz_utils import VizSuite, AgentViewViz


class GymTutorialExperimentConfig(ExperimentConfig):
    @classmethod
    def tag(cls) -> str:
        return "GymTutorial"

Sensors and Model#

As mentioned above, we'll use a GymBox2DSensor to provide full observations from the state of the gym environment to our model.

    SENSORS = [
        GymBox2DSensor("LunarLanderContinuous-v2", uuid="gym_box_data"),
    ]

We define our ActorCriticModel agent using a lightweight implementation with separate MLPs for actors and critic, MemorylessActorCritic. Since this is a model for continuous control, note that the superclass of our model is ActorCriticModel[GaussianDistr] instead of ActorCriticModel[CategoricalDistr], because we'll use a Gaussian distribution to sample actions.

    @classmethod
    def create_model(cls, **kwargs) -> nn.Module:
        return MemorylessActorCritic(
            input_uuid="gym_box_data",
            action_space=gym.spaces.Box(
                -1.0, 1.0, (2,)
            ),  # 2 actors, each in the range [-1.0, 1.0]
            observation_space=SensorSuite(cls.SENSORS).observation_spaces,
            action_std=0.5,
        )
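To illustrate what sampling actions from a Gaussian with a fixed standard deviation means, here is a minimal sketch using only the standard library. The function `sample_gaussian_action` and the example means are hypothetical; the sketch assumes independent action dimensions sharing the same standard deviation (0.5, matching `action_std` above) and is not the plugin's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_gaussian_action(
    means: Sequence[float], std: float, rng: random.Random
) -> Tuple[List[float], float]:
    """Sample one value per action dimension from N(mean, std^2).

    Returns the sampled action and its total log-probability, assuming
    independent dimensions that share the same fixed standard deviation.
    """
    action = [rng.gauss(mu, std) for mu in means]
    log_prob = sum(
        -0.5 * ((a - mu) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)
        for a, mu in zip(action, means)
    )
    return action, log_prob


rng = random.Random(0)
# Hypothetical actor output (means) for the main and directional engines.
action, log_prob = sample_gaussian_action([0.2, -0.1], std=0.5, rng=rng)
```

Because a Gaussian is unbounded, an individual sample can land outside [-1.0, 1.0] even when the means lie inside that range.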

Task samplers#

We use an available TaskSampler implementation for gym environments that allows us to sample GymTasks: GymTaskSampler. Even though it is possible to let the task sampler instantiate the proper sensor for the chosen task name (by passing None), we use the sensors we created above, which contain a custom identifier for the actual observation space (gym_box_data) that is also used by the model.

    @classmethod
    def make_sampler_fn(cls, **kwargs) -> TaskSampler:
        return GymTaskSampler(**kwargs)

For convenience, we will use a _get_sampler_args method to generate the task sampler arguments for all three modes, train, valid, test:

    def train_task_sampler_args(
        self,
        process_ind: int,
        total_processes: int,
        devices: Optional[List[int]] = None,
        seeds: Optional[List[int]] = None,
        deterministic_cudnn: bool = False,
    ) -> Dict[str, Any]:
        return self._get_sampler_args(
            process_ind=process_ind, mode="train", seeds=seeds
        )

    def valid_task_sampler_args(
        self,
        process_ind: int,
        total_processes: int,
        devices: Optional[List[int]] = None,
        seeds: Optional[List[int]] = None,
        deterministic_cudnn: bool = False,
    ) -> Dict[str, Any]:
        return self._get_sampler_args(
            process_ind=process_ind, mode="valid", seeds=seeds
        )

    def test_task_sampler_args(
        self,
        process_ind: int,
        total_processes: int,
        devices: Optional[List[int]] = None,
        seeds: Optional[List[int]] = None,
        deterministic_cudnn: bool = False,
    ) -> Dict[str, Any]:
        return self._get_sampler_args(process_ind=process_ind, mode="test", seeds=seeds)

Similarly to what we do in the MiniGrid navigation tutorial, the task sampler samples random tasks forever during training, while during testing (or validation) we sample a fixed number of tasks.

    def _get_sampler_args(
        self, process_ind: int, mode: str, seeds: List[int]
    ) -> Dict[str, Any]:
        """Generate initialization arguments for train, valid, and test
        TaskSamplers.

        # Parameters
        process_ind : index of the current task sampler
        mode: one of `train`, `valid`, or `test`
        """
        if mode == "train":
            max_tasks = None  # infinite training tasks
            task_seeds_list = None  # no predefined random seeds for training
            deterministic_sampling = False  # randomly sample tasks in training
        else:
            max_tasks = 3

            # one seed for each task to sample:
            # - ensures different seeds for each sampler, and
            # - ensures a deterministic set of sampled tasks.
            task_seeds_list = list(
                range(process_ind * max_tasks, (process_ind + 1) * max_tasks)
            )

            deterministic_sampling = (
                True  # deterministically sample task in validation/testing
            )

        return dict(
            gym_env_types=["LunarLanderContinuous-v2"],
            sensors=self.SENSORS,  # sensors used to return observations to the agent
            max_tasks=max_tasks,  # see above
            task_seeds_list=task_seeds_list,  # see above
            deterministic_sampling=deterministic_sampling,  # see above
            seed=seeds[process_ind],
        )

Note that we only sample 3 tasks for validation and testing in this case, which suffices to illustrate the model's success.
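As a small worked example of the seed assignment used for validation and testing, the sketch below evaluates the same range expression for a few sampler indices. The helper name `eval_task_seeds` and the choice of three samplers are assumptions made for illustration (the tutorial itself uses a single evaluation sampler).

```python
from typing import List


def eval_task_seeds(process_ind: int, max_tasks: int) -> List[int]:
    """Seeds for one validation/test sampler, mirroring the range expression above."""
    return list(range(process_ind * max_tasks, (process_ind + 1) * max_tasks))


# Seeds handed to three hypothetical evaluation samplers, 3 tasks each.
seeds_per_process = [eval_task_seeds(p, max_tasks=3) for p in range(3)]
print(seeds_per_process)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Each sampler gets a disjoint, contiguous block of seeds, so no task is repeated across samplers and the evaluated set is the same on every run.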

Machine parameters#

Given the simplicity of the task and model, we can just train the model on the CPU. During training, success should reach 100% in less than 10 minutes, whereas solving the task (evaluation reward > 200) might take about 20 minutes (on a laptop CPU).

We specify a larger number of samplers for training (8) than for validation or testing (just 1), and we default to CPU usage by returning an empty list of devices. We also include a video visualizer (AgentViewViz) in test mode.

    @classmethod
    def machine_params(cls, mode="train", **kwargs) -> Dict[str, Any]:
        visualizer = None
        if mode == "test":
            visualizer = VizSuite(
                mode=mode,
                video_viz=AgentViewViz(
                    label="episode_vid",
                    max_clip_length=400,
                    vector_task_source=("render", {"mode": "rgb_array"}),
                    fps=30,
                ),
            )
        return {
            "nprocesses": 8 if mode == "train" else 1,
            "devices": [],
            "visualizer": visualizer,
        }

Training pipeline#

The last definition is the training pipeline. In this case, we use a PPO stage with a linearly decaying learning rate and 80 single-batch update repeats per rollout:

    @classmethod
    def training_pipeline(cls, **kwargs) -> TrainingPipeline:
        ppo_steps = int(1.2e6)
        return TrainingPipeline(
            named_losses=dict(
                ppo_loss=PPO(clip_param=0.2, value_loss_coef=0.5, entropy_coef=0.0,),
            ),  # type:ignore
            pipeline_stages=[
                PipelineStage(loss_names=["ppo_loss"], max_stage_steps=ppo_steps),
            ],
            optimizer_builder=Builder(cast(optim.Optimizer, optim.Adam), dict(lr=1e-3)),
            num_mini_batch=1,
            update_repeats=80,
            max_grad_norm=100,
            num_steps=2000,
            gamma=0.99,
            use_gae=False,
            gae_lambda=0.95,
            advance_scene_rollout_period=None,
            save_interval=200000,
            metric_accumulate_interval=50000,
            lr_scheduler_builder=Builder(
                LambdaLR, {"lr_lambda": LinearDecay(steps=ppo_steps)},  # type:ignore
            ),
        )
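To give a feel for the linearly decaying learning rate, here is a minimal sketch of a multiplicative decay factor of the kind that LambdaLR applies to the base learning rate. The function `linear_decay_factor` is a hypothetical stand-in, assuming the factor falls from 1.0 at step 0 to 0.0 at the final step; the start and end values actually used by LinearDecay may differ.

```python
def linear_decay_factor(step: int, total_steps: int) -> float:
    """Multiplicative factor falling linearly from 1.0 (step 0) to 0.0 (total_steps)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    clamped = min(max(step, 0), total_steps)  # keep the factor within [0.0, 1.0]
    return 1.0 - clamped / total_steps


base_lr = 1e-3  # same base learning rate as the Adam optimizer above
ppo_steps = int(1.2e6)
for step in (0, 300_000, 600_000, 1_200_000):
    # Effective learning rate under the assumed schedule.
    print(step, base_lr * linear_decay_factor(step, ppo_steps))
```

Under this assumption the effective learning rate is halved by the midpoint of training and reaches zero when the stage ends.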

Training and validation#

We have a complete implementation of this experiment's configuration class in projects/tutorials/. To start training from scratch, we just need to invoke

PYTHONPATH=. python allenact/main.py gym_tutorial -b projects/tutorials -m 8 -o /PATH/TO/gym_output -s 54321 -e

from the root directory. Note that we include -e to enforce deterministic evaluation. Please refer to the Navigation in MiniGrid tutorial if in doubt about the meaning of the rest of the parameters.

If we have Tensorboard installed, we can track progress with

tensorboard --logdir /PATH/TO/gym_output

which will default to the URL http://localhost:6006/.

After 1,200,000 steps, the script will terminate. If everything went well, the valid success rate should quickly converge to 1 and the mean reward to above 250, while the average episode length should stay below or near 300.


The training start date for the experiment, in YYYY-MM-DD_HH-MM-SS format, is used as the name of one of the subfolders in the path to the checkpoints, saved under the output folder. In order to evaluate (i.e. test) a collection of checkpoints, we need to pass the --eval flag and specify the directory containing the checkpoints with the --checkpoint CHECKPOINT_DIR option:

PYTHONPATH=. python allenact/main.py gym_tutorial -b projects/tutorials -m 1 -o /PATH/TO/gym_output -s 54321 -e --eval --checkpoint /PATH/TO/gym_output/checkpoints/GymTutorial/YOUR_START_DATE --approx_ckpt_step_interval 800000 # Skip some checkpoints

The option --approx_ckpt_step_interval 800000 indicates that we only want to evaluate checkpoints which were saved every ~800000 steps; this lets us avoid evaluating every saved checkpoint. If everything went well, the test success rate should converge to 1, the episode length should stay below or near 300 steps, and the mean reward should rise above 250. The images tab in tensorboard will contain videos for the sampled test episodes.
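The checkpoint directory passed to --checkpoint combines the output folder, the experiment tag and the training start date. The sketch below shows how such a path could be assembled; `checkpoint_dir` is a hypothetical helper and the start time is made up, so treat it as an illustration of the folder layout seen in the command above rather than as code from the library.

```python
import os
from datetime import datetime


def checkpoint_dir(output_dir: str, experiment_tag: str, start_time: datetime) -> str:
    """Build <output_dir>/checkpoints/<tag>/<YYYY-MM-DD_HH-MM-SS>."""
    start_date = start_time.strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(output_dir, "checkpoints", experiment_tag, start_date)


# Hypothetical start time, used only to show the resulting layout.
example = checkpoint_dir(
    "/PATH/TO/gym_output", "GymTutorial", datetime(2021, 3, 4, 15, 30, 0)
)
print(example)
```

In practice it is usually simpler to list the contents of the checkpoints folder and copy the start date from there.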


If the test command fails with pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None", e.g. when running remotely, try prepending DISPLAY=:0.0 to the command above, assuming you have an X server running with such a display available:

DISPLAY=:0.0 PYTHONPATH=. python allenact/main.py gym_tutorial -b projects/tutorials -m 1 -o /PATH/TO/gym_output -s 54321 -e --eval --checkpoint /PATH/TO/gym_output/checkpoints/GymTutorial/YOUR_START_DATE --approx_ckpt_step_interval 800000
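Before launching the test command on a remote machine, it can help to check whether a display is configured at all. The sketch below only inspects the DISPLAY environment variable; `display_name` is a hypothetical helper, and the reading that the "None" in the error message corresponds to an unset DISPLAY is an assumption.

```python
import os
from typing import Mapping, Optional


def display_name(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured X display, or None if DISPLAY is unset or empty."""
    return env.get("DISPLAY") or None


if display_name() is None:
    print('DISPLAY is not set; rendering may fail with: Cannot connect to "None"')
else:
    print(f"Rendering would use display {display_name()}")
```

This check does not verify that an X server is actually listening on that display, only that a display name has been provided.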