Run in parallel

[This note-book is in oceantracker/tutorials_how_to/]

Running in parallel can be done using the “run with parameter dict.” approach, or the helper class method. Both build a base_case, which is the parameter defaults for each case, plus a list of “cases” parameters specific to each parallel case. The output files for each case will be tagged with a case number, 0-n. The number of computationally active cases is be limited to be one less than the number of physical processes available.

The cases can be different, eg. have different release groups etc. A small number of settings must be the same for all cases, eg. setting “root_output_folder” must be the same for all cases. These settings will be taken from the base case. Warnings are given if trying to use different values within case parameters.

Is is strongly recommend to run parallel cases from within a python script, rather than notebooks which have limitations in Windows and may result in errors.

Note: For large runs, memory to store the hindcast for each case may cause an error. To work around this, reduce the size of the hindcast buffer, (“reader” class parameter “time_buffer_size”), or reduce the number of processors (setting “processors”).

Example parallel release groups

The below example runs a separate release group as its own parallel “case”.

To run in parallel on windows requires the run to be within a if name == ‘main’ block. iPython note books ignores if name == ‘main’ statements, thus the below may not run when within a notebook in windows. Use in notebooks also frequently gives “file in use by another process” errors.

To avoid this run parallel case as a script, eg. code in file “oceantracker/tutorials_how_to/P_running_cases_in_parallel.py”.

Run parallel with helper class

# run in parallel using helper class method
from oceantracker import main
from  importlib import reload # force a reload to get around notebooks single name space
reload(main)


ot = main.OceanTracker()

# setup base case
# by default settings and classes are added to base case
ot.settings(output_file_base= 'parallel_test2',      # name used as base for output files
    root_output_dir='output',             #  output is put in dir   'root_output_dir'/'output_file_base'
    time_step = 120,  #  2 min time step as seconds
    )

ot.add_class('reader',input_dir= '../demos/demo_hindcast/schsim3D',  # folder to search for hindcast files, sub-dirs will, by default, also be searched
                      file_mask=  'demo_hindcast_schisim3D*.nc')  # hindcast file mask

# now put a release group with one point into case list
# define the required release  points
points = [  [1597682.1237, 5489972.7479],
            [1598604.1667, 5490275.5488],
            [1598886.4247, 5489464.0424],
            [1597917.3387, 5489000],
        ]
# run each point release in parrallel
for n, p in enumerate(points):
    # add a release group with one point to case "n"
    ot.add_class('release_groups',
                case=n, # this adds release group to the n'th case to run in //
                name ='mypoint'+str(n), # optional name for each group
                points= p,  # needs to be 1, by 2 list for single 2D point
                release_interval= 3600,           # seconds between releasing particles
                pulse_size= 10,                   # number of particles released each release_interval
                )

# to run parallel in windows, must put run  call  inside the below "if __name__ == '__main__':" block
if __name__ == '__main__':
    # base case and case_list exist as attributes ot.params and ot.case_list
    # run as parallel set of cases
    case_info_files= ot.run()


    # NOTE for parallel runs case_info_files is a list, one for each case run
    # so to load track files use
    # tracks = load_output_files.load_track_data(case_info_files[n])
    #   where n is the case number 0,1,2...
helper ----------------------------------------------------------------------
helper Starting OceanTracker helper class
helper   - Starting run using helper class
Main      Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:27:10) [MSC v.1938 64 bit (AMD64)]
Main >>> Warning: Oceantracker is not yet compatible with Python 3.11, as not all imported packages have been updated, eg netcdf4
Main ----------------------------------------------------------------------
Main OceanTracker starting main:
Main     Starting package set up
Main         -  Built OceanTracker package tree,      0.603 sec
Main         -  Built OceanTracker sort name map,     0.000 sec
Main     -  Done package set up to setup ClassImporter,       0.603 sec
Main >>> Warning: Deleted contents of existing output dir
Main Output is in dir "f:H_Local_driveParticleTrackingoceantrackertutorials_how_tooutputparallel_test2"
Main       hint: see for copies of screen output and user supplied parameters, plus all other output
Main     >>> Note: to help with debugging, parameters as given by user  are in "user_given_params.json"
Main ----------------------------------------------------------------------
Main  OceanTracker version 0.50.0010-2024-03-30 - preliminary setup
Main   - Found input dir "../demos/demo_hindcast/schsim3D"
Main   - found hydro-model files of type  "SCHISM"
Main Cataloging hindcast with 1 files in dir ../demos/demo_hindcast/schsim3D
Main     -  Cataloged hydro-model files/variables in time order,      0.008 sec
Main >>> Note: No bottom_stress variable in in hydro-files, using near seabed velocity to calculate friction_velocity for resuspension
Main     -  sorted hyrdo-model files in time order,   0.034 sec
Main   - oceantracker:multiProcessing: processors:4
Main   - parallel pool complete
End --- Summary ----------------------------------------------------------
End     >>> Note: Run summary with case file names in "*_runInfo.json"
End     >>> Note: to help with debugging, parameters as given by user  are in "user_given_params.json"
End >>> Note: No bottom_stress variable in in hydro-files, using near seabed velocity to calculate friction_velocity for resuspension
End     >>> Note: Run summary with case file names in "*_runInfo.json"
End >>> Warning: Oceantracker is not yet compatible with Python 3.11, as not all imported packages have been updated, eg netcdf4
End >>> Warning: Deleted contents of existing output dir
End ----------------------------------------------------------------------
End ----------------------------------------------------------------------
End OceanTracker summary:  elapsed time =0:00:17.406447
End       Cases -   0 errors,   0 warnings,  12 notes, check above
End       Main  -   0 errors,   2 warnings,   3 notes, check above
End   Output in None
End ----------------------------------------------------------------------

Run parallel using param. dicts.

# oceantracker parallel demo, run different release groups as parallel processes
from oceantracker.main import OceanTracker

# make instance of oceantracker to use to set parameters using code, then run
ot = OceanTracker()

# first build base case, params used for all cases
params=dict(debug =True,
    output_file_base= 'parallel_test1',      # name used as base for output files
    root_output_dir= 'output',             #  output is put in dir   'root_output_dir'/'output_file_base'
    time_step = 120,  #  2 min time step as seconds
    )
params['reader']= dict(input_dir= '../demos/demo_hindcast/schsim3D',  # folder to search for hindcast files, sub-dirs will, by default, also be searched
                      file_mask=  'demo_hindcast_schisim3D*.nc')

# define the required release  points
points = [  [1597682.1237, 5489972.7479],
            [1598604.1667, 5490275.5488],
            [1598886.4247, 5489464.0424],
            [1597917.3387, 5489000],
        ]

# build a list of params for each case, with one release group for each point
params['case_list'] =[]
for n,p in enumerate(points):
    # add one point as a release group to this case
    d = dict( name= 'mypoint'+str(n),# better to give release group a unique name
            points= [p],  # needs to be 1, by 2 list for single 2D point
            release_interval= 3600,           # seconds between releasing particles
            pulse_size= 10,                   # number of particles released each release_interval
            )
    case_param =dict(release_groups=[d]) # release group list of one or more releases
    params['case_list'].append(case_param)

# to run parallel in windows, must put run  call  inside the below "if __name__ == '__main__':" block
if __name__ == '__main__':

    # run as parallel set of cases
    #    by default uses two less than the number of physical processors at one time, use setting "processors"
    case_info_files= main.run(params)

    # NOTE for parallel runs case_info_files is a list, one for each case run
    # so to load track files use
    # tracks = load_output_files.load_track_data(case_info_files[n])
    #   where n is the case number 0,1,2...
helper ----------------------------------------------------------------------
helper Starting OceanTracker helper class
Main      Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:27:10) [MSC v.1938 64 bit (AMD64)]
Main >>> Warning: Oceantracker is not yet compatible with Python 3.11, as not all imported packages have been updated, eg netcdf4
Main ----------------------------------------------------------------------
Main OceanTracker starting main:
Main     Starting package set up
Main         -  Built OceanTracker package tree,      0.010 sec
Main         -  Built OceanTracker sort name map,     0.000 sec
Main     -  Done package set up to setup ClassImporter,       0.010 sec
Main >>> Warning: Deleted contents of existing output dir
Main Output is in dir "f:H_Local_driveParticleTrackingoceantrackertutorials_how_tooutputparallel_test1"
Main       hint: see for copies of screen output and user supplied parameters, plus all other output
Main     >>> Note: to help with debugging, parameters as given by user  are in "user_given_params.json"
Main ----------------------------------------------------------------------
Main  OceanTracker version 0.50.0010-2024-03-30 - preliminary setup
Main   - Found input dir "../demos/demo_hindcast/schsim3D"
Main   - found hydro-model files of type  "SCHISM"
Main Cataloging hindcast with 1 files in dir ../demos/demo_hindcast/schsim3D
Main     -  Cataloged hydro-model files/variables in time order,      0.009 sec
Main >>> Note: No bottom_stress variable in in hydro-files, using near seabed velocity to calculate friction_velocity for resuspension
Main     -  sorted hyrdo-model files in time order,   0.024 sec
Main   - oceantracker:multiProcessing: processors:4
Main   - parallel pool complete
End --- Summary ----------------------------------------------------------
End     >>> Note: Run summary with case file names in "*_runInfo.json"
End     >>> Note: to help with debugging, parameters as given by user  are in "user_given_params.json"
End >>> Note: No bottom_stress variable in in hydro-files, using near seabed velocity to calculate friction_velocity for resuspension
End     >>> Note: Run summary with case file names in "*_runInfo.json"
End >>> Warning: Oceantracker is not yet compatible with Python 3.11, as not all imported packages have been updated, eg netcdf4
End >>> Warning: Deleted contents of existing output dir
End ----------------------------------------------------------------------
End ----------------------------------------------------------------------
End OceanTracker summary:  elapsed time =0:00:16.965963
End       Cases -   0 errors,   0 warnings,  12 notes, check above
End       Main  -   0 errors,   2 warnings,   3 notes, check above
End   Output in None
End ----------------------------------------------------------------------