Parallelizing the problem

We perform task parallelization at the level of the query to the Gaia archive.

In our example, a single database query

SELECT pmra, pmdec FROM gaiadr3.gaia_source_lite WHERE 1 = CONTAINS(POINT(56.75, 24.12),CIRCLE(ra,dec,2.0)) AND ruwe <1.4

is replaced by a sequence of queries, dividing the searched area into smaller fragments of a given size.

loop (RA_min < RA < RA_max with RA_step)
    loop (DEC_min < DEC < DEC_max with DEC_step)
        SELECT pmra, pmdec FROM gaiadr3.gaia_source_lite WHERE 1 = CONTAINS(POINT(ra,dec),BOX(RA,DEC,RA_step,DEC_step)) AND ruwe <1.4
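
As a rough illustration of the nested loops above, the following Python sketch builds one query string per tile. The function name tile_queries and the 1-degree step are hypothetical and chosen only for this example. The sketch also assumes that BOX takes the centre of the box followed by its full width and height, as in standard ADQL, so each tile centre is placed half a step from the tile edge.

```python
import math

def tile_queries(ra_min, ra_max, dec_min, dec_max, ra_step, dec_step):
    """Return one ADQL query string per tile covering the given area.

    Hypothetical helper: assumes BOX(centre_ra, centre_dec, width, height).
    """
    # Number of tiles along each axis; rounding guards against float noise.
    n_ra = math.ceil(round((ra_max - ra_min) / ra_step, 9))
    n_dec = math.ceil(round((dec_max - dec_min) / dec_step, 9))
    queries = []
    for i in range(n_ra):
        for j in range(n_dec):
            # Centre of the current tile.
            ra_c = ra_min + (i + 0.5) * ra_step
            dec_c = dec_min + (j + 0.5) * dec_step
            queries.append(
                "SELECT pmra, pmdec FROM gaiadr3.gaia_source_lite "
                "WHERE 1 = CONTAINS(POINT(ra, dec), "
                f"BOX({ra_c:.6f}, {dec_c:.6f}, {ra_step}, {dec_step})) "
                "AND ruwe < 1.4"
            )
    return queries

# Illustrative call: the 4 x 4 degree area of the example, in 1-degree tiles.
queries = tile_queries(54.75, 58.75, 22.12, 26.12, 1.0, 1.0)
```

With these illustrative values the area is split into a 4 by 4 grid of tiles, and each query string would become the source of one task's input file.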

The queries are executed by the gaia@home service, and the result of each query is written to a separate input file, one for each task.

In our example, the values will be as follows:

RA_min  = 56.75 - 2 = 54.75 deg = 03:39:00 (h:m:s)
RA_max  = 56.75 + 2 = 58.75 deg = 03:55:00 (h:m:s)
DEC_min = 24.12 - 2 = 22.12 deg = 22:07:12 (d:m:s)
DEC_max = 24.12 + 2 = 26.12 deg = 26:07:12 (d:m:s)
RA_step and DEC_step - the size of each small area (e.g. 5 milliarcseconds)
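
The sexagesimal values above can be checked with a short conversion. This is a minimal sketch; the helper name deg_to_sexagesimal is hypothetical. Right ascension is expressed in hours (15 degrees per hour), declination in degrees.

```python
def deg_to_sexagesimal(value_deg, as_hours=False):
    """Format an angle in degrees as HH:MM:SS (hours) or DD:MM:SS (degrees)."""
    sign = "-" if value_deg < 0 else ""
    # 1 degree = 3600 arcseconds = 240 seconds of time (3600 / 15).
    total = round(abs(value_deg) * (240 if as_hours else 3600))
    whole, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{sign}{whole:02d}:{minutes:02d}:{seconds:02d}"

ra_min_hms = deg_to_sexagesimal(54.75, as_hours=True)
ra_max_hms = deg_to_sexagesimal(58.75, as_hours=True)
dec_min_dms = deg_to_sexagesimal(22.12)
dec_max_dms = deg_to_sexagesimal(26.12)
```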

Assume that each of these input files is named gaia_data.inp.

Our program prepared for parallel operation looks as follows:

#PLEIADES - parallel version with physical file names

#Searching the Pleiades and computing the mean value and standard deviation of proper motion

import numpy as np

#loading gaia results (using numpy to read whole table)
#use defined symbolic name as file names when opening the files
r=np.loadtxt(fname="gaia_data.inp", skiprows=1)


#constants are defined in an input config file
with open("config.inp") as f:
    line = f.readlines()[0].split(' ')
    pmra0=float(line[0])
    pmdec0=float(line[1])

#filtering the output from Gaia archive for objects with (pmra-pmra0)**2+(pmdec-pmdec0)**2 < 5**2
#and saving results to numpy arrays
pm_ra=np.array([])
pm_dec=np.array([])
for row in r:
    if (row[0]-pmra0)**2+(row[1]-pmdec0)**2 < 25.0:
        pm_ra=np.append(pm_ra,row[0])
        pm_dec=np.append(pm_dec,row[1])

#computing mean values and standard deviation using numpy
mean_pmra=np.mean(pm_ra)
mean_pmdec=np.mean(pm_dec)
stdev_pmra=np.std(pm_ra)
stdev_pmdec=np.std(pm_dec)

#saving results to output file
with open("result.dat",'a') as f:
    f.write(f'{mean_pmra},{mean_pmdec}\n{stdev_pmra},{stdev_pmdec}\n')
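
The filtering loop in the program above can also be written with a NumPy boolean mask, which avoids growing the arrays row by row. The sketch below is an alternative formulation under stated assumptions, not the program as submitted: the function name select_members is hypothetical, and the demo rows are made-up values used only to show the call.

```python
import numpy as np

def select_members(table, pmra0, pmdec0, radius=5.0):
    """Keep rows whose proper motion lies within `radius` of (pmra0, pmdec0)."""
    # np.loadtxt returns a 1-D array for a single-row file; force 2-D.
    table = np.atleast_2d(table)
    pmra, pmdec = table[:, 0], table[:, 1]
    mask = (pmra - pmra0) ** 2 + (pmdec - pmdec0) ** 2 < radius ** 2
    return pmra[mask], pmdec[mask]

# Made-up rows purely for illustration: columns are pmra, pmdec.
demo = np.array([[20.0, -45.0], [19.0, -46.0], [0.0, 0.0]])
pm_ra, pm_dec = select_members(demo, pmra0=20.0, pmdec0=-45.0)
```

Because each task processes only a small tile, a tile may contain no matching objects; in that case np.mean and np.std return NaN with a warning, so it may be worth checking the array size before writing the result.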

The final step is to link the names of the input and output files to the symbolic names required by the BOINC system.

Symbolic names in BOINC can be arbitrary but must, at task runtime, be associated with passed input files.

Code:                           BOINC:
config.inp                      Symbolic_input
gaia_data.inp                   Symbolic_gaia_data
result.dat                      Symbolic_result

Finally, our code with symbolic names looks like this
