Creates an observer for an rrqueue

Usage

observer(queue_name, redis_host = "127.0.0.1", redis_port = 6379, config = NULL)

Arguments

queue_name
Name of the queue
redis_host
Redis hostname
redis_port
Redis port number
config
Configuration file of key/value pairs in yaml format. See the package README for an example. If given, additional arguments to this function override values in the file which in turn override defaults of this function.

Description

Creates an observer for an rrqueue. This is the "base class" for a couple of different objects in rrqueue; notably the queue object. So any method listed here also works within queue objects.

Details

Most of the methods of the observer object are extremely simple and involve fetching information from the database about the state of tasks, environments and workers.

The method and argument names try to give hints about the sort of things they expect; a method asking for task_id expects a single task identifier, while those asking for task_ids expect a vector of task identifiers (and if they have a default NULL then will default to returning information for all task identifiers). Similarly, a method starting task_ applies to one task while a method starting tasks_ applies to multiple.

Methods

tasks_list
Return a vector of known task ids.

Usage: tasks_list()

Value: A character vector

tasks_status
Returns a named character vector indicating the task status.

Usage: tasks_status(task_ids = NULL, follow_redirect = FALSE)

Arguments:

task_ids
Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.

follow_redirect
should we follow redirects to get the status of any requeued tasks?

Value: A named character vector; the names will be the task ids, and the values are the status of each task. Possible status values are

PENDING
queued, but not run by a worker

RUNNING
being run on a worker, but not complete

COMPLETE
task completed successfully

ERROR
task completed with an error

ORPHAN
task orphaned due to loss of worker

REDIRECT
orphaned task has been redirected

MISSING
task not known (deleted, or never existed)

tasks_overview
High-level overview of the tasks in the queue; the number of tasks in each status.

Usage: tasks_overview()

tasks_times
returns a summary of times for a set of tasks

Usage: tasks_times(task_ids = NULL, unit_elapsed = "secs")

Arguments:

task_ids
Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.

unit_elapsed
Unit to use in computing elapsed times. The default is to use "secs". This is passed through to difftime so the units there are available and are "auto", "secs", "mins", "hours", "days", "weeks".

Value: A data.frame, one row per task, with columns

submitted
Time the task was submitted

started
Time the task was started, or NA if waiting

finished
Time the task was completed, or NA if waiting or running

waiting
Elapsed time spent waiting

running
Elapsed time spent running, or NA if waiting

idle
Elapsed time since finished, or NA if waiting or running

The row names of the data.frame will be the task ids.

tasks_envir
returns the mapping of tasks to environmen

Usage: tasks_envir(task_ids = NULL)

Arguments:

task_ids
Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.

Value: A named character vector; names are the task ids and the value is the environment id associated with that task.

task_get
returns a task object associated with a given task identifier. This can be used to interrogate an individual task. See the help for task objects for more about these objects.

Usage: task_get(task_id = )

Arguments:

task_id
A single task identifier

task_result
Get the result for a single task

Usage: task_result(task_id = , follow_redirect = FALSE, sanitise = FALSE)

Arguments:

task_id
A single task identifier

follow_redirect
should we follow redirects to get the status of any requeued task?

sanitise
If the task is not yet complete or is missing, return an UnfetchabmeTask object rather than throwing an error.

tasks_groups_list
Returns list of groups known to rrqueue. Groups are assigned during task creation, or through the tasks_set_group method of link{queue}.

Usage: tasks_groups_list()

tasks_in_groups
Returns a list of tasks belonging to any of the groups listed.

Usage: tasks_in_groups(groups = )

Arguments:

groups
A character vector of one or more groups (use tasks_groups_list to get a list of valid groups).

tasks_lookup_group
Look up the group for a set of tasks

Usage: tasks_lookup_group(task_ids = NULL)

Arguments:

task_ids
Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.

Value: A named character vector; names refer to task ids and the value is the group (or NA if no group is set for that task id).

task_bundle_get
Return a "bundle" of tasks that can be operated on together; see task_bundle

Usage: task_bundle_get(groups = NULL, task_ids = NULL)

Arguments:

groups
A vector of groups to include in the bundle

task_ids
A vector of task ids in the bundle. Unlike all other uses of task_ids here, only one of groups or task_ids can be provided, so if task_ids=NULL then task_ids is ignored and groups is used.

envirs_list
Return a vector of all known environment ids in this queue.

Usage: envirs_list()

envirs_contents
Return a vector of the environment contents

Usage: envirs_contents(envir_ids = NULL)

Arguments:

envir_ids
Vector of environment ids. If omitted then all environments in this queue are used.

Value: A list, each element of which is a list of elements

packages
a vector of packages loaded

sources
a vector of files explicitly sourced

source_files
a vector of files sourced including their hashes. This includes and files detected to be sourced by another file

envir_workers
Determine which workers are known to be able to process tasks in a particular environment.

Usage: envir_workers(envir_id = , worker_ids = NULL)

Arguments:

envir_id
A single environment id

worker_ids
Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

Value: A named logical vector; TRUE if a worker can use an environment, named by the worker identifers.

workers_len
Number of workers that have made themselves known to rrqueue. There are situations where this is an overestimate and that may get fixed at some point.

Usage: workers_len()

workers_list
Returns a vector of all known worker identifiers (may include workers that have crashed).

Usage: workers_list()

workers_list_exited
Returns a vector of workers that are known to have exited. Workers leave behind most of the interesting bits of logs, times, etc, so these identifiers are useful for asking what they worked on.

Usage: workers_list_exited()

workers_status
Returns a named character vector indicating the task status.

Usage: workers_status(worker_ids = NULL)

Arguments:

worker_ids
Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

Value: A named character vector; the names will be the task ids, and the values are the status of each task. Possible status values are

IDLE
worker is idle

BUSY
worker is running a task

LOST
worker has been lost; this is only currently detectable via detecting orphan jobs

workers_times
returns a summary of times for a set of workers. This only returns useful information if the workers are running a heartbeat process, which requires the RedisHeartbeat package.

Usage: workers_times(worker_ids = NULL, unit_elapsed = "secs")

Arguments:

worker_ids
Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

unit_elapsed
Unit to use in computing elapsed times. The default is to use "secs". This is passed through to difftime so the units there are available and are "auto", "secs", "mins", "hours", "days", "weeks".

Value: A data.frame, one row per worker, with columns

worker_id
Worker identifier

expire_max
Maximum length of time before worker can be declared missing, in seconds

expire
Time until the worker will expire, in seconds

last_seen
Time since the worker was last seen

last_action
Time since the last worker action

workers_log_tail
Return the last few entries in the worker logs.

Usage: workers_log_tail(worker_ids = NULL, n = 1)

Arguments:

worker_ids
Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

n
Number of log entries to return. Use 0 or Inf to return all entries.

Value:

A data.frame with columns

worker_id
the worker identifier

time
time of the event

command
the command (e.g., MESSAGE, ALIVE)

message
The message associated with the command

workers_info
Returns a set of key/value information about workers. Includes things like hostnames, process ids, environments that can be run, etc. Note that this information is from the last time that the worker process registered an INFO command. This is registered at startup and after recieving a INFO message from a queue object. So the information may be out of date.

Usage: workers_info(worker_ids = NULL)

Arguments:

worker_ids
Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

Value: A list, each element of which is a worker_info

worker_envir
Returns an up-to-date list of environments a worker is capable of using (in contrast to the entry in workers_info that might be out of date.

Usage: worker_envir(worker_id = )

Arguments:

worker_id
Single worker identifier