Model¶

This is the class used to represent any model during the whole process of ProcessGym. Just like the Data class, this class is designed for efficient manipulation and loading of models. Feel free to utilized this class for your efficient implementation.

class src.abstractions.model.Model(model_name: str, is_instruct_finetuned: bool = True, model_path_or_repoid: str | None = None, num_gpus: int | None = None, template_type: Literal['auto', 'alpaca', 'mistral', 'llama3'] | None = None)¶

__init__(model_name: str, is_instruct_finetuned: bool = True, model_path_or_repoid: str | None = None, num_gpus: int | None = None, template_type: Literal['auto', 'alpaca', 'mistral', 'llama3'] | None = None)¶

Initialize.

Parameters:

model_name (str) – The name of the model
is_instruct_finetuned (bool = True) – Indicates if the model is instruction finetuned
model_path (Optional[str] = None) – The path to the model. When model_path is omitted, the model is searched for at a set of paths, including output/{model_name} and output/training_results/{model_name}.
num_gpus (Optional[int] = None) – Number of GPUs to use for parallel finetuning/inference. Default to the total number of gpus on the machine.
template_type (Literal["auto", "alpaca", "mistral", "llama3"] = "auto") – The type of template to use, which can be “auto”, “alpaca”, “mistral”, or “llama3”. If “auto”, the template type is inferred from the model’s config file. Set the environment variable DEFAULT_TEMPLATE to specify the default template type, if some other value than “auto” is desired.

Examples:

Model(model_name = 'Gemma-2B_sft', is_instruct_finetuned = True, model_path = f'{root}/output/training_results/Gemma-2B_sft/')
Model(model_name = 'Gemma-2B_sft', is_instruct_finetuned = True)

copy() → Model¶: Returns a shallow copy of the current Model instance.

deep_copy(dest_suffix: str | None = None, dest_full_name: str | None = None, dest_subdir: Literal['training_results', 'rlhf_results'] = 'rlhf_results', source_explicit_path: str | None = None) → Model¶

Returns a deep copy of the current Model instance, with either name suffix or full name of the resulting copy supplied.

Parameters:

dest_suffix (Optional[str] = None) – The suffix for the destination
dest_full_name (Optional[str] = None) – The full name of the destination
dest_subdir (Literal["training_results", "rlhf_results"] = "rlhf_results") – The subdirectory for the destination. It can be “training_results” or “rlhf_results”.
source_explicit_path (Optional[str] = None) – The explicit path to the source

evaluate(method: Literal['fast', 'dummy'] = 'fast', logprobs=True) → ndarray¶: Returns a high-dimensional vector representing morality preference of the model. Choose “dummy” for fast debugging runs.

finetune(data: Data, stage: Literal['sft', 'pretrain', 'dpo', 'rlhf'], algo: Literal['full_param', 'lora'], result_model_name: str, epochs: float | None = None, batch_size_multiplier_log2: int = 0, grad_accu_multiplier_log2: int = -2, lr: float | None = None, lr_scheduler_type: str | None = None, lr_scheduler_kwargs: dict | None = None, load_best_at_end: bool = True, num_nodes: int = 1, save_checkpoints: bool = True, perform_eval: bool = True, ppo_data: Data | None = None, backend: Literal['deepspeed'] = 'deepspeed') → Model¶

Out-of-place finetuning. Doesn’t update self.

Parameters:

data (Data) – The data to be used
stage (Literal["sft", "pretrain", "dpo", "rlhf"]) – The stage of the process. It can be “sft”, “pretrain”, “dpo”, or “rlhf”.
algo (Literal["full_param", "lora"]) – The algorithm to use. It can be “full_param” or “lora”.
result_model_name (str) – The name of the resulting model
epochs (Optional[float] = None) – The number of epochs
batch_size_multiplier_log2 (int = 0) – The log base 2 of the batch size multiplier
grad_accu_multiplier_log2 (int = -2) – The log base 2 of the gradient accumulation multiplier
lr (Optional[float] = None) – The learning rate
lr_scheduler_type (Optional[str] = None) – The type of learning rate scheduler
lr_scheduler_kwargs (Optional[dict] = None) – Additional arguments for the learning rate scheduler
load_best_at_end (bool = True) – Whether to load the best model at the end
num_nodes (int = 1) – The number of nodes
save_checkpoints (bool = True) – Whether to save checkpoints
perform_eval (bool = True) – Whether to perform evaluation
ppo_data (Optional[Data] = None) – The data for PPO. ppo_data is only used when stage is ‘rlhf’, and defaults to data.
backend (Literal["deepspeed"] = "deepspeed") – The backend to use. Currently only “deepspeed” is supported.

Returns:

Returns a Model instance with name {result_model_name}, which is the result of the finetuning.

Return type:

Model.

inference(data: Data | List[Dict[str, str]], result_data_name: str, backend: Literal['sglang', 'vllm', 'deepspeed', 'serial'] = 'sglang', batch_size_multiplier_log2: int = 0, temperature: float = 0.25, max_tokens: int = 8192, purpose: Literal['responses', 'logprobs'] = 'responses') → Data | List[Dict[str, str]]¶

Performance inference on a dataset (currently only instruction datasets are tested, with the same format as SFT datasets), and

Parameters:

data (Union[Data, List[Dict[str, str]]]) – The data to be used. It can be either a Data object or a list of dictionaries with string keys and values. The data argument can also be a List[Dict[str,str]] where each Dict containing two fields “instruction” and (optionally) “input”. In this case, a List[Dict[str, str]] will be returned.
result_data_name (str) – The name of the resulting data
backend (Literal["sglang", "vllm", "deepspeed", "serial"] = "sglang") – The backend to use. It can be “sglang”, “vllm”, “deepspeed”, or “serial”.
batch_size_multiplier_log2 (int = 0) – The log base 2 of the batch size multiplier
temperature (float = 0.25) – The temperature parameter.
max_tokens (int = 8192) – The maximum number of tokens to generate. Ignored if purpose is “logprobs”.
purpose (Literal["responses", "logprobs"] = "responses") – The purpose of the inference. It can be “responses” or “logprobs”. If “logprobs”, the log probability of the prompt itself (and the assistant response supplied in the predict field, if exists) is returned in the logprob field of the resulting dataset, without doing any completion. If “responses”, the completion text is saved in the predict field of the resulting dataset.

Returns:

returns the resulting dataset (completion text saved in the predict field of dicts; other fields are preserved).

Return type:

Union[Data, List[Dict[str, str]]].

backend: Which backend to use for inference. Options listed below, in descreasing order of speed.

sglang - Recommended. Parallel inference using self.num_gpus GPUs. Faster than deepspeed and serial by >= an order of magnitude.

vllm - Recommended. Parallel inference using self.num_gpus GPUs. Faster than deepspeed and serial by >= an order of magnitude.

deepspeed - Slower parallel inference using self.num_gpus GPUs. The only backend supporting pretrain-style inference.

serial - Serial inference.

save_permanent(saved_name: str | None = None, forced_rewrite: bool = False)¶: Model will be saved to model_save_path from abstractions_config.json. Without save_permanent, the model will still be present in ./output/ and can still be directly used next time without specifying the full path.