Model

This is the class used to represent any model throughout the ProcessGym pipeline. Like the Data class, it is designed for efficient manipulation and loading of models. Feel free to use this class in your own implementation.

class src.abstractions.model.Model(model_name: str, is_instruct_finetuned: bool = True, model_path_or_repoid: str | None = None, num_gpus: int | None = None, template_type: Literal['auto', 'alpaca', 'mistral', 'llama3'] | None = None)
__init__(model_name: str, is_instruct_finetuned: bool = True, model_path_or_repoid: str | None = None, num_gpus: int | None = None, template_type: Literal['auto', 'alpaca', 'mistral', 'llama3'] | None = None)

Initialize.

Parameters:
  • model_name (str) – The name of the model

  • is_instruct_finetuned (bool = True) – Indicates if the model is instruction finetuned

  • model_path_or_repoid (Optional[str] = None) – The path to the model. When model_path_or_repoid is omitted, the model is searched for at a set of paths, including output/{model_name} and output/training_results/{model_name}.

  • num_gpus (Optional[int] = None) – Number of GPUs to use for parallel finetuning/inference. Defaults to the total number of GPUs on the machine.

  • template_type (Literal["auto", "alpaca", "mistral", "llama3"] = "auto") – The type of template to use, which can be “auto”, “alpaca”, “mistral”, or “llama3”. If “auto”, the template type is inferred from the model’s config file. To use a default other than “auto”, set the environment variable DEFAULT_TEMPLATE to the desired template type.

Examples:
Model(model_name = 'Gemma-2B_sft', is_instruct_finetuned = True, model_path_or_repoid = f'{root}/output/training_results/Gemma-2B_sft/')
Model(model_name = 'Gemma-2B_sft', is_instruct_finetuned = True)
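
The fallback search that applies when no path is given can be sketched as below. This is a minimal illustration, not the library's actual lookup code: the helper name `resolve_model_path`, its `root` argument, and the search order are assumptions. Only the two candidate directories come from the parameter description above.

```python
import os
from typing import Optional


def resolve_model_path(model_name: str,
                       model_path: Optional[str] = None,
                       root: str = ".") -> Optional[str]:
    """Hypothetical helper: return the first existing candidate directory."""
    if model_path is not None:
        # An explicit path always wins; no search is performed.
        return model_path
    # The two locations named in the documentation; the real class may
    # search additional ones, and possibly in a different order.
    candidates = [
        os.path.join(root, "output", model_name),
        os.path.join(root, "output", "training_results", model_name),
    ]
    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    return None  # nothing found at the known locations
```
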
copy() Model

Returns a shallow copy of the current Model instance.

deep_copy(dest_suffix: str | None = None, dest_full_name: str | None = None, dest_subdir: Literal['training_results', 'rlhf_results'] = 'rlhf_results', source_explicit_path: str | None = None) Model

Returns a deep copy of the current Model instance, with either name suffix or full name of the resulting copy supplied.

Parameters:
  • dest_suffix (Optional[str] = None) – The suffix for the destination

  • dest_full_name (Optional[str] = None) – The full name of the destination

  • dest_subdir (Literal["training_results", "rlhf_results"] = "rlhf_results") – The subdirectory for the destination. It can be “training_results” or “rlhf_results”.

  • source_explicit_path (Optional[str] = None) – The explicit path to the source
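
One way to picture how the destination of a deep copy could be derived from these arguments is sketched below. Everything here is an assumption made for illustration: the helper name `deep_copy_destination`, the rule that exactly one of `dest_suffix` and `dest_full_name` must be given, plain string concatenation for the suffix, and the `output/{dest_subdir}/{name}` layout. The actual method may behave differently.

```python
import os
from typing import Optional


def deep_copy_destination(model_name: str,
                          dest_suffix: Optional[str] = None,
                          dest_full_name: Optional[str] = None,
                          dest_subdir: str = "rlhf_results") -> str:
    """Hypothetical helper: compute where a deep copy might be written."""
    # Assumption: exactly one of the two naming arguments is supplied.
    if (dest_suffix is None) == (dest_full_name is None):
        raise ValueError("supply exactly one of dest_suffix or dest_full_name")
    if dest_subdir not in ("training_results", "rlhf_results"):
        raise ValueError("dest_subdir must be 'training_results' or 'rlhf_results'")
    # Assumption: the suffix is appended to the source name verbatim.
    if dest_full_name is not None:
        name = dest_full_name
    else:
        name = model_name + dest_suffix
    return os.path.join("output", dest_subdir, name)
```
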

evaluate(method: Literal['fast', 'dummy'] = 'fast', logprobs=True) ndarray

Returns a high-dimensional vector representing the moral preferences of the model. Choose “dummy” for fast debugging runs.

finetune(data: Data, stage: Literal['sft', 'pretrain', 'dpo', 'rlhf'], algo: Literal['full_param', 'lora'], result_model_name: str, epochs: float | None = None, batch_size_multiplier_log2: int = 0, grad_accu_multiplier_log2: int = -2, lr: float | None = None, lr_scheduler_type: str | None = None, lr_scheduler_kwargs: dict | None = None, load_best_at_end: bool = True, num_nodes: int = 1, save_checkpoints: bool = True, perform_eval: bool = True, ppo_data: Data | None = None, backend: Literal['deepspeed'] = 'deepspeed') Model

Out-of-place finetuning. Doesn’t update self.

Parameters:
  • data (Data) – The data to be used

  • stage (Literal["sft", "pretrain", "dpo", "rlhf"]) – The stage of the process. It can be “sft”, “pretrain”, “dpo”, or “rlhf”.

  • algo (Literal["full_param", "lora"]) – The algorithm to use. It can be “full_param” or “lora”.

  • result_model_name (str) – The name of the resulting model

  • epochs (Optional[float] = None) – The number of epochs

  • batch_size_multiplier_log2 (int = 0) – The log base 2 of the batch size multiplier

  • grad_accu_multiplier_log2 (int = -2) – The log base 2 of the gradient accumulation multiplier

  • lr (Optional[float] = None) – The learning rate

  • lr_scheduler_type (Optional[str] = None) – The type of learning rate scheduler

  • lr_scheduler_kwargs (Optional[dict] = None) – Additional arguments for the learning rate scheduler

  • load_best_at_end (bool = True) – Whether to load the best model at the end

  • num_nodes (int = 1) – The number of nodes

  • save_checkpoints (bool = True) – Whether to save checkpoints

  • perform_eval (bool = True) – Whether to perform evaluation

  • ppo_data (Optional[Data] = None) – The data for PPO. ppo_data is only used when stage is “rlhf”, and defaults to data.

  • backend (Literal["deepspeed"] = "deepspeed") – The backend to use. Currently only “deepspeed” is supported.

Returns:

Returns a Model instance with name {result_model_name}, which is the result of the finetuning.

Return type:

Model.
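
The two `*_multiplier_log2` parameters are expressed as base-2 logarithms, so the effective multiplier is 2 raised to the given value. The arithmetic is sketched below. The helper names and the base value of 16 are hypothetical, and how the library applies the multiplier to its internal defaults is not specified here.

```python
def log2_multiplier(value_log2: int) -> float:
    """Convert a log2-style argument into the multiplier it stands for."""
    return 2.0 ** value_log2


def scale(base: float, value_log2: int) -> float:
    """Apply the multiplier to a (hypothetical) base value."""
    return base * log2_multiplier(value_log2)


# batch_size_multiplier_log2 = 0 (the default) leaves the base unchanged.
batch_multiplier = log2_multiplier(0)

# grad_accu_multiplier_log2 = -2 (the default) corresponds to a factor of 1/4.
grad_accu_multiplier = log2_multiplier(-2)

# With an assumed base of 16 accumulation steps, a factor of 1/4 gives 4.
scaled_steps = scale(16, -2)
```
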

inference(data: Data | List[Dict[str, str]], result_data_name: str, backend: Literal['sglang', 'vllm', 'deepspeed', 'serial'] = 'sglang', batch_size_multiplier_log2: int = 0, temperature: float = 0.25, max_tokens: int = 8192, purpose: Literal['responses', 'logprobs'] = 'responses') Data | List[Dict[str, str]]

Performs inference on a dataset (currently only instruction datasets are tested, with the same format as SFT datasets) and returns the resulting dataset.

Parameters:
  • data (Union[Data, List[Dict[str, str]]]) – The data to be used. It can be either a Data object or a List[Dict[str, str]] in which each Dict contains the field “instruction” and, optionally, the field “input”. In the latter case, a List[Dict[str, str]] is returned.

  • result_data_name (str) – The name of the resulting data

  • backend (Literal["sglang", "vllm", "deepspeed", "serial"] = "sglang") – The backend to use. It can be “sglang”, “vllm”, “deepspeed”, or “serial”.

  • batch_size_multiplier_log2 (int = 0) – The log base 2 of the batch size multiplier

  • temperature (float = 0.25) – The temperature parameter.

  • max_tokens (int = 8192) – The maximum number of tokens to generate. Ignored if purpose is “logprobs”.

  • purpose (Literal["responses", "logprobs"] = "responses") – The purpose of the inference. It can be “responses” or “logprobs”. If “logprobs”, the log probability of the prompt itself (and of the assistant response supplied in the predict field, if it exists) is returned in the logprob field of the resulting dataset, without doing any completion. If “responses”, the completion text is saved in the predict field of the resulting dataset.

Returns:

The resulting dataset (completion text is saved in the predict field of each dict; other fields are preserved).

Return type:

Union[Data, List[Dict[str, str]]].

backend: Which backend to use for inference. Options are listed below, in decreasing order of speed.

sglang - Recommended. Parallel inference using self.num_gpus GPUs. Faster than deepspeed and serial by at least an order of magnitude.

vllm - Recommended. Parallel inference using self.num_gpus GPUs. Faster than deepspeed and serial by at least an order of magnitude.

deepspeed - Slower parallel inference using self.num_gpus GPUs. The only backend supporting pretrain-style inference.

serial - Serial inference.
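
The list-of-dicts input and output format can be illustrated with a stand-in that does not load any model. The function `fake_inference` and its `generate` callback are hypothetical, and joining “instruction” and “input” with a newline is an assumption (the real method presumably applies the chat template chosen by template_type). The sketch only shows that each sample needs an “instruction” field, that “input” is optional, and that the result keeps the original fields and adds “predict”.

```python
from typing import Callable, Dict, List


def fake_inference(data: List[Dict[str, str]],
                   generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Stand-in for inference on a list of dicts; no model is involved."""
    results = []
    for sample in data:
        if "instruction" not in sample:
            raise KeyError('each sample needs an "instruction" field')
        prompt = sample["instruction"]
        if sample.get("input"):
            # Assumption: a plain newline join, used here only for illustration.
            prompt += "\n" + sample["input"]
        out = dict(sample)  # copy, so the caller's dict is left untouched
        out["predict"] = generate(prompt)
        results.append(out)
    return results


samples = [
    {"instruction": "Translate to French.", "input": "Good morning"},
    {"instruction": "Name a prime number."},
]
# A trivial "model" that upper-cases the prompt stands in for generation.
results = fake_inference(samples, generate=lambda prompt: prompt.upper())
```
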

save_permanent(saved_name: str | None = None, forced_rewrite: bool = False)

Saves the model to the model_save_path specified in abstractions_config.json. Without save_permanent, the model will still be present in ./output/ and can still be used directly next time without specifying the full path.
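
A rough sketch of how the save location might be read from the configuration file follows. It assumes that abstractions_config.json is a flat JSON object with a top-level "model_save_path" key, and that the saved copy is named after saved_name when given and after the model name otherwise. Both helper functions are hypothetical and are not part of the library.

```python
import json
import os
from typing import Optional


def read_model_save_path(config_file: str = "abstractions_config.json") -> str:
    """Hypothetical helper: read model_save_path from the JSON config."""
    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the key sits at the top level of the JSON object.
    return config["model_save_path"]


def permanent_destination(save_root: str,
                          model_name: str,
                          saved_name: Optional[str] = None) -> str:
    """Hypothetical helper: directory a permanent copy might be written to."""
    # Assumption: fall back to the model's own name when saved_name is omitted.
    return os.path.join(save_root, saved_name or model_name)
```
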