Reference Documentation

Prompt Optimizer

class prompt_optimizer.poptim.AutocorrectOptim(fast: bool = False, verbose: bool = False, metrics: list = [])[source]

AutocorrectOptim is a prompt optimization technique that applies autocorrection to the prompt text. Correctly spelled words typically have a lower token count than misspelled ones, so autocorrection is useful in scenarios where a human client types the text.

It inherits from the PromptOptim base class.
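
The token-count claim is easy to check with a tokenizer. A minimal sketch using tiktoken's cl100k_base encoding (an illustrative choice; AutocorrectOptim does not itself depend on a particular tokenizer):

>>> import tiktoken
>>> enc = tiktoken.get_encoding("cl100k_base")
>>> len(enc.encode("definately"))  # misspelling: typically splits into several tokens
>>> len(enc.encode("definitely"))  # correct spelling: typically fewer tokens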

Example

>>> from prompt_optimizer.poptim import AutocorrectOptim
>>> p_optimizer = AutocorrectOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Applies autocorrection to the prompt text.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text after applying autocorrection.

Return type

str

class prompt_optimizer.poptim.EntropyOptim(model_name: str = 'bert-base-cased', p: float = 0.1, verbose: bool = False, metrics: list = [], **kwargs)[source]

EntropyOptim is a prompt optimization technique based on the entropy values of tokens. A masked language model (bert-base-cased by default) computes the probability of observing each token given its left and right context; these probabilities are then used to compute per-token entropies. The optimizer removes the tokens whose entropies fall in the lowest p percentile.

The intuition behind this method is that a model can infill low-entropy (i.e., low-surprise, highly probable) tokens from the surrounding context, so removing them loses little information.

EntropyOptim inherits from the PromptOptim base class.
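
A minimal sketch of the percentile-cutoff step on hypothetical (token_id, entropy) pairs; the library's actual implementation lives in generate_confidence_values and percentile_cutoff_tokens below and may differ in detail:

>>> import numpy as np
>>> entropy_mapping = [(101, 0.2), (2054, 3.1), (2003, 0.4), (102, 2.7)]  # hypothetical values
>>> cutoff = np.percentile([e for _, e in entropy_mapping], 10)  # p = 0.1
>>> kept = [tok_id for tok_id, e in entropy_mapping if e > cutoff]  # lowest-entropy token dropped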

Example

>>> from prompt_optimizer.poptim import EntropyOptim
>>> p_optimizer = EntropyOptim(p=0.1)
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

generate_confidence_values(sentence: str) → list[source]

Generates entropy values for each token in the sentence.

Parameters

sentence (str) – The input sentence.

Returns

A list of tuples containing token IDs and their corresponding entropy values.

Return type

list

load_mlm_model_tokenizer()[source]

Loads the masked language model and tokenizer.

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

percentile_cutoff_tokens(entropy_mapping: list) → list[source]

Selects tokens with entropy values above a percentile cutoff.

Parameters

entropy_mapping (list) – A list of tuples containing token IDs and their corresponding entropy values.

Returns

A list of selected token IDs.

Return type

list

run_chunk(prompt: str) → str[source]

Runs the prompt optimization technique on a chunk of the prompt.

Parameters

prompt (str) – The chunk of the prompt.

Returns

The optimized chunk of the prompt.

Return type

str

class prompt_optimizer.poptim.LemmatizerOptim(verbose: bool = False, metrics: list = [])[source]

LemmatizerOptim is a prompt optimization technique based on lemmatization. Lemmatization reduces words to their dictionary base form (lemma).

It inherits from the PromptOptim base class.

Example

>>> from prompt_optimizer.poptim import LemmatizerOptim
>>> p_optimizer = LemmatizerOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

get_wordnet_pos(word: str) → str[source]

Maps the POS tag from NLTK to WordNet POS tags.

Parameters

word (str) – The word to determine the POS tag.

Returns

The WordNet POS tag.

Return type

str

optimize(prompt: str) → str[source]

Runs the lemmatizer prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

class prompt_optimizer.poptim.NameReplaceOptim(verbose: bool = False, metrics: list = [])[source]

NameReplaceOptim is a prompt optimization technique based on replacing names in the prompt. Some names tokenize to a single token while others require several; names with higher token counts can be replaced with single-token names to reduce the overall token count. self.opti_names contains a pre-made list of such names for the tiktoken tokenizer; the list will need to be modified for other tokenizers.

It inherits from the PromptOptim base class.
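
Whether a name is a one-token name can be checked with tiktoken. A minimal sketch (the names below are illustrative and not taken from self.opti_names):

>>> import tiktoken
>>> enc = tiktoken.get_encoding("cl100k_base")
>>> len(enc.encode("Bartholomew"))  # rare name: typically several tokens
>>> len(enc.encode("Sam"))  # common name: typically a single token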

Example

>>> from prompt_optimizer.poptim import NameReplaceOptim
>>> p_optimizer = NameReplaceOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

download()[source]

Downloads the required NLTK resources.

gen_name_map(text: str) → dict[source]

Generates a mapping of names in the prompt to optimized names.

Parameters

text (str) – The prompt text.

Returns

The mapping of names to optimized names.

Return type

dict

get_opti_names() → list[source]

Retrieves the list of optimized names.

Returns

The list of optimized names.

Return type

list

opti_name_replace(text: str, mapping: dict) → str[source]

Replaces names in the text with optimized names based on the mapping.

Parameters
  • text (str) – The text to perform name replacement.

  • mapping (dict) – The mapping of names to optimized names.

Returns

The text with replaced names.

Return type

str

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

process(text: str) → nltk.tree.tree.Tree[source]

Processes the text using NLTK to identify named entities.

Parameters

text (str) – The text to process.

Returns

The parsed sentence tree containing named entities.

Return type

nltk.Tree

class prompt_optimizer.poptim.PromptOptim(verbose: bool = False, metrics: list = [], protect_tag: str = None)[source]

PromptOptim is an abstract base class for prompt optimization techniques.

It defines the common structure and interface for prompt optimization.

This class inherits from ABC (Abstract Base Class).
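
A minimal sketch of a custom subclass, assuming only the documented interface (subclasses implement optimize, and instances are called directly on a prompt as in the examples above):

>>> from prompt_optimizer.poptim import PromptOptim
>>> class LowercaseOptim(PromptOptim):
...     def optimize(self, prompt: str) -> str:
...         # toy technique: lowercase the prompt text
...         return prompt.lower()
>>> res = LowercaseOptim()("Example PROMPT...")
>>> optimized_prompt = res.content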

abstract optimize(prompt: str) → str[source]

Abstract method to run the prompt optimization technique on a prompt.

This method must be implemented by subclasses.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

run_json(json_data: list, skip_system: bool = False) → dict[source]

Applies prompt optimization to the JSON request object.

Parameters

json_data (list) – The JSON data object.

Returns

The JSON data object with the content field replaced by the optimized prompt text.

Return type

dict
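
A usage sketch for run_json, assuming OpenAI-style message dicts with "role" and "content" keys (the exact schema is an assumption based on the skip_system behavior documented for Metric.batch_run):

>>> from prompt_optimizer.poptim import StopWordOptim
>>> messages = [
...     {"role": "system", "content": "You are a helpful assistant."},
...     {"role": "user", "content": "What is the capital of France?"},
... ]
>>> optimized = StopWordOptim().run_json(messages, skip_system=True)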

run_langchain(langchain_data: list, skip_system: bool = False)[source]

Runs the prompt optimizer on langchain chat data.

Parameters
  • langchain_data (list) – The langchain data containing ‘type’ and ‘content’ fields.

  • skip_system (bool, optional) – Whether to skip data with type ‘system’. Defaults to False.

Returns

The modified langchain data.

Return type

list
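
A similar sketch for run_langchain, assuming langchain message objects that expose type and content (the import path varies across langchain versions):

>>> from prompt_optimizer.poptim import PunctuationOptim
>>> from langchain.schema import HumanMessage, SystemMessage
>>> chat = [SystemMessage(content="You are terse."), HumanMessage(content="Hello, world!")]
>>> optimized_chat = PunctuationOptim().run_langchain(chat, skip_system=True)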

class prompt_optimizer.poptim.PulpOptim(p: float = 0.1, verbose: bool = False, metrics: list = [])[source]

PulpOptim is a prompt optimization technique based on integer linear programming, using the PuLP library.

It inherits from the PromptOptim base class.

Example

>>> from prompt_optimizer.poptim import PulpOptim
>>> p_optimizer = PulpOptim(p=0.1)
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

class prompt_optimizer.poptim.PunctuationOptim(verbose: bool = False, metrics: list = [], **kwargs)[source]

PunctuationOptim is a prompt optimization technique that removes punctuation marks from the prompt. In most cases, LLMs can infer the punctuation themselves, so it can be removed.

It inherits from the PromptOptim base class.

Example

>>> from prompt_optimizer.poptim import PunctuationOptim
>>> p_optimizer = PunctuationOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text with punctuation marks removed.

Return type

str

class prompt_optimizer.poptim.Sequential(*optims: prompt_optimizer.poptim.base.PromptOptim)[source]

Sequential is a class that represents a sequential composition of prompt optimization techniques.

It applies a series of optimization techniques in sequence to the prompt.

Example

>>> optim1 = SomeOptimizationTechnique()
>>> optim2 = AnotherOptimizationTechnique()
>>> seq = Sequential(optim1, optim2)
>>> optimized_prompt = seq(prompt)

Parameters

*optims – Variable-length argument list of prompt optimization techniques.

optims

A list of prompt optimization techniques.

Type

list
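
A concrete composition using two optimizers from this module, following the calling convention shown in the example above:

>>> from prompt_optimizer.poptim import PunctuationOptim, Sequential, StopWordOptim
>>> seq = Sequential(StopWordOptim(), PunctuationOptim())
>>> optimized_prompt = seq("example prompt...")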

class prompt_optimizer.poptim.StemmerOptim(verbose: bool = False, metrics: list = [])[source]

StemmerOptim is a prompt optimization technique that applies stemming to the prompt.

Stemming reduces words to their base or root form, removing suffixes and prefixes.

Example

>>> from prompt_optimizer.poptim import StemmerOptim
>>> p_optimizer = StemmerOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Applies stemming to the prompt.

Parameters

prompt (str) – The input prompt.

Returns

The optimized prompt after applying stemming.

Return type

str

class prompt_optimizer.poptim.StopWordOptim(verbose: bool = False, metrics: list = [])[source]

StopWordOptim is a prompt optimization technique that removes stop words from the prompt.

Stop words are commonly used words (e.g., “the”, “is”, “in”) that are often considered insignificant in natural language processing tasks.

Example

>>> from prompt_optimizer.poptim import StopWordOptim
>>> p_optimizer = StopWordOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Removes stop words from the prompt.

Parameters

prompt (str) – The input prompt.

Returns

The optimized prompt after removing stop words.

Return type

str

class prompt_optimizer.poptim.SynonymReplaceOptim(verbose: bool = False, metrics: list = [], p: float = 0.5)[source]

SynonymReplaceOptim is a prompt optimization technique that replaces words in the prompt with their synonyms.

Synonyms are words whose meanings are similar to the original word's; a synonym sometimes has a lower token count than the original word.

Example

>>> from prompt_optimizer.poptim import SynonymReplaceOptim
>>> p_optimizer = SynonymReplaceOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content
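
The token-count comparison behind this idea can be sketched with WordNet and tiktoken (an illustration only, not the class's actual selection logic; it assumes the NLTK wordnet corpus has been downloaded):

>>> import tiktoken
>>> from nltk.corpus import wordnet
>>> enc = tiktoken.get_encoding("cl100k_base")
>>> candidates = {lemma.name() for syn in wordnet.synsets("utilize") for lemma in syn.lemmas()}
>>> min(candidates, key=lambda w: len(enc.encode(w)))  # e.g. "use"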

get_word_pos(word: str) → str[source]

Get the part of speech of a word.

Parameters

word (str) – The word.

Returns

The part of speech of the word.

Return type

str

optimize(prompt: str) → str[source]

Replaces words in the prompt with their synonyms.

Parameters

prompt (str) – The input prompt.

Returns

The optimized prompt with replaced synonyms.

Return type

str

syn_replace(word: str) → str[source]

Replace a word with its synonym.

Parameters

word (str) – The word.

Returns

The best replacement synonym for the word.

Return type

str

Metrics

class prompt_optimizer.metric.BERTScoreMetric[source]

BERTScoreMetric is a metric that calculates precision, recall, and F1 score based on BERT embeddings. It inherits from the Metric base class.

Example

>>> from prompt_optimizer.metric import BERTScoreMetric
>>> metric = BERTScoreMetric()
>>> res = metric("default prompt...", "optimized prompt...")

run(prompt_before: str, prompt_after: str) → dict[source]

Calculates precision, recall, and F1 score based on BERT embeddings.

Parameters
  • prompt_before (str) – The prompt text before optimization.

  • prompt_after (str) – The prompt text after optimization.

Returns

A dictionary containing the precision, recall, and F1 score.

Return type

dict

class prompt_optimizer.metric.Metric[source]

Metric is the abstract base class for metrics. It defines the common interface for comparing a prompt before and after optimization.

batch_run(prompts_before: list, prompts_after: list, skip_system: bool = False, json: bool = False, langchain: bool = False) → float[source]

Runs the metric on a batch of prompts.

Parameters
  • prompts_before (list) – List of prompts before the modification.

  • prompts_after (list) – List of prompts after the modification.

  • skip_system (bool, optional) – Whether to skip prompts with “system” role. Defaults to False.

  • json (bool, optional) – Whether the prompts are JSON data. Defaults to False.

  • langchain (bool, optional) – Whether the prompts are langchain chat data. Defaults to False.

Returns

The average metric value across the batch.

Return type

float
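
A usage sketch for batch_run on plain-string prompts, using the TokenMetric subclass documented below:

>>> from prompt_optimizer.metric import TokenMetric
>>> before = ["What is the capital of France?"]
>>> after = ["capital France?"]
>>> avg = TokenMetric().batch_run(before, after)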

abstract run(prompt_before: str, prompt_after: str) → dict[source]

Abstract method to run the metric on the given prompts.

Parameters
  • prompt_before (str) – The prompt before the modification.

  • prompt_after (str) – The prompt after the modification.

Returns

The result of the metric computation.

Return type

dict

run_json(json_data_before: dict, json_data_after: dict) → dict[source]

Runs the metric on the content of JSON data.

Parameters
  • json_data_before (dict) – JSON data before the modification with “content” key.

  • json_data_after (dict) – JSON data after the modification with “content” key.

Returns

The result of the metric computation.

Return type

dict

class prompt_optimizer.metric.TokenMetric(tokenizer: str = 'cl100k_base')[source]

TokenMetric is a metric that calculates the optimization ratio based on the number of tokens reduced. It uses tiktoken to tokenize strings and count the number of tokens.

It inherits from the Metric base class.

Example

>>> from prompt_optimizer.metric import TokenMetric
>>> metric = TokenMetric()
>>> res = metric("default prompt...", "optimized prompt...")

run(prompt_before: str, prompt_after: str) → dict[source]

Calculates the optimization ratio based on the number of tokens.

Parameters
  • prompt_before (str) – The prompt text before optimization.

  • prompt_after (str) – The prompt text after optimization.

Returns

A dictionary containing the optimization ratio.

Return type

dict
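
Metrics can also be attached to an optimizer at construction time through the metrics argument that every optimizer accepts; a sketch (the exact reporting behavior is an assumption):

>>> from prompt_optimizer.metric import TokenMetric
>>> from prompt_optimizer.poptim import StopWordOptim
>>> p_optimizer = StopWordOptim(verbose=True, metrics=[TokenMetric()])
>>> res = p_optimizer("example prompt...")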