Reference Documentation

Prompt Optimizer

class prompt_optimizer.poptim.AutocorrectOptim(fast: bool = False, verbose: bool = False, metrics: list = [])[source]

AutocorrectOptim is a prompt optimization technique that applies autocorrection to the prompt text. Correctly spelled words typically have a lower token count than misspelled ones, so autocorrection is useful in scenarios where a human client types the text.

It inherits from the PromptOptim base class.
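
The token-count claim is easy to check with a tokenizer. A minimal sketch using tiktoken's cl100k_base encoding (an illustrative choice; AutocorrectOptim does not itself depend on a particular tokenizer):

>>> import tiktoken
>>> enc = tiktoken.get_encoding("cl100k_base")
>>> len(enc.encode("definately"))  # misspelling: typically splits into several tokens
>>> len(enc.encode("definitely"))  # correct spelling: typically fewer tokens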

Example

>>> from prompt_optimizer.poptim import AutocorrectOptim
>>> p_optimizer = AutocorrectOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Applies autocorrection to the prompt text.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text after applying autocorrection.

Return type

str

class prompt_optimizer.poptim.EntropyOptim(model_name: str = 'bert-base-cased', p: float = 0.1, verbose: bool = False, metrics: list = [], **kwargs)[source]

EntropyOptim is a prompt optimization technique based on the entropy values of tokens. A masked language model (bert-base-cased by default) computes the probability of observing each token given its left and right context; these probabilities are then used to compute per-token entropies. The optimizer removes the tokens whose entropies fall in the lowest p percentile.

The intuition behind this method is that a model can infill low-entropy (i.e., low-surprise, highly probable) tokens from the surrounding context, so removing them loses little information.

EntropyOptim inherits from the PromptOptim base class.
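
A minimal sketch of the percentile-cutoff step on hypothetical (token_id, entropy) pairs; the library's actual implementation lives in generate_confidence_values and percentile_cutoff_tokens below and may differ in detail:

>>> import numpy as np
>>> entropy_mapping = [(101, 0.2), (2054, 3.1), (2003, 0.4), (102, 2.7)]  # hypothetical values
>>> cutoff = np.percentile([e for _, e in entropy_mapping], 10)  # p = 0.1
>>> kept = [tok_id for tok_id, e in entropy_mapping if e > cutoff]  # lowest-entropy token dropped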

Example

>>> from prompt_optimizer.poptim import EntropyOptim
>>> p_optimizer = EntropyOptim(p=0.1)
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

generate_confidence_values(sentence: str) → list[source]

Generates entropy values for each token in the sentence.

Parameters

sentence (str) – The input sentence.

Returns

A list of tuples containing token IDs and their corresponding entropy values.

Return type

list

load_mlm_model_tokenizer()[source]

Loads the masked language model and tokenizer.

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

percentile_cutoff_tokens(entropy_mapping: list) → list[source]

Selects tokens with entropy values above a percentile cutoff.

Parameters

entropy_mapping (list) – A list of tuples containing token IDs and their corresponding entropy values.

Returns

A list of selected token IDs.

Return type

list

run_chunk(prompt: str) → str[source]

Runs the prompt optimization technique on a chunk of the prompt.

Parameters

prompt (str) – The chunk of the prompt.

Returns

The optimized chunk of the prompt.

Return type

str

class prompt_optimizer.poptim.LemmatizerOptim(verbose: bool = False, metrics: list = [])[source]

LemmatizerOptim is a prompt optimization technique based on lemmatization. Lemmatization reduces words to their dictionary base form (lemma).

It inherits from the PromptOptim base class.

Example

>>> from prompt_optimizer.poptim import LemmatizerOptim
>>> p_optimizer = LemmatizerOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

get_wordnet_pos(word: str) → str[source]

Maps the POS tag from NLTK to WordNet POS tags.

Parameters

word (str) – The word to determine the POS tag.

Returns

The WordNet POS tag.

Return type

str

optimize(prompt: str) → str[source]

Runs the lemmatizer prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

class prompt_optimizer.poptim.NameReplaceOptim(verbose: bool = False, metrics: list = [])[source]

NameReplaceOptim is a prompt optimization technique based on replacing names in the prompt. Some names tokenize to a single token while others require several; names with higher token counts can be replaced with single-token names to reduce the overall token count. self.opti_names contains a pre-made list of such names for the tiktoken tokenizer; the list will need to be modified for other tokenizers.

It inherits from the PromptOptim base class.
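
Whether a name is a one-token name can be checked with tiktoken. A minimal sketch (the names below are illustrative and not taken from self.opti_names):

>>> import tiktoken
>>> enc = tiktoken.get_encoding("cl100k_base")
>>> len(enc.encode("Bartholomew"))  # rare name: typically several tokens
>>> len(enc.encode("Sam"))  # common name: typically a single token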

Example

>>> from prompt_optimizer.poptim import NameReplaceOptim
>>> p_optimizer = NameReplaceOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

download()[source]

Downloads the required NLTK resources.

gen_name_map(text: str) → dict[source]

Generates a mapping of names in the prompt to optimized names.

Parameters

text (str) – The prompt text.

Returns

The mapping of names to optimized names.

Return type

dict

get_opti_names() → list[source]

Retrieves the list of optimized names.

Returns

The list of optimized names.

Return type

list

opti_name_replace(text: str, mapping: dict) → str[source]

Replaces names in the text with optimized names based on the mapping.

Parameters
  • text (str) – The text to perform name replacement.

  • mapping (dict) – The mapping of names to optimized names.

Returns

The text with replaced names.

Return type

str

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

process(text: str) → nltk.tree.tree.Tree[source]

Processes the text using NLTK to identify named entities.

Parameters

text (str) – The text to process.

Returns

The parsed sentence tree containing named entities.

Return type

nltk.Tree

class prompt_optimizer.poptim.PromptOptim(verbose: bool = False, metrics: list = [], protect_tag: str = None)[source]

PromptOptim is an abstract base class for prompt optimization techniques.

It defines the common structure and interface for prompt optimization.

This class inherits from ABC (Abstract Base Class).
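
A minimal sketch of a custom subclass, assuming only the documented interface (subclasses implement optimize, and instances are called directly on a prompt as in the examples above):

>>> from prompt_optimizer.poptim import PromptOptim
>>> class LowercaseOptim(PromptOptim):
...     def optimize(self, prompt: str) -> str:
...         # toy technique: lowercase the prompt text
...         return prompt.lower()
>>> res = LowercaseOptim()("Example PROMPT...")
>>> optimized_prompt = res.content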

abstract optimize(prompt: str) → str[source]

Abstract method to run the prompt optimization technique on a prompt.

This method must be implemented by subclasses.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

run_json(json_data: list, skip_system: bool = False) → dict[source]

Applies prompt optimization to the JSON request object.

Parameters

json_data (list) – The JSON data object.

Returns

The JSON data object with the content field replaced by the optimized prompt text.

Return type

dict
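
A usage sketch for run_json, assuming OpenAI-style message dicts with "role" and "content" keys (the exact schema is an assumption based on the skip_system behavior documented for Metric.batch_run):

>>> from prompt_optimizer.poptim import StopWordOptim
>>> messages = [
...     {"role": "system", "content": "You are a helpful assistant."},
...     {"role": "user", "content": "What is the capital of France?"},
... ]
>>> optimized = StopWordOptim().run_json(messages, skip_system=True)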

run_langchain(langchain_data: list, skip_system: bool = False)[source]

Runs the prompt optimizer on langchain chat data.

Parameters
  • langchain_data (list) – The langchain data containing ‘type’ and ‘content’ fields.

  • skip_system (bool, optional) – Whether to skip data with type ‘system’. Defaults to False.

Returns

The modified langchain data.

Return type

list
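
A similar sketch for run_langchain, assuming langchain message objects that expose type and content (the import path varies across langchain versions):

>>> from prompt_optimizer.poptim import PunctuationOptim
>>> from langchain.schema import HumanMessage, SystemMessage
>>> chat = [SystemMessage(content="You are terse."), HumanMessage(content="Hello, world!")]
>>> optimized_chat = PunctuationOptim().run_langchain(chat, skip_system=True)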

class prompt_optimizer.poptim.PulpOptim(p: float = 0.1, verbose: bool = False, metrics: list = [])[source]

PulpOptim is a prompt optimization technique based on integer linear programming, using the PuLP library.

It inherits from the PromptOptim base class.

Example

>>> from prompt_optimizer.poptim import PulpOptim
>>> p_optimizer = PulpOptim(p=0.1)
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text.

Return type

str

class prompt_optimizer.poptim.PunctuationOptim(verbose: bool = False, metrics: list = [], **kwargs)[source]

PunctuationOptim is a prompt optimization technique that removes punctuation marks from the prompt. In most cases, LLMs can infer the punctuation themselves, so it can be removed.

It inherits from the PromptOptim base class.

Example

>>> from prompt_optimizer.poptim import PunctuationOptim
>>> p_optimizer = PunctuationOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Runs the prompt optimization technique on the prompt.

Parameters

prompt (str) – The prompt text.

Returns

The optimized prompt text with punctuation marks removed.

Return type

str

class prompt_optimizer.poptim.Sequential(*optims: prompt_optimizer.poptim.base.PromptOptim)[source]

Sequential is a class that represents a sequential composition of prompt optimization techniques.

It applies a series of optimization techniques in sequence to the prompt.

Example

>>> optim1 = SomeOptimizationTechnique()
>>> optim2 = AnotherOptimizationTechnique()
>>> seq = Sequential(optim1, optim2)
>>> optimized_prompt = seq(prompt)

Parameters

*optims – Variable-length argument list of prompt optimization techniques.

optims

A list of prompt optimization techniques.

Type

list
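
A concrete composition using two optimizers from this module, following the calling convention shown in the example above:

>>> from prompt_optimizer.poptim import PunctuationOptim, Sequential, StopWordOptim
>>> seq = Sequential(StopWordOptim(), PunctuationOptim())
>>> optimized_prompt = seq("example prompt...")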

class prompt_optimizer.poptim.StemmerOptim(verbose: bool = False, metrics: list = [])[source]

StemmerOptim is a prompt optimization technique that applies stemming to the prompt.

Stemming reduces words to their base or root form, removing suffixes and prefixes.

Example

>>> from prompt_optimizer.poptim import StemmerOptim
>>> p_optimizer = StemmerOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Applies stemming to the prompt.

Parameters

prompt (str) – The input prompt.

Returns

The optimized prompt after applying stemming.

Return type

str

class prompt_optimizer.poptim.StopWordOptim(verbose: bool = False, metrics: list = [])[source]

StopWordOptim is a prompt optimization technique that removes stop words from the prompt.

Stop words are commonly used words (e.g., “the”, “is”, “in”) that are often considered insignificant in natural language processing tasks.

Example

>>> from prompt_optimizer.poptim import StopWordOptim
>>> p_optimizer = StopWordOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content

optimize(prompt: str) → str[source]

Removes stop words from the prompt.

Parameters

prompt (str) – The input prompt.

Returns

The optimized prompt after removing stop words.

Return type

str

class prompt_optimizer.poptim.SynonymReplaceOptim(verbose: bool = False, metrics: list = [], p: float = 0.5)[source]

SynonymReplaceOptim is a prompt optimization technique that replaces words in the prompt with their synonyms.

Synonyms are words whose meanings are similar to the original word's; a synonym sometimes has a lower token count than the original word.

Example

>>> from prompt_optimizer.poptim import SynonymReplaceOptim
>>> p_optimizer = SynonymReplaceOptim()
>>> res = p_optimizer("example prompt...")
>>> optimized_prompt = res.content
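
The token-count comparison behind this idea can be sketched with WordNet and tiktoken (an illustration only, not the class's actual selection logic; it assumes the NLTK wordnet corpus has been downloaded):

>>> import tiktoken
>>> from nltk.corpus import wordnet
>>> enc = tiktoken.get_encoding("cl100k_base")
>>> candidates = {lemma.name() for syn in wordnet.synsets("utilize") for lemma in syn.lemmas()}
>>> min(candidates, key=lambda w: len(enc.encode(w)))  # e.g. "use"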

get_word_pos(word: str) → str[source]

Get the part of speech of a word.

Parameters

word (str) – The word.

Returns

The part of speech of the word.

Return type

str

optimize(prompt: str) → str[source]

Replaces words in the prompt with their synonyms.

Parameters

prompt (str) – The input prompt.

Returns

The optimized prompt with replaced synonyms.

Return type

str

syn_replace(word: str) → str[source]

Replace a word with its synonym.

Parameters

word (str) – The word.

Returns

The best replacement synonym for the word.

Return type

str

Metrics

class prompt_optimizer.metric.BERTScoreMetric[source]

BERTScoreMetric is a metric that calculates precision, recall, and F1 score based on BERT embeddings. It inherits from the Metric base class.

Example

>>> from prompt_optimizer.metric import BERTScoreMetric
>>> metric = BERTScoreMetric()
>>> res = metric("default prompt...", "optimized prompt...")

run(prompt_before: str, prompt_after: str) → dict[source]

Calculates precision, recall, and F1 score based on BERT embeddings.

Parameters
  • prompt_before (str) – The prompt text before optimization.

  • prompt_after (str) – The prompt text after optimization.

Returns

A dictionary containing the precision, recall, and F1 score.

Return type

dict

class prompt_optimizer.metric.Metric[source]

Metric is the abstract base class for metrics. It defines the common interface for comparing a prompt before and after optimization.

batch_run(prompts_before: list, prompts_after: list, skip_system: bool = False, json: bool = False, langchain: bool = False) → float[source]

Runs the metric on a batch of prompts.

Parameters
  • prompts_before (list) – List of prompts before the modification.

  • prompts_after (list) – List of prompts after the modification.

  • skip_system (bool, optional) – Whether to skip prompts with “system” role. Defaults to False.

  • json (bool, optional) – Whether the prompts are JSON data. Defaults to False.

  • langchain (bool, optional) – Whether the prompts are langchain chat data. Defaults to False.

Returns

The average metric value across the batch.

Return type

float
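
A usage sketch for batch_run on plain-string prompts, using the TokenMetric subclass documented below:

>>> from prompt_optimizer.metric import TokenMetric
>>> before = ["What is the capital of France?"]
>>> after = ["capital France?"]
>>> avg = TokenMetric().batch_run(before, after)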

abstract run(prompt_before: str, prompt_after: str) → dict[source]

Abstract method to run the metric on the given prompts.

Parameters
  • prompt_before (str) – The prompt before the modification.

  • prompt_after (str) – The prompt after the modification.

Returns

The result of the metric computation.

Return type

dict

run_json(json_data_before: dict, json_data_after: dict) → dict[source]

Runs the metric on the content of JSON data.

Parameters
  • json_data_before (dict) – JSON data before the modification with “content” key.

  • json_data_after (dict) – JSON data after the modification with “content” key.

Returns

The result of the metric computation.

Return type

dict

class prompt_optimizer.metric.TokenMetric(tokenizer: str = 'cl100k_base')[source]

TokenMetric is a metric that calculates the optimization ratio based on the number of tokens reduced. It uses tiktoken to tokenize strings and count the number of tokens.

It inherits from the Metric base class.

Example

>>> from prompt_optimizer.metric import TokenMetric
>>> metric = TokenMetric()
>>> res = metric("default prompt...", "optimized prompt...")

run(prompt_before: str, prompt_after: str) → dict[source]

Calculates the optimization ratio based on the number of tokens.

Parameters
  • prompt_before (str) – The prompt text before optimization.

  • prompt_after (str) – The prompt text after optimization.

Returns

A dictionary containing the optimization ratio.

Return type

dict
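
Metrics can also be attached to an optimizer at construction time through the metrics argument that every optimizer accepts; a sketch (the exact reporting behavior is an assumption):

>>> from prompt_optimizer.metric import TokenMetric
>>> from prompt_optimizer.poptim import StopWordOptim
>>> p_optimizer = StopWordOptim(verbose=True, metrics=[TokenMetric()])
>>> res = p_optimizer("example prompt...")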