PersonaHub

class PersonaHub:
PersonaHub is adapted from “Scaling Synthetic Data Creation with 1,000,000,000 Personas”. It proposes a persona-driven data synthesis methodology that leverages the various perspectives within a large language model (LLM) to create diverse synthetic data. By showcasing PersonaHub’s use cases in synthesizing high-quality mathematical and logical reasoning problems, instructions (i.e., user prompts), knowledge-rich texts, game NPCs, and tools (functions) at scale, the authors demonstrate that persona-driven data synthesis is versatile, scalable, flexible, and easy to use. They suggest it could drive a paradigm shift in synthetic data creation and its practical applications, with a potentially profound impact on LLM research and development. Please refer to the paper for more details: https://arxiv.org/pdf/2406.20094. Parameters:
  • model (BaseModelBackend, optional): The model to use for persona generation and manipulation. (default: None)

__init__

def __init__(self, model: Optional[BaseModelBackend] = None):

__setitem__

def __setitem__(self, persona: Persona):
Add a persona to the group. Parameters:
  • persona (Persona): The persona to add.

__delitem__

def __delitem__(self, persona_id: uuid.UUID):
Remove a persona from the group by ID. Parameters:
  • persona_id (uuid.UUID): The ID of the persona to remove.

__getitem__

def __getitem__(self, persona_id: uuid.UUID):
Get a persona by ID. Parameters:
  • persona_id (uuid.UUID): The ID of the persona to retrieve.
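The add, remove, and lookup methods above describe a collection keyed by persona ID. The following is a minimal sketch of that pattern, not the library's implementation: `ToyPersona`, `ToyHub`, the `add` method, and the `id` attribute are hypothetical stand-ins introduced only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class ToyPersona:
    """Hypothetical stand-in for Persona: a description plus a unique ID."""
    description: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


class ToyHub:
    """Hypothetical stand-in showing a UUID-keyed persona collection."""

    def __init__(self) -> None:
        self._personas: Dict[uuid.UUID, ToyPersona] = {}

    def add(self, persona: ToyPersona) -> None:
        # Store the persona under its own ID (assumed attribute name).
        self._personas[persona.id] = persona

    def __getitem__(self, persona_id: uuid.UUID) -> ToyPersona:
        # Raises KeyError if the ID is unknown (an assumption of this sketch).
        return self._personas[persona_id]

    def __delitem__(self, persona_id: uuid.UUID) -> None:
        del self._personas[persona_id]

    def __len__(self) -> int:
        return len(self._personas)

    def __iter__(self) -> Iterator[ToyPersona]:
        return iter(self._personas.values())


hub = ToyHub()
nurse = ToyPersona("a pediatric nurse")
hub.add(nurse)
found = hub[nurse.id]      # lookup by ID
size_before = len(hub)
del hub[nurse.id]          # removal by ID
size_after = len(hub)
```

Keying by ID rather than by description keeps lookups and removals unambiguous even when two personas share similar text.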

text_to_persona

def text_to_persona(
    self,
    text: str,
    action: Literal['read', 'write', 'like', 'dislike'] = 'read'
):
Infers a specific persona who is likely to [read|write|like|dislike|…] the given text. Parameters:
  • text (str): The input text for which to infer a persona.
  • action (str): The action associated with the persona (default is “read”).
Returns: Persona: The inferred persona.

persona_to_persona

def persona_to_persona(self, persona: Persona):
Derives additional personas based on interpersonal relationships from this persona. Parameters:
  • persona (Persona): The persona from which to derive related personas.
Returns: Dict[uuid.UUID, Persona]: A dictionary of related personas.

deduplicate

def deduplicate(
    self,
    embedding_model: Optional[BaseEmbedding] = None,
    similarity_threshold: float = 0.85
):
Remove similar personas from the group. Parameters:
  • embedding_model (BaseEmbedding): The embedding model used for similarity comparison (default is None).
  • similarity_threshold (float): The similarity threshold for deduplication (default is 0.85).
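Threshold-based deduplication can be sketched as a greedy filter: keep a persona only if it is not too similar to any persona already kept. This is a minimal illustration under stated assumptions, not the library's code: the toy two-dimensional vectors stand in for real embeddings, the function names are hypothetical, and treating a similarity equal to the threshold as a duplicate is a choice made here.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def deduplicate(
    embeddings: Dict[str, Sequence[float]],
    similarity_threshold: float = 0.85,
) -> List[str]:
    """Return the names kept after dropping near-duplicates (greedy, in order)."""
    kept: List[str] = []
    for name, vec in embeddings.items():
        # Keep this entry only if it is below the threshold against every kept one.
        if all(
            cosine_similarity(vec, embeddings[other]) < similarity_threshold
            for other in kept
        ):
            kept.append(name)
    return kept


# Toy embeddings (made up for illustration, not produced by a real model).
toy_embeddings = {
    "nurse": [1.0, 0.0],
    "registered nurse": [0.98, 0.2],  # nearly parallel to "nurse"
    "chef": [0.0, 1.0],               # orthogonal to "nurse"
}
kept = deduplicate(toy_embeddings)
```

With these vectors, "registered nurse" is dropped because its similarity to "nurse" is about 0.98, while "chef" is kept. A greedy pass like this depends on iteration order: the first persona in each cluster of near-duplicates is the one that survives.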

_get_embedding

def _get_embedding(embedding_model: BaseEmbedding, description: Optional[str]):
Cache embeddings to reduce recomputation.
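One common way to avoid recomputing embeddings is to memoize on the description string. The sketch below uses `functools.lru_cache` for that purpose. The source does not say which caching mechanism is used, so this is an assumption, and `fake_embed` is a hypothetical placeholder for a real embedding call.

```python
from functools import lru_cache
from typing import Tuple

call_count = 0  # counts how often the (fake) model is actually invoked


def fake_embed(text: str) -> Tuple[float, ...]:
    """Hypothetical placeholder for an expensive embedding model call."""
    global call_count
    call_count += 1
    return (float(len(text)), float(text.count(" ")))


@lru_cache(maxsize=128)
def get_embedding(description: str) -> Tuple[float, ...]:
    # lru_cache requires hashable arguments; a str description qualifies.
    return fake_embed(description)


first = get_embedding("a pediatric nurse")
second = get_embedding("a pediatric nurse")  # served from the cache
```

After the two calls, `fake_embed` has run only once. Returning a tuple rather than a mutable list keeps cached values from being modified by callers.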

_cosine_similarity

def _cosine_similarity(vec1: np.ndarray, vec2: np.ndarray):
Compute the cosine similarity of two vectors. Parameters:
  • vec1 (np.ndarray): Vector 1
  • vec2 (np.ndarray): Vector 2
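Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The documented method takes `np.ndarray` inputs; the sketch below uses plain Python sequences to stay dependency-free, and returning 0.0 for a zero vector is an assumption made here, not documented behaviour.

```python
import math
from typing import Sequence


def cosine_similarity(vec1: Sequence[float], vec2: Sequence[float]) -> float:
    """Return dot(vec1, vec2) / (||vec1|| * ||vec2||)."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(vec1, vec2))
    norm1 = math.sqrt(sum(x * x for x in vec1))
    norm2 = math.sqrt(sum(y * y for y in vec2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption: treat a zero vector as dissimilar instead of dividing by zero.
        return 0.0
    return dot / (norm1 * norm2)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

The result ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction), and it ignores vector magnitude, which is why it suits comparing embeddings of descriptions of different lengths.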

_is_similar

def _is_similar(
    self,
    persona1: Persona,
    persona2: Persona,
    similarity_threshold: float,
    embedding_model: BaseEmbedding
):
Check whether two personas are similar, based on the cosine similarity of the embeddings of their descriptions. Parameters:
  • persona1 (Persona): A persona.
  • persona2 (Persona): The other persona.
  • similarity_threshold (float): The cosine similarity threshold used to decide whether the two personas are similar.
  • embedding_model (BaseEmbedding): The embedding model used for similarity comparison.

__len__

def __len__(self):

__iter__

def __iter__(self):

get_all_personas

def get_all_personas(self):
Return a list of all personas.