# camel.datagen.self\_instruct.self\_instruct

<a id="camel.datagen.self_instruct.self_instruct" />

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline" />

## SelfInstructPipeline

```python theme={"system"}
class SelfInstructPipeline:
```

A pipeline to generate and manage machine-generated instructions for
tasks, combining human and machine task samples.

**Parameters:**

* **agent** (ChatAgent): The agent used to interact and generate instructions.
* **seed** (str): The path to the human-written instructions.
* **num\_machine\_instructions** (int): Number of machine-generated instructions to generate. (default: :obj:`5`)
* **data\_output\_path** (Optional\[str]): Path to save the generated data. (default: :obj:`./data_output.json`)
* **human\_to\_machine\_ratio** (tuple): Ratio of human to machine tasks used for instruction generation. (default: :obj:`(6, 2)`)
* **instruction\_filter** (Optional\[InstructionFilter]): A filter to validate generated instructions. (default: :obj:`None`)
* **filter\_config** (Optional\[Dict\[str, Dict\[str, Any]]]): Configuration for the filter functions registered in FILE\_REGISTRY. (default: :obj:`None`)
* **stop\_on\_first\_failure** (bool): If True, stops checking filters after the first failure. (default: :obj:`False`)
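
The effect of `stop_on_first_failure` can be pictured as short-circuiting a chain of checks. The sketch below is an illustration only, not CAMEL's `InstructionFilter`: `run_filters` and the two example checks are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List


def run_filters(
    instruction: str,
    filters: Dict[str, Callable[[str], bool]],
    stop_on_first_failure: bool = False,
) -> List[str]:
    """Return the names of the checks that rejected the instruction."""
    failed: List[str] = []
    for name, check in filters.items():
        if not check(instruction):
            failed.append(name)
            if stop_on_first_failure:
                # Skip the remaining checks once one has failed.
                break
    return failed


# Hypothetical checks, used only for this illustration.
filters = {
    "min_length": lambda text: len(text.split()) >= 3,
    "no_image_keyword": lambda text: "image" not in text.lower(),
}

all_failures = run_filters("Draw image", filters)
first_failure = run_filters("Draw image", filters, stop_on_first_failure=True)
```

With the flag off, every failing check is reported; with it on, only the first one is, which saves work when checks are expensive.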

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.__init__" />

### \_\_init\_\_

```python theme={"system"}
def __init__(
    self,
    agent: ChatAgent,
    seed: str,
    num_machine_instructions: int = 5,
    data_output_path: Optional[str] = './data_output.json',
    human_to_machine_ratio: tuple = (6, 2),
    instruction_filter: Optional[InstructionFilter] = None,
    filter_config: Optional[Dict[str, Dict[str, Any]]] = None,
    stop_on_first_failure: bool = False
):
```

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.load_seed" />

### load\_seed

```python theme={"system"}
def load_seed(self, path: str):
```

Load seed tasks from a file. Defaults to a predefined seed file if
no path is provided.

**Parameters:**

* **path** (str): Path to the seed file.
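
The seed file format is not specified on this page. As a rough sketch, assuming the seed is a JSON Lines file with one task object per line (a common layout for Self-Instruct seed tasks), loading could look like the following. `read_seed_tasks` and the `instruction` key are assumptions made for this example, not part of the documented API.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def read_seed_tasks(path: str) -> List[Dict[str, Any]]:
    """Read one JSON object per non-empty line (assumed JSONL layout)."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Seed file not found: {path}")
    tasks: List[Dict[str, Any]] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                tasks.append(json.loads(line))
    return tasks


# Write a tiny throwaway seed file to demonstrate the reader.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(json.dumps({"instruction": "Translate the sentence into French."}) + "\n")
    tmp.write("\n")
    tmp.write(json.dumps({"instruction": "List three prime numbers."}) + "\n")
    seed_path = tmp.name

seed_tasks = read_seed_tasks(seed_path)
os.remove(seed_path)
```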

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.sample_human_tasks" />

### sample\_human\_tasks

```python theme={"system"}
def sample_human_tasks(self, count: int):
```

Sample a specified number of human tasks from the loaded seed.

**Parameters:**

* **count** (int): Number of human tasks to sample.

**Returns:**

List\[dict]: A list of sampled human tasks.
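
A minimal sketch of this kind of sampling, assuming tasks are drawn without replacement and the request is clamped to the number of tasks available. Whether the real method clamps is not stated here, and `sample_tasks` is a hypothetical helper.

```python
import random
from typing import Any, Dict, List, Optional


def sample_tasks(
    tasks: List[Dict[str, Any]],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, Any]]:
    """Draw up to `count` distinct tasks at random."""
    rng = rng or random.Random()
    # random.sample raises ValueError if count exceeds the population,
    # so clamp the request to what is available.
    return rng.sample(tasks, min(count, len(tasks)))


human_tasks = [
    {"instruction": "Summarize the paragraph."},
    {"instruction": "Translate the sentence into French."},
    {"instruction": "List three prime numbers."},
]
sampled = sample_tasks(human_tasks, 2, random.Random(0))
```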

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.sample_machine_tasks" />

### sample\_machine\_tasks

```python theme={"system"}
def sample_machine_tasks(self, count: int):
```

Sample a specified number of machine tasks.

**Parameters:**

* **count** (int): Number of machine tasks to sample.

**Returns:**

List\[dict]: A list of sampled machine tasks, with placeholders if
insufficient tasks are available.
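
The placeholder behaviour described above can be sketched as follows. The shape of a placeholder (an empty `instruction` here) is an assumption made for illustration, and `sample_with_placeholders` is a hypothetical helper rather than the library's code.

```python
import random
from typing import Any, Dict, List, Optional


def sample_with_placeholders(
    tasks: List[Dict[str, Any]],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, Any]]:
    """Return exactly `count` entries, padding with placeholders if needed."""
    rng = rng or random.Random()
    sampled = rng.sample(tasks, min(count, len(tasks)))
    missing = count - len(sampled)
    # Assumed placeholder shape: a task with an empty instruction.
    sampled.extend({"instruction": ""} for _ in range(missing))
    return sampled


machine_tasks = [{"instruction": "Classify the sentiment of the review."}]
padded = sample_with_placeholders(machine_tasks, 3, random.Random(0))
```

Padding keeps the number of machine examples fixed early in a run, when few machine tasks have been generated yet.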

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.generate_machine_instruction" />

### generate\_machine\_instruction

```python theme={"system"}
def generate_machine_instruction(self):
```

**Returns:**

List: The prompt and a machine-generated instruction.

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.identify_instruction" />

### identify\_instruction

```python theme={"system"}
def identify_instruction(self, instruction: str):
```

Determine if the given instruction is a classification task.

**Parameters:**

* **instruction** (str): The instruction to classify.

**Returns:**

bool: True if the instruction is a classification task,
otherwise False.

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.generate_machine_instances" />

### generate\_machine\_instances

```python theme={"system"}
def generate_machine_instances(self):
```

Generate instances for each machine task based on its
classification status.

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.generate_machine_instance" />

### generate\_machine\_instance

```python theme={"system"}
def generate_machine_instance(self, instruction: str, classification: bool):
```

Generate instances for a given instruction.

**Parameters:**

* **instruction** (str): The instruction to create instances for.
* **classification** (bool): Whether the instruction is a classification task.

**Returns:**

List\[dict]: A list of generated instances in input-output format.

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.parse_classification_output" />

### parse\_classification\_output

```python theme={"system"}
def parse_classification_output(self, generated_text: str):
```

Parse the generated text for classification tasks into input-output
pairs.

**Parameters:**

* **generated\_text** (str): The raw text generated by the agent for classification tasks.

**Returns:**

List\[Dict\[str, str]]: A list of dictionaries with 'input' and
'output' keys.

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.parse_non_classification_output" />

### parse\_non\_classification\_output

```python theme={"system"}
def parse_non_classification_output(self, generated_text: str):
```

Parse the generated text for non-classification tasks into
input-output pairs.

**Parameters:**

* **generated\_text** (str): The raw text generated by the agent for non-classification tasks.

**Returns:**

List\[Dict\[str, str]]: A list of dictionaries with 'input' and
'output' keys.
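
The exact text layout the agent is expected to produce is not documented on this page. As a sketch only, assuming the generated text marks each pair with `Input:` and `Output:` labels, the parsing step could be approximated with a regular expression. `PAIR_PATTERN` and `parse_pairs` are hypothetical names.

```python
import re
from typing import Dict, List

# Assumed layout: "Input: ... Output: ..." repeated for each instance.
PAIR_PATTERN = re.compile(
    r"Input:\s*(?P<input>.*?)\s*Output:\s*(?P<output>.*?)(?=\s*Input:|\Z)",
    re.DOTALL,
)


def parse_pairs(generated_text: str) -> List[Dict[str, str]]:
    """Extract input-output pairs from text in the assumed layout."""
    return [
        {"input": m.group("input").strip(), "output": m.group("output").strip()}
        for m in PAIR_PATTERN.finditer(generated_text)
    ]


text = "Input: 2 + 2\nOutput: 4\nInput: 3 * 3\nOutput: 9"
pairs = parse_pairs(text)
```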

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.construct_data" />

### construct\_data

```python theme={"system"}
def construct_data(self):
```

Save the machine-generated tasks to the specified output path
in JSON format.
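
Writing tasks to a JSON file can be sketched with the standard library alone. The record fields below (`instruction`, `is_classification`, `instances`) are assumed for illustration and may not match the schema the pipeline writes. `save_tasks` is a hypothetical helper.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def save_tasks(tasks: List[Dict[str, Any]], path: str) -> None:
    """Serialize the task list to `path` as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tasks, f, indent=4, ensure_ascii=False)


# Assumed record shape, for illustration only.
machine_tasks = [
    {
        "instruction": "Classify the sentiment of the review.",
        "is_classification": True,
        "instances": [{"input": "Great product!", "output": "positive"}],
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "data_output.json")
    save_tasks(machine_tasks, output_path)
    # Read the file back to confirm the round trip.
    with open(output_path, "r", encoding="utf-8") as f:
        loaded = json.load(f)
```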

<a id="camel.datagen.self_instruct.self_instruct.SelfInstructPipeline.generate" />

### generate

```python theme={"system"}
def generate(self, timeout_minutes = 600):
```

Execute the entire pipeline to generate machine instructions
and instances.

**Parameters:**

* **timeout\_minutes** (int): Maximum time in minutes to run the generation process before timing out. (default: :obj:`600`)
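
A time-bounded generation loop of this kind can be sketched as below. This is an assumption about the general control flow, not the library's implementation: the page does not say whether `generate` raises or simply stops when the timeout is reached, and `generate_until` and `produce` are hypothetical names.

```python
import time
from typing import Callable, List, Optional


def generate_until(
    produce: Callable[[], Optional[str]],
    target: int,
    timeout_minutes: float = 600,
) -> List[str]:
    """Collect accepted candidates until `target` is met or time runs out."""
    deadline = time.monotonic() + timeout_minutes * 60
    accepted: List[str] = []
    while len(accepted) < target:
        if time.monotonic() >= deadline:
            break  # this sketch stops quietly and returns what it has
        candidate = produce()
        if candidate is not None:  # None stands for a rejected candidate
            accepted.append(candidate)
    return accepted


# A stand-in producer: one candidate is rejected (None) along the way.
candidates = iter(
    ["Summarize the text.", None, "Translate to German.", "List two colors."]
)
result = generate_until(lambda: next(candidates), target=3)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted during a long run.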
