File size: 4,145 Bytes
f1e4f50
 
 
 
 
 
 
 
 
 
 
a3640b9
f1e4f50
 
2c9e240
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f1e4f50
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
---
base_model: unsloth/Llama-3.2-3B
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
---

# Model Information

The `cmd2cwl` model is an instruction fine-tuned version of the `unsloth/Llama-3.2-3B`. This model has been trained on a custom dataset consisting of help documentation from various command-line tools and corresponding CWL (Common Workflow Language) scripts. Its purpose is to assist users in converting command-line tool documentation into clean and well-structured CWL scripts, enhancing automation and workflow reproducibility.

# Example
## Task
``` python
question = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Write a cwl script for md5sum with docker image alpine.

### Input:

    With no FILE, or when FILE is -, read standard input.

    -b, --binary         read in binary mode
    -c, --check          read MD5 sums from the FILEs and check them
    --tag            create a BSD-style checksum
    -t, --text           read in text mode (default)
    -z, --zero           end each output line with NUL, not newline,
    and disable file name escaping

    The following five options are useful only when verifying checksums:
    --ignore-missing  don't fail or report status for missing files
    --quiet          don't print OK for each successfully verified file
    --status         don't output anything, status code shows success
    --strict         exit non-zero for improperly formatted checksum lines
    -w, --warn           warn about improperly formatted checksum lines

    --help     display this help and exit
    --version  output version information and exit

    The sums are computed as described in RFC 1321.  When checking, the input
    should be a former output of this program.  The default mode is to print a
    line with checksum, a space, a character indicating input mode ('*' for binary,
    ' ' for text or where binary is insignificant), and name for each FILE.


### Response:
"""
```

## Using unsloth

``` python
from unsloth import FastLanguageModel
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "hubentu/cmd2cwl_Llama-3.2-3B",
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)

inputs = tokenizer(
    [question],
    return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer)

```

## Using AutoModelForCausalLM
``` python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextStreamer

model = AutoModelForCausalLM.from_pretrained("hubentu/cmd2cwl_Llama-3.2-3B")
tokenizer = AutoTokenizer.from_pretrained("hubentu/cmd2cwl_Llama-3.2-3B")
model.to('cuda')

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_length=8192)
```

## Using generator
``` python
from transformers import pipeline
generator = pipeline('text-generation', model="checkpoints/cmd2cwl_Llama-3.2-3B", device='cuda')
resp = generator(question, max_length=8192)
print(resp[0]['generated_text'].split("### Response:\n")[-1])
```

## Output
```
cwlVersion: v1.0
class: CommandLineTool
baseCommand:
- md5sum
requirements:
- class: DockerRequirement
  dockerPull: alpine:latest
label: md5sum
doc: Compute and check MD5 checksums
inputs:
  files:
    label: files
    doc: Input files
    type: File[]
    inputBinding:
      separate: true
outputs:
  md5:
    label: md5
    doc: MD5 checksums
    type: string[]
    outputBinding:
      glob: $(inputs.files.name)
```


# Uploaded  model

- **Developed by:** hubentu
- **License:** apache-2.0
- **Finetuned from model :** unsloth/Llama-3.2-3B

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)