Text Generation · Transformers · Safetensors · granite · code · qiskit · conversational
Commit adb67cf by ndupuis · 1 parent: 0ba4007

Update README

Files changed (1): README.md +3 -3
@@ -24,8 +24,8 @@ tags:
  - **GitHub Repository:** Pending
  - **Related Papers:** [Qiskit Code Assistant: Training LLMs for
  generating Quantum Computing Code](https://arxiv.org/abs/2405.19495) and [Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models](https://arxiv.org/abs/2406.14712)
- - **Release Date**: Pending
- - **License:** Pending.
+ - **Release Date**: 06-03-2025
+ - **License:** apache-2.0
 
  ## Usage
 
@@ -69,7 +69,7 @@ for i in output:
 
  - **Data Collection and Filtering:** Our code data is sourced from a combination of publicly available datasets (e.g., Code available on <https://github.com>), and additional synthetic data generated at IBM Quantum. We exclude code that is older than 2023.
  - **Exact and Fuzzy Deduplication:** We use both exact and fuzzy deduplication to remove documents having (near) identical code content.
- - **HAP, PII, Malware Filtering:** We rely on the base model ibm-granite/granite-8b-code-base for HAP and malware filtering from the initial datasets used in the context of the base model. We also make sure to redact Personally Identifiable Information (PII) in our datasets by replacing PII content (e.g., names, email addresses, keys, passwords) with corresponding tokens (e.g., ⟨NAME⟩, ⟨EMAIL⟩, ⟨KEY⟩, ⟨PASSWORD⟩).
+ - **HAP, PII, Malware Filtering:** We rely on the base model ibm-granite/granite-3.3-8b-base for HAP and malware filtering from the initial datasets used in the context of the base model. We also make sure to redact Personally Identifiable Information (PII) in our datasets by replacing PII content (e.g., names, email addresses, keys, passwords) with corresponding tokens (e.g., ⟨NAME⟩, ⟨EMAIL⟩, ⟨KEY⟩, ⟨PASSWORD⟩).
 
  ## Infrastructure
 
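The model card states that both exact and fuzzy deduplication are applied to the code data, but it does not name a method. The following is a minimal Python sketch under stated assumptions, not IBM's pipeline. It detects exact duplicates by a SHA-256 hash of the raw file content and near duplicates by Jaccard similarity over 5-token shingles. The function names (`shingles`, `jaccard`, `deduplicate`) and the parameters `k=5` and `threshold=0.8` are hypothetical choices.

```python
import hashlib
import re

# Hypothetical sketch of exact + fuzzy deduplication; the names and
# parameters here are illustrative, not the model's actual pipeline.

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # identifiers/numbers, or single symbols


def shingles(code: str, k: int = 5) -> set:
    """Return the set of k-token windows used for fuzzy comparison."""
    tokens = TOKEN_RE.findall(code)
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list, threshold: float = 0.8, k: int = 5) -> list:
    """Keep the first occurrence of each document.

    A document is dropped if its raw content hashes to a value already seen
    (exact duplicate) or if its shingle set has Jaccard similarity >= threshold
    with any document already kept (near duplicate).
    """
    seen_hashes = set()
    kept = []
    kept_shingles = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate
        seen_hashes.add(digest)
        doc_shingles = shingles(doc, k)
        if any(jaccard(doc_shingles, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of an already-kept document
        kept.append(doc)
        kept_shingles.append(doc_shingles)
    return kept
```

For example, a verbatim copy of a file is caught by the hash check, while the same file with one extra trailing comment line is caught by the Jaccard check. The pairwise comparison is quadratic in the number of documents. At corpus scale, MinHash with locality-sensitive hashing is the usual way to approximate the same Jaccard test.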
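The card describes PII redaction as replacing detected content with placeholder tokens such as ⟨EMAIL⟩, ⟨KEY⟩, and ⟨PASSWORD⟩, but it does not say how PII is detected. The following is a minimal Python sketch that assumes simple regular expressions. The patterns and the helper names (`redact`, `_keep_quotes`) are hypothetical and deliberately narrow. The sketch replaces email addresses and the quoted values of password-like and key/secret/token-like assignments. It does not handle names (⟨NAME⟩), because detecting them generally requires a named-entity recognizer rather than a regex.

```python
import re

# Hypothetical, deliberately narrow patterns: an illustration of token-based
# redaction, not the detector used for the model's training data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Quoted value assigned to a password-like name, e.g. password = "hunter2".
PASSWORD_RE = re.compile(r"""(?i)(\bpass(?:word|wd)?\s*[:=]\s*)(["'])(.*?)\2""")

# Quoted value assigned to a key/secret/token-like name,
# e.g. api_key = "..." or IBM_QUANTUM_TOKEN = '...'.
KEY_RE = re.compile(r"""(?i)(\b\w*(?:api_?key|secret|token)\w*\s*[:=]\s*)(["'])(.*?)\2""")


def _keep_quotes(placeholder: str):
    """Build a replacer that keeps the name and quotes but swaps the value."""
    def repl(match):
        name, quote = match.group(1), match.group(2)
        return f"{name}{quote}{placeholder}{quote}"
    return repl


def redact(text: str) -> str:
    """Replace detected PII with placeholder tokens."""
    text = EMAIL_RE.sub("⟨EMAIL⟩", text)
    text = PASSWORD_RE.sub(_keep_quotes("⟨PASSWORD⟩"), text)
    text = KEY_RE.sub(_keep_quotes("⟨KEY⟩"), text)
    return text
```

The sketch keeps the variable name and quotes and replaces only the value. This leaves redacted code syntactically intact, which matters when the redacted files are used as training data for a code model.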