Milad Alizadeh
committed on
Tweaks to the model card
README.md
CHANGED
```diff
@@ -36,49 +36,23 @@ NatureLM-audio is an audio-language model designed to address bioacoustic tasks
 ### Direct Use
 
 NatureLM-audio can be used directly for bioacoustic tasks such as species classification, detection, and captioning. It is particularly useful for biodiversity monitoring, conservation, and animal behavior studies.
-```python
-from NatureLM.models import NatureLM
-
-# Download the model from HuggingFace
-model = NatureLM.from_pretrained("EarthSpeciesProject/NatureLM-audio")
-model = model.eval().to("cuda")
-```
-
-```python
-from NatureLM.infer import Pipeline
-
-# pass your audio files in as file paths or as numpy arrays
-# NOTE: the Pipeline class will automatically load the audio and convert them to numpy arrays
-audio_paths = ["assets/nri-GreenTreeFrogEvergladesNP.mp3"] # wav, mp3, ogg, flac are supported.
-
-# Create a list of queries. You may also pass a single query as a string for multiple audios.
-# The same query will be used for all audios.
-queries = ["What is the common name for the focal species in the audio? Answer:"]
-
-pipeline = Pipeline(model=model)
-# NOTE: you can also just do pipeline = Pipeline() which will download the model automatically
-
-# Run the model over the audio in sliding windows of 10 seconds with a hop length of 10 seconds
-results = pipeline(audio_paths, queries, window_length_seconds=10.0, hop_length_seconds=10.0)
-print(results)
-# ['#0.00s - 10.00s#: Green Treefrog\n']
-```
 
 Example prompts:
 
-Prompt: What is the common name for the focal species in the audio?
-Answer: Humpback Whale
+Prompt: What is the common name for the focal species in the audio?
+Answer: Humpback Whale
 
-Prompt: Which of these, if any, are present in the audio recording? Single pulse gibbon call, Multiple pulse gibbon call, Gibbon duet, None.
+Prompt: Which of these, if any, are present in the audio recording? Single pulse gibbon call, Multiple pulse gibbon call, Gibbon duet, None.
 Answer: Gibbon duet
 
-Prompt: What is the common name for the focal species in the audio?
+Prompt: What is the common name for the focal species in the audio?
 Answer: Spectacled Tetraka
 
-Prompt: What is the life stage of the focal species in the audio?
+Prompt: What is the life stage of the focal species in the audio?
 Answer: Juvenile
 
-Prompt: What type of vocalization is heard from the focal species in the audio?
+Prompt: What type of vocalization is heard from the focal species in the audio?
+Answer with either 'call' or 'song'.
 
 Prompt: Caption the audio, using the common name for any animal species.
 
@@ -103,7 +77,34 @@ Users should be aware of the risks, biases, and limitations of the model. It is
 
 ## How to Get Started with the Model
 
-
+Instantiating the model:
+
+```python
+from NatureLM.models import NatureLM
+
+# Download the model from HuggingFace
+model = NatureLM.from_pretrained("EarthSpeciesProject/NatureLM-audio")
+model = model.eval().to("cuda")
+```
+
+Using the model:
+
+```python
+from NatureLM.infer import Pipeline
+
+audio_paths = ["assets/nri-GreenTreeFrogEvergladesNP.mp3"]
+queries = ["What is the common name for the focal species in the audio? Answer:"]
+
+pipeline = Pipeline(model=model)
+
+# Run the model over the audio in sliding windows of 10 seconds with a hop length of 10 seconds
+results = pipeline(audio_paths, queries, window_length_seconds=10.0, hop_length_seconds=10.0)
+
+print(results)
+# ['#0.00s - 10.00s#: Green Treefrog\n']
+```
+
+Refer to the GitHub [repository](https://github.com/earthspecies/naturelm-audio) for more details.
 
 ## Training Details
 
```
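The `Pipeline` call in "How to Get Started with the Model" appears to return one string per input file, with each answer prefixed by the time span of the window it came from (`'#0.00s - 10.00s#: Green Treefrog\n'`). The following is a minimal sketch for post-processing that output. It assumes, based only on that single example, that each result string holds one `#<start>s - <end>s#: <answer>` line per window. It also assumes that windows start every `hop_length_seconds` and that the last window is clipped to the end of the recording. The model card confirms neither behavior. `Segment`, `parse_result`, and `window_spans` are hypothetical helpers, not part of the NatureLM package.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: line format inferred from the single example output in the README:
#   "#<start>s - <end>s#: <answer>"
_LINE_RE = re.compile(r"^#(\d+(?:\.\d+)?)s - (\d+(?:\.\d+)?)s#: (.*)$")


@dataclass
class Segment:
    """One windowed prediction (hypothetical helper type)."""
    start_s: float
    end_s: float
    answer: str


def parse_result(result: str) -> list[Segment]:
    """Split one result string into segments, assuming one window per line.

    Lines that do not match the assumed pattern are skipped.
    """
    segments = []
    for line in result.splitlines():
        match = _LINE_RE.match(line.strip())
        if match:
            start, end, answer = match.groups()
            segments.append(Segment(float(start), float(end), answer.strip()))
    return segments


def window_spans(duration_s: float, window_s: float, hop_s: float) -> list[tuple[float, float]]:
    """Hypothetical window layout: a window starts every hop_s seconds and
    the last one is clipped to duration_s (ASSUMPTION, not confirmed by NatureLM)."""
    spans = []
    start = 0.0
    while start < duration_s:
        spans.append((start, min(start + window_s, duration_s)))
        start += hop_s
    return spans


if __name__ == "__main__":
    results = ['#0.00s - 10.00s#: Green Treefrog\n']  # example output copied from the README
    for result in results:
        for seg in parse_result(result):
            print(f"{seg.start_s:.2f}-{seg.end_s:.2f}s: {seg.answer}")
    # A 25 s recording with 10 s windows and a 10 s hop, under the assumptions above
    print(window_spans(25.0, 10.0, 10.0))
```

If `hop_s` is smaller than `window_s`, the spans overlap. This is the usual reason a sliding-window API exposes the hop separately from the window length.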