Commit 5ecb7ab

feat: enhance image size handling across the codebase
- Updated `im_size` attribute in `ModelInfo` to support both int and tuple formats for image dimensions.
- Introduced `parse_im_size` function in CLI to handle various input formats for image size, including string representations.
- Modified training, validation, and export commands to accept and process image sizes as either int (square) or tuple (height, width).
- Ensured backward compatibility by maintaining int representation for existing functionality.
- Adjusted relevant processors and augmentations to accommodate the new image size format.
1 parent 41e7bac commit 5ecb7ab
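
As a quick orientation before the diffs, here is a usage sketch of the new size parsing behavior described above. It assumes the `parse_im_size` helper (added in `focoos/cli/cli.py` below) is importable as `focoos.cli.cli.parse_im_size`; the expected values follow directly from the implementation in this commit.

```python
# Usage sketch (assumed import path; the helper is defined in focoos/cli/cli.py in this commit)
from focoos.cli.cli import parse_im_size

assert parse_im_size(640) == 640                 # int passes through (square image)
assert parse_im_size("640") == 640               # plain string also means square
assert parse_im_size("640,480") == (640, 480)    # "height,width" -> (height, width) tuple
assert parse_im_size("640x480") == (640, 480)    # "heightxwidth" separator works too
assert parse_im_size("640X480") == (640, 480)    # uppercase "X" is normalized to "x"

try:
    parse_im_size("640,480,3")                   # more than two values is rejected
except ValueError as err:
    print(err)
```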

File tree: 24 files changed, +275 -89 lines

docs/cli.md
Lines changed: 25 additions & 5 deletions

````diff
@@ -85,9 +85,12 @@ Where:
 ### 🏋️ Training
 
 ```bash
-# Basic training
+# Basic training (square image)
 focoos train --model fai-detr-m-coco --dataset mydataset.zip --im-size 640
 
+# Training with non-square resolution
+focoos train --model fai-detr-m-coco --dataset mydataset.zip --im-size 640,480
+
 # Advanced training with custom hyperparameters
 focoos train \
     --model fai-detr-m-coco \
@@ -157,9 +160,12 @@ focoos predict \
 ### 📤 Model Export
 
 ```bash
-# Export to ONNX (default)
+# Export to ONNX (default, square image)
 focoos export --model fai-detr-m-coco --im-size 640
 
+# Export to ONNX with non-square resolution
+focoos export --model fai-detr-m-coco --im-size 640,480
+
 # Export to TorchScript
 focoos export \
     --model fai-detr-m-coco \
@@ -249,7 +255,7 @@ The interface will automatically open in your default web browser, typically at
 | `--model` | Model name or path | **Required** | `fai-detr-m-coco`, `path/to/model` |
 | `--dataset` | Dataset name or path | **Required** | `mydataset.zip`, `path/to/data/` |
 | `--source` | Input source (predict only) | **Required** | `image.jpg` |
-| `--im-size` | Input image size | 640 | Any positive integer |
+| `--im-size` | Input image size | 640 | Integer (square) or "height,width" or "heightxwidth" (non-square) |
 | `--batch-size` | Batch size | 16 | Powers of 2 recommended |
 | `--device` | Compute device | `cuda` | `cuda`, `cpu` |
 | `--workers` | Data loading workers | 4 | 0-16 recommended |
@@ -319,7 +325,7 @@ Use CLI commands programmatically in Python:
 ```python
 from focoos.cli.commands import train_command, predict_command, export_command
 
-# Train a model
+# Train a model (square image)
 train_command(
     model_name="fai-detr-m-coco",
     dataset_name="mydataset.zip",
@@ -330,6 +336,17 @@ train_command(
     batch_size=16
 )
 
+# Train a model with non-square resolution
+train_command(
+    model_name="fai-detr-m-coco",
+    dataset_name="mydataset.zip",
+    dataset_layout="roboflow_coco",
+    im_size=(640, 480),  # (height, width)
+    run_name="my_training",
+    max_iters=5000,
+    batch_size=16
+)
+
 # Run inference
 results = predict_command(
     model_name="fai-detr-m-coco",
@@ -364,8 +381,11 @@ focoos version
 # Reduce batch size
 focoos train --model fai-detr-m-coco --dataset data.zip --batch-size 8
 
-# Use smaller image size
+# Use smaller image size (square)
 focoos train --model fai-detr-m-coco --dataset data.zip --im-size 480
+
+# Use non-square image size for memory efficiency
+focoos train --model fai-detr-m-coco --dataset data.zip --im-size 480,360
 ```
 
 **Dataset not found:**
````

docs/concepts.md
Lines changed: 2 additions & 2 deletions

````diff
@@ -120,7 +120,7 @@ model_info = ModelInfo(
     name="custom_detector",
     model_family=ModelFamily.DETR,
     classes=["person", "car", "bicycle"],
-    im_size=640,
+    im_size=640,  # Square image (640x640), or use (640, 480) for non-square (height, width)
     task=Task.DETECTION,
     config={
         "num_classes": 3,
@@ -220,7 +220,7 @@ model.train(train_args, train_dataset, val_dataset, hub=hub)
 Exports the model to different runtime formats for optimized inference. The main function arguments are:
 - `runtime_type`: specify the target runtime and must be one of the supported (see [RuntimeType](/focoos/api/ports/#focoos.ports.RuntimeType))
 - `out_dir`: the destination folder for the exported model
-- `image_size`: the target image size, as an optional integer
+- `image_size`: the target image size, as an optional integer (square) or tuple (height, width) for non-square images
 
 The function returns an [`InferModel`](#infer-model) instance for the exported model.
````
docs/inference.md
Lines changed: 8 additions & 0 deletions

````diff
@@ -151,7 +151,11 @@ runtime = RuntimeType.TORCHSCRIPT_32
 It's time to export the model. We can use the export method of the models.
 
 ```python
+# Export with square resolution (512x512)
 optimized_model = model.export(runtime_type=runtime, image_size=512)
+
+# Export with non-square resolution (640x480, height x width)
+optimized_model = model.export(runtime_type=runtime, image_size=(640, 480))
 ```
 
 Let's visualize the output. As you will see, there are not differences from the model in pure torch.
@@ -183,5 +187,9 @@ optimized_model = model.export(runtime_type=runtime)
 detections = optimized_model(image)
 display(annotate_image(image, detections, task=model.model_info.task, classes=model.model_info.classes))
 
+# Benchmark with square resolution
 optimized_model.benchmark(iterations=10, size=640)
+
+# Benchmark with non-square resolution
+optimized_model.benchmark(iterations=10, size=(640, 480))
 ```
````

docs/training.md
Lines changed: 4 additions & 0 deletions

````diff
@@ -65,8 +65,12 @@ task = dataset.task # see ports.Task for more information
 layout = dataset.layout # see ports.DatasetLayout for more information
 auto_dataset = AutoDataset(dataset_name=dataset_path, task=task, layout=layout)
 
+# Square resolution (512x512)
 augs = DatasetAugmentations(resolution=512).get_augmentations()
 
+# Non-square resolution (640x480, height x width)
+augs = DatasetAugmentations(resolution=(640, 480)).get_augmentations()
+
 train_dataset = auto_dataset.get_split(augs=augs, split=DatasetSplitType.TRAIN)
 valid_dataset = auto_dataset.get_split(augs=augs, split=DatasetSplitType.VAL)
 ```
````

focoos/cli/cli.py
Lines changed: 54 additions & 8 deletions

````diff
@@ -73,7 +73,7 @@
 """
 
 import uuid
-from typing import Optional, cast, get_args
+from typing import Optional, Tuple, Union, cast, get_args
 
 import typer
 from typing_extensions import Annotated
@@ -100,6 +100,40 @@
 
 logger = get_logger("CLI")
 
+
+def parse_im_size(value: Union[str, int]) -> Union[int, Tuple[int, int]]:
+    """Parse image size from string or int.
+
+    Supports formats:
+    - int: 640 (square image)
+    - str: "640" (square image)
+    - str: "640,480" or "640x480" (non-square image as height,width)
+
+    Args:
+        value: Image size as int or string
+
+    Returns:
+        int for square images, tuple (height, width) for non-square
+    """
+    if isinstance(value, int):
+        return value
+    if isinstance(value, str):
+        # Try comma or x separator
+        if "," in value:
+            parts = value.split(",")
+        elif "x" in value or "X" in value:
+            parts = value.replace("X", "x").split("x")
+        else:
+            # Single number - square image
+            return int(value)
+
+        if len(parts) == 2:
+            return (int(parts[0].strip()), int(parts[1].strip()))
+        else:
+            raise ValueError(f"Invalid image size format: {value}. Use '640', '640,480', or '640x480'")
+    return value
+
+
 app = typer.Typer(
     name="focoos",
     help=__doc__,
@@ -251,7 +285,9 @@ def train(
         Optional[str], typer.Option(help="Datasets directory (default: ~/FocoosAI/datasets/)")
     ] = None,
     dataset_layout: Annotated[DatasetLayout, typer.Option(help="Dataset layout")] = DatasetLayout.ROBOFLOW_COCO,
-    im_size: Annotated[int, typer.Option(help="Image size")] = 640,
+    im_size: Annotated[
+        str, typer.Option(help="Image size (int for square, or 'height,width' or 'heightxwidth' for non-square)")
+    ] = "640",
     output_dir: Annotated[Optional[str], typer.Option(help="Output directory")] = None,
     ckpt_dir: Annotated[Optional[str], typer.Option(help="Checkpoint directory")] = None,
     init_checkpoint: Annotated[Optional[str], typer.Option(help="Initial checkpoint path")] = None,
@@ -316,7 +352,9 @@ def train(
             generates a unique name using model name and UUID.
         datasets_dir (Optional[str]): Custom directory for datasets.
         dataset_layout (DatasetLayout): Layout format of the dataset. Defaults to ROBOFLOW_COCO.
-        im_size (int): Input image size for training. Defaults to 640.
+        im_size (str): Input image size for training. Can be int (e.g., "640") for square images,
+            or "height,width" or "heightxwidth" (e.g., "640,480" or "640x480") for non-square images.
+            Defaults to "640".
         output_dir (Optional[str]): Directory to save training outputs and logs.
         ckpt_dir (Optional[str]): Directory to save model checkpoints.
         init_checkpoint (Optional[str]): Path to initial checkpoint for transfer learning.
@@ -435,11 +473,12 @@ def train(
     validated_optimizer = cast(OptimizerType, optimizer.upper())
     assert optimizer in get_args(OptimizerType)
 
+    parsed_im_size = parse_im_size(im_size)
     train_command(
         model_name=model,
         dataset_name=dataset,
         dataset_layout=dataset_layout,
-        im_size=im_size,
+        im_size=parsed_im_size,
         run_name=run_name or f"{model}-{uuid.uuid4()}",
         output_dir=output_dir,
         ckpt_dir=ckpt_dir,
@@ -494,7 +533,9 @@ def val(
     ] = None,
     run_name: Annotated[Optional[str], typer.Option(help="Run name")] = None,
     dataset_layout: Annotated[DatasetLayout, typer.Option(help="Dataset layout")] = DatasetLayout.ROBOFLOW_COCO,
-    im_size: Annotated[int, typer.Option(help="Image size")] = 640,
+    im_size: Annotated[
+        str, typer.Option(help="Image size (int for square, or 'height,width' or 'heightxwidth' for non-square)")
+    ] = "640",
     output_dir: Annotated[Optional[str], typer.Option(help="Output directory")] = None,
     ckpt_dir: Annotated[Optional[str], typer.Option(help="Checkpoint directory")] = None,
     init_checkpoint: Annotated[Optional[str], typer.Option(help="Initial checkpoint")] = None,
@@ -677,11 +718,12 @@ def val(
     validated_optimizer = cast(OptimizerType, optimizer.upper())
     assert optimizer in get_args(OptimizerType)
 
+    parsed_im_size = parse_im_size(im_size)
     val_command(
         model_name=model,
         dataset_name=dataset,
         dataset_layout=dataset_layout,
-        im_size=im_size,
+        im_size=parsed_im_size,
         run_name=run_name or f"{model}-{uuid.uuid4()}",
         output_dir=output_dir,
         ckpt_dir=ckpt_dir,
@@ -881,7 +923,10 @@ def export(
     output_dir: Annotated[Optional[str], typer.Option(help="Output directory")] = None,
     device: Annotated[Optional[str], typer.Option(help="Device (cuda or cpu)")] = "cuda",
     onnx_opset: Annotated[Optional[int], typer.Option(help="ONNX opset version")] = 17,
-    im_size: Annotated[Optional[int], typer.Option(help="Image size for export")] = 640,
+    im_size: Annotated[
+        Optional[str],
+        typer.Option(help="Image size for export (int for square, or 'height,width' or 'heightxwidth' for non-square)"),
+    ] = "640",
     overwrite: Annotated[Optional[bool], typer.Option(help="Overwrite existing files")] = False,
 ):
     """Export a trained model to various deployment formats.
@@ -985,13 +1030,14 @@ def export(
     try:
         validated_device = cast(DeviceType, device)
         assert device in get_args(DeviceType)
+        parsed_im_size = parse_im_size(im_size) if im_size is not None else None
         export_command(
             model_name=model,
             format=format,
             output_dir=output_dir,
             device=validated_device,
             onnx_opset=onnx_opset,
-            im_size=im_size,
+            im_size=parsed_im_size,
             overwrite=overwrite,
         )
     except Exception as e:
````

focoos/cli/commands/export.py
Lines changed: 4 additions & 3 deletions

````diff
@@ -55,7 +55,7 @@
 - [`focoos.ports.RuntimeType`][focoos.ports.RuntimeType]: Runtime type configurations
 """
 
-from typing import Literal, Optional
+from typing import Literal, Optional, Tuple, Union
 
 from focoos.model_manager import ModelManager
 from focoos.ports import ExportFormat, RuntimeType
@@ -70,7 +70,7 @@ def export_command(
     output_dir: Optional[str] = None,
     device: Optional[Literal["cuda", "cpu"]] = None,
     onnx_opset: Optional[int] = None,
-    im_size: Optional[int] = None,
+    im_size: Optional[Union[int, Tuple[int, int]]] = None,
     overwrite: Optional[bool] = None,
 ):
     """Export a model to different deployment formats.
@@ -114,7 +114,8 @@ def export_command(
         onnx_opset (Optional[int], optional): ONNX opset version for ONNX exports.
             Higher versions support more operations but may have compatibility issues.
            Common versions: 11, 13, 16, 17. Defaults to 17 if None.
-        im_size (Optional[int], optional): Input image size for the exported model.
+        im_size (Optional[Union[int, Tuple[int, int]]], optional): Input image size for the exported model.
+            If int, treated as square (size, size). If tuple, treated as (height, width).
            Used to define fixed input shapes for optimization.
            If None, uses the model's default input size.
            Defaults to None.
````

focoos/cli/commands/train.py
Lines changed: 5 additions & 4 deletions

````diff
@@ -74,7 +74,7 @@
 - [`focoos.ports.TrainerArgs`][focoos.ports.TrainerArgs]: Training configuration
 """
 
-from typing import Optional
+from typing import Optional, Tuple, Union
 
 from focoos.data.auto_dataset import AutoDataset
 from focoos.data.default_aug import get_default_by_task
@@ -101,7 +101,7 @@ def train_command(
     ## Dataset args
     dataset_name: str,
     dataset_layout: DatasetLayout,
-    im_size: int,
+    im_size: Union[int, Tuple[int, int]],
     ##################
     ## Training args
     run_name: str,
@@ -183,8 +183,9 @@ def train_command(
            and dataset identifiers.
         dataset_layout (DatasetLayout): Layout format of the dataset.
            Supported formats: ROBOFLOW_COCO, YOLO, COCO, etc.
-        im_size (int): Input image size for training. Images are resized
-            to this size while maintaining aspect ratio.
+        im_size (Union[int, Tuple[int, int]]): Input image size for training.
+            If int, treated as square (size, size). If tuple, treated as (height, width).
+            Images are resized to this size while maintaining aspect ratio.
         run_name (str): Unique name for this training run. Used for
            experiment tracking, logging, and output organization.
         output_dir (Optional[str], optional): Directory to save training outputs
````

focoos/cli/commands/val.py
Lines changed: 5 additions & 4 deletions

````diff
@@ -56,7 +56,7 @@
 - [`focoos.ports.TrainerArgs`][focoos.ports.TrainerArgs]: Configuration parameters
 """
 
-from typing import Optional
+from typing import Optional, Tuple, Union
 
 from focoos.data.auto_dataset import AutoDataset
 from focoos.data.default_aug import get_default_by_task
@@ -82,7 +82,7 @@ def val_command(
     ## Dataset args
     dataset_name: str,
     dataset_layout: DatasetLayout,
-    im_size: int,
+    im_size: Union[int, Tuple[int, int]],
     ##################
     ## Training args
     run_name: str,
@@ -163,8 +163,9 @@ def val_command(
            and dataset identifiers.
         dataset_layout (DatasetLayout): Layout format of the dataset.
            Supported formats: ROBOFLOW_COCO, YOLO, COCO, etc.
-        im_size (int): Input image size for validation. Images are resized
-            to this size while maintaining aspect ratio.
+        im_size (Union[int, Tuple[int, int]]): Input image size for validation.
+            If int, treated as square (size, size). If tuple, treated as (height, width).
+            Images are resized to this size while maintaining aspect ratio.
         run_name (str): Unique name for this validation run. Used for
            result organization, logging, and report generation.
         output_dir (Optional[str], optional): Directory to save validation outputs
````
