Vision

Wrap any common image representation in an Image class to convert to any other common format.

The following image representations are supported:

- NumPy array
- PIL Image
- Base64 encoded string
- File path
- URL
- Bytes object

The image can be resized to and from any size, compressed, and converted to and from any supported format:

```python
from mbodied.types.sense.vision import Image

# save() returns None, so keep the Image in its own variable.
image = Image("path/to/image.png", size=new_size_tuple)
image.save("path/to/new/image.jpg", quality=5)
```

TODO: Implement Lazy attribute loading for the image data.

Image

Bases: Sample

An image sample that can be represented in various formats.

The image can be represented as a NumPy array, a base64 encoded string, a file path, a PIL Image object, or a URL. The image can be resized to and from any size and converted to and from any supported format.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| array | Optional[ndarray] | The image represented as a NumPy array. |
| base64 | Optional[Base64Str] | The base64 encoded string of the image. |
| path | Optional[FilePath] | The file path of the image. |
| pil | Optional[PILImage] | The image represented as a PIL Image object. |
| url | Optional[AnyUrl] | The URL of the image. |
| size | Optional[tuple[int, int]] | The size of the image as a (width, height) tuple. |
| encoding | Optional[Literal['png', 'jpeg', 'jpg', 'bmp', 'gif']] | The encoding of the image. |
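
The `encoding` attribute is limited to a fixed set of formats, and for URL sources the class listing infers it from the file extension in the URL path. A minimal sketch of that inference, using only the standard library, might look like the following. The helper name `infer_encoding` and the constant `SUPPORTED_ENCODINGS` are hypothetical, and re-checking the inferred extension against the supported set is a choice of this sketch rather than something the listing does.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The five formats named in the `encoding` attribute's Literal type.
SUPPORTED_ENCODINGS = ("png", "jpeg", "jpg", "bmp", "gif")


def infer_encoding(url: str, default: str = "jpeg") -> str:
    """Guess an image encoding from the extension in a URL path (hypothetical helper)."""
    # urlparse() drops the query string, so "?x=1" does not pollute the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix
    encoding = suffix[1:].lower() if suffix else default.lower()
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"Unsupported image encoding: {encoding!r}")
    return encoding
```

With this helper, a URL without an extension falls back to the default, which mirrors the `'jpeg'` default used by the constructor.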

Examples:

>>> image = Image("https://example.com/image.jpg")
>>> image = Image("/path/to/image.jpg")
>>> image = Image("data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/4Q3zaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wLwA")
>>> jpeg_from_png = Image("path/to/image.png", encoding="jpeg")
>>> resized_image = Image(image, size=(224, 224))
>>> pil_image = Image(image).pil
>>> array = Image(image).array
>>> base64 = Image(image).base64
Source code in mbodied/types/sense/vision.py
class Image(Sample):
    """An image sample that can be represented in various formats.

    The image can be represented as a NumPy array, a base64 encoded string, a file path, a PIL Image object,
    or a URL. The image can be resized to and from any size and converted to and from any supported format.

    Attributes:
        array (Optional[np.ndarray]): The image represented as a NumPy array.
        base64 (Optional[Base64Str]): The base64 encoded string of the image.
        path (Optional[FilePath]): The file path of the image.
        pil (Optional[PILImage]): The image represented as a PIL Image object.
        url (Optional[AnyUrl]): The URL of the image.
        size (Optional[tuple[int, int]]): The size of the image as a (width, height) tuple.
        encoding (Optional[Literal["png", "jpeg", "jpg", "bmp", "gif"]]): The encoding of the image.

    Examples:
        >>> image = Image("https://example.com/image.jpg")
        >>> image = Image("/path/to/image.jpg")
        >>> image = Image("data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/4Q3zaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wLwA")

        >>> jpeg_from_png = Image("path/to/image.png", encoding="jpeg")
        >>> resized_image = Image(image, size=(224, 224))
        >>> pil_image = Image(image).pil
        >>> array = Image(image).array
        >>> base64 = Image(image).base64
    """

    model_config: ConfigDict = ConfigDict(arbitrary_types_allowed=True, extras="forbid", validate_assignment=False)

    array: NumpyArray
    size: tuple[int, int]

    pil: InstanceOf[PILImage] | None = Field(
        None,
        repr=False,
        exclude=True,
        description="The image represented as a PIL Image object.",
    )
    encoding: Literal["png", "jpeg", "jpg", "bmp", "gif"]
    base64: InstanceOf[Base64Str] | None = None
    url: InstanceOf[AnyUrl] | str | None = None
    path: FilePath | None = None

    @classmethod
    def supports(cls, arg: SupportsImage) -> bool:
        if not isinstance(arg, np.ndarray | PILImage | AnyUrl | str):
            return False
        return Path(arg).exists() or arg.startswith("data:image")

    def __init__(
        self,
        arg: SupportsImage = None,
        url: str | None = None,
        path: str | None = None,
        base64: str | None = None,
        array: np.ndarray | None = None,
        pil: PILImage | None = None,
        encoding: str | None = "jpeg",
        size: Tuple | None = None,
        bytes_obj: bytes | None = None,
        **kwargs,
    ):
        """Initializes an image. Either one source argument or size tuple must be provided.

        Args:
          arg (SupportsImage, optional): The primary image source.
          url (Optional[str], optional): The URL of the image.
          path (Optional[str], optional): The file path of the image.
          base64 (Optional[str], optional): The base64 encoded string of the image.
          array (Optional[np.ndarray], optional): The numpy array of the image.
          pil (Optional[PILImage], optional): The PIL image object.
          encoding (Optional[str], optional): The encoding format of the image. Defaults to 'jpeg'.
          size (Optional[Tuple[int, int]], optional): The size of the image as a (width, height) tuple.
          **kwargs: Additional keyword arguments.
        """
        kwargs["encoding"] = encoding or "jpeg"
        kwargs["size"] = size
        if arg is not None:
            if isinstance(arg, bytes):
                kwargs["bytes"] = arg
            elif isinstance(arg, str):
                if isinstance(arg, AnyUrl):
                    kwargs["url"] = arg
                elif Path(arg).exists():
                    kwargs["path"] = arg
                else:
                    kwargs["base64"] = arg
            elif isinstance(arg, Path):
                kwargs["path"] = str(arg)
            elif isinstance(arg, np.ndarray):
                kwargs["array"] = arg
            elif isinstance(arg, PILImage):
                kwargs["pil"] = arg
            elif isinstance(arg, Image):
                # Overwrite an Image instance with the new kwargs
                kwargs.update({"array": arg.array})
            elif isinstance(arg, Tuple) and len(arg) == 2:
                kwargs["size"] = arg
            else:
                raise ValueError(f"Unsupported argument type '{type(arg)}'.")
        else:
            if url is not None:
                kwargs["url"] = url
            elif path is not None:
                kwargs["path"] = path
            elif base64 is not None:
                kwargs["base64"] = base64
            elif array is not None:
                kwargs["array"] = array
            elif pil is not None:
                kwargs["pil"] = pil
            elif bytes_obj is not None:
                kwargs["bytes"] = bytes_obj
        super().__init__(**kwargs)

    def __repr__(self):
        """Return a string representation of the image."""
        if self.base64 is None:
            return f"Image(encoding={self.encoding}, size={self.size})"
        return f"Image(base64={self.base64[:10]}..., encoding={self.encoding}, size={self.size})"

    def __str__(self):
        """Return a string representation of the image."""
        return f"Image(base64={self.base64[:10]}..., encoding={self.encoding}, size={self.size})"

    @staticmethod
    def from_base64(base64_str: str, encoding: str, size=None) -> "Image":
        """Decodes a base64 string to create an Image instance.

        Args:
            base64_str (str): The base64 string to decode.
            encoding (str): The format used for encoding the image when converting to base64.
            size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

        Returns:
            Image: An instance of the Image class with populated fields.
        """
        image_data = base64lib.b64decode(base64_str)
        image = PILModule.open(io.BytesIO(image_data)).convert("RGB")
        return Image(image, encoding, size)

    @staticmethod
    def open(path: str, encoding: str = "jpeg", size=None) -> "Image":
        """Opens an image from a file path.

        Args:
            path (str): The path to the image file.
            encoding (str): The format used for encoding the image when converting to base64.
            size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

        Returns:
            Image: An instance of the Image class with populated fields.
        """
        image = PILModule.open(path).convert("RGB")
        return Image(image, encoding, size)

    @staticmethod
    def pil_to_data(image: PILImage, encoding: str, size=None) -> dict:
        """Converts a PIL image into a dict of Image fields.

        Args:
            image (PIL.Image.Image): The source PIL image.
            encoding (str): The format used for encoding the image when converting to base64.
            size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

        Returns:
            dict: The populated fields: array, base64, pil, size, url, and encoding.
        """
        if encoding.lower() == "jpg":
            encoding = "jpeg"
        buffer = io.BytesIO()
        image.convert("RGB").save(buffer, format=encoding.upper())
        base64_encoded = base64lib.b64encode(buffer.getvalue()).decode("utf-8")
        data_url = f"data:image/{encoding};base64,{base64_encoded}"
        if size is not None:
            image = image.resize(size)
        else:
            size = image.size
        return {
            "array": np.array(image),
            "base64": base64_encoded,
            "pil": image,
            "size": size,
            "url": data_url,
            "encoding": encoding.lower(),
        }

    @staticmethod
    def load_url(url: str, download=False) -> PILImage | None:
        """Downloads an image from a URL or decodes it from a base64 data URI.

        Args:
            url (str): The URL of the image to download, or a base64 data URI.

        Returns:
            PIL.Image.Image: The downloaded and decoded image as a PIL Image object.
        """
        if url.startswith("data:image"):
            # Extract the base64 part of the data URI
            base64_str = url.split(";base64", 1)[1]
            image_data = base64lib.b64decode(base64_str)
            return PILModule.open(io.BytesIO(image_data)).convert("RGB")

        try:
            # Open the URL and read the image data
            import urllib.request

            user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7"
            headers = {
                "User-Agent": user_agent,
            }
            if download:
                accept = input("Do you want to download the image? (y/n): ")
                if "y" not in accept.lower():
                    return None
            if not url.startswith("http"):
                raise ValueError("URL must start with 'http' or 'https'.")
            request = urllib.request.Request(url, None, headers)  # The assembled request
            response = urllib.request.urlopen(request)
            data = response.read()  # The raw image bytes
            return PILModule.open(io.BytesIO(data)).convert("RGB")
        except Exception as e:
            logging.warning(f"Failed to load image from URL: {url}. {e}")
            logging.warning("Not validating the Image data")
            return None

    @classmethod
    def from_bytes(cls, bytes_data: bytes, encoding: str = "jpeg", size=None) -> "Image":
        """Creates an Image instance from a bytes object.

        Args:
            bytes_data (bytes): The bytes object to convert to an image.
            encoding (str): The format used for encoding the image when converting to base64.
            size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

        Returns:
            Image: An instance of the Image class with populated fields.
        """
        image = PILModule.open(io.BytesIO(bytes_data)).convert("RGB")
        return cls(image, encoding, size)

    @staticmethod
    def bytes_to_data(bytes_data: bytes, encoding: str = "jpeg", size=None) -> dict:
        """Decodes a bytes object and converts it into a dict of Image fields.

        Args:
            bytes_data (bytes): The bytes object to convert to an image.
            encoding (str): The format used for encoding the image when converting to base64.
            size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

        Returns:
            dict: The populated fields: array, base64, pil, size, url, and encoding.
        """
        image = PILModule.open(io.BytesIO(bytes_data)).convert("RGB")
        return Image.pil_to_data(image, encoding, size)

    @model_validator(mode="before")
    @classmethod
    def validate_kwargs(cls, values) -> dict:
        # Ensure that exactly one image source is provided
        provided_fields = [
            k for k in values if values[k] is not None and k in ["array", "base64", "path", "pil", "url"]
        ]
        if len(provided_fields) > 1:
            raise ValueError(f"Multiple image sources provided; only one is allowed but got: {provided_fields}")

        # Initialize all fields to None or their default values
        validated_values = {
            "array": None,
            "base64": None,
            "encoding": values.get("encoding", "jpeg").lower(),
            "path": None,
            "pil": None,
            "url": None,
            "size": values.get("size", None),
        }

        # Validate the encoding first
        if validated_values["encoding"] not in ["png", "jpeg", "jpg", "bmp", "gif"]:
            raise ValueError("The 'encoding' must be a valid image format (png, jpeg, jpg, bmp, gif).")

        if "bytes" in values and values["bytes"] is not None:
            validated_values.update(cls.bytes_to_data(values["bytes"], values["encoding"], values["size"]))
            return validated_values

        if "pil" in values and values["pil"] is not None:
            validated_values.update(
                cls.pil_to_data(values["pil"], values["encoding"], values["size"]),
            )
            return validated_values
        # Process the provided image source
        if "path" in provided_fields:
            image = PILModule.open(values["path"]).convert("RGB")
            validated_values["path"] = values["path"]
            validated_values.update(cls.pil_to_data(image, validated_values["encoding"], validated_values["size"]))

        elif "array" in provided_fields:
            image = PILModule.fromarray(values["array"]).convert("RGB")
            validated_values.update(cls.pil_to_data(image, validated_values["encoding"], validated_values["size"]))

        elif "pil" in provided_fields:
            validated_values.update(
                cls.pil_to_data(values["pil"], validated_values["encoding"], validated_values["size"]),
            )

        elif "base64" in provided_fields:
            validated_values.update(
                cls.from_base64(values["base64"], validated_values["encoding"], validated_values["size"]),
            )

        elif "url" in provided_fields:
            url_path = urlparse(values["url"]).path
            file_extension = (
                Path(url_path).suffix[1:].lower() if Path(url_path).suffix else validated_values["encoding"]
            )
            validated_values["encoding"] = file_extension
            validated_values["url"] = values["url"]
            image = cls.load_url(values["url"])
            if image is None:
                validated_values["array"] = np.zeros((224, 224, 3), dtype=np.uint8)
                validated_values["size"] = (224, 224)
                return validated_values

            validated_values.update(cls.pil_to_data(image, file_extension, validated_values["size"]))
            validated_values["url"] = values["url"]

        elif "size" in values and values["size"] is not None:
            array = np.zeros((values["size"][0], values["size"][1], 3), dtype=np.uint8)
            image = PILModule.fromarray(array).convert("RGB")
            validated_values.update(cls.pil_to_data(image, validated_values["encoding"], validated_values["size"]))
        if any(validated_values[k] is None for k in ["array", "base64", "pil", "url"]):
            logging.warning(
                f"Failed to validate image data. Could only fetch {[k for k in validated_values if validated_values[k] is not None]}",
            )
        return validated_values

    def save(self, path: str, encoding: str | None = None, quality: int = 10) -> None:
        """Save the image to the specified path.

        If the image is a JPEG, the quality parameter can be used to set the quality of the saved image.
        The path attribute of the image is updated to the new file path.

        Args:
            path (str): The path to save the image to.
            encoding (Optional[str]): The encoding to use for saving the image.
            quality (int): The quality to use for saving the image.
        """
        if encoding == "png" and quality < 10:
            raise ValueError("Quality can only be set for JPEG images.")

        encoding = encoding or self.encoding
        if quality < 10:
            encoding = "jpeg"

        pil_image = self.pil
        if encoding != self.encoding:
            pil_image = Image(self.array, encoding=encoding).pil

        pil_image.save(path, encoding, quality=quality)
        self.path = path  # Update the path attribute to the new file path

    def show(self) -> None:
        import platform

        import matplotlib

        if platform.system() == "Darwin":
            matplotlib.use("TkAgg")
        import matplotlib.pyplot as plt

        plt.imshow(self.array)

    def space(self) -> spaces.Box:
        """Returns the space of the image."""
        if self.size is None:
            raise ValueError("Image size is not defined.")
        return spaces.Box(low=0, high=255, shape=(*self.size, 3), dtype=np.uint8)

    @model_serializer(mode="plain", when_used="json")
    def exclude_pil(self) -> dict:
        """Convert the image to a base64 encoded string."""
        if self.base64 in self.url:
            return {"size": self.size, "url": self.url, "encoding": self.encoding}
        return {"base64": self.base64, "size": self.size, "url": self.url, "encoding": self.encoding}

    def dump(self, *args, as_field: str | None = None, **kwargs) -> dict | Any:
        """Return a dict or a field of the image."""
        if as_field is not None:
            return getattr(self, as_field)
        return {
            "array": self.array,
            "base64": self.base64,
            "size": self.size,
            "url": self.url,
            "encoding": self.encoding,
        }

    def infer_features_dict(self) -> Features:
        """Infer features of the image."""
        return HFImage()
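
The `validate_kwargs` validator in the listing above rejects input that names more than one image source, while still allowing none (for example, when only `size` is given). A minimal sketch of that check, with a hypothetical helper name `pick_source` and plain dicts in place of the pydantic machinery, could be:

```python
# The source fields that validate_kwargs inspects in the listing.
SOURCE_FIELDS = ("array", "base64", "path", "pil", "url")


def pick_source(values: dict):
    """Return the single provided source field, or None if no source is given.

    Hypothetical helper: it only reproduces the "at most one source" rule.
    """
    provided = [k for k in SOURCE_FIELDS if values.get(k) is not None]
    if len(provided) > 1:
        raise ValueError(
            f"Multiple image sources provided; only one is allowed but got: {provided}"
        )
    return provided[0] if provided else None
```

Returning `None` for the no-source case leaves room for the size-only branch, where the listing builds a blank image instead.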

__init__(arg=None, url=None, path=None, base64=None, array=None, pil=None, encoding='jpeg', size=None, bytes_obj=None, **kwargs)

Initializes an image. Either one source argument or size tuple must be provided.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| arg | SupportsImage | The primary image source. | None |
| url | Optional[str] | The URL of the image. | None |
| path | Optional[str] | The file path of the image. | None |
| base64 | Optional[str] | The base64 encoded string of the image. | None |
| array | Optional[ndarray] | The NumPy array of the image. | None |
| pil | Optional[PILImage] | The PIL image object. | None |
| encoding | Optional[str] | The encoding format of the image. | 'jpeg' |
| size | Optional[Tuple[int, int]] | The size of the image as a (width, height) tuple. | None |
| bytes_obj | Optional[bytes] | The raw bytes of the image. | None |
| **kwargs | | Additional keyword arguments. | {} |
Source code in mbodied/types/sense/vision.py
def __init__(
    self,
    arg: SupportsImage = None,
    url: str | None = None,
    path: str | None = None,
    base64: str | None = None,
    array: np.ndarray | None = None,
    pil: PILImage | None = None,
    encoding: str | None = "jpeg",
    size: Tuple | None = None,
    bytes_obj: bytes | None = None,
    **kwargs,
):
    """Initializes an image. Either one source argument or size tuple must be provided.

    Args:
      arg (SupportsImage, optional): The primary image source.
      url (Optional[str], optional): The URL of the image.
      path (Optional[str], optional): The file path of the image.
      base64 (Optional[str], optional): The base64 encoded string of the image.
      array (Optional[np.ndarray], optional): The numpy array of the image.
      pil (Optional[PILImage], optional): The PIL image object.
      encoding (Optional[str], optional): The encoding format of the image. Defaults to 'jpeg'.
      size (Optional[Tuple[int, int]], optional): The size of the image as a (width, height) tuple.
      **kwargs: Additional keyword arguments.
    """
    kwargs["encoding"] = encoding or "jpeg"
    kwargs["size"] = size
    if arg is not None:
        if isinstance(arg, bytes):
            kwargs["bytes"] = arg
        elif isinstance(arg, str):
            if isinstance(arg, AnyUrl):
                kwargs["url"] = arg
            elif Path(arg).exists():
                kwargs["path"] = arg
            else:
                kwargs["base64"] = arg
        elif isinstance(arg, Path):
            kwargs["path"] = str(arg)
        elif isinstance(arg, np.ndarray):
            kwargs["array"] = arg
        elif isinstance(arg, PILImage):
            kwargs["pil"] = arg
        elif isinstance(arg, Image):
            # Overwrite an Image instance with the new kwargs
            kwargs.update({"array": arg.array})
        elif isinstance(arg, Tuple) and len(arg) == 2:
            kwargs["size"] = arg
        else:
            raise ValueError(f"Unsupported argument type '{type(arg)}'.")
    else:
        if url is not None:
            kwargs["url"] = url
        elif path is not None:
            kwargs["path"] = path
        elif base64 is not None:
            kwargs["base64"] = base64
        elif array is not None:
            kwargs["array"] = array
        elif pil is not None:
            kwargs["pil"] = pil
        elif bytes_obj is not None:
            kwargs["bytes"] = bytes_obj
    super().__init__(**kwargs)
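
The constructor above routes a single positional `arg` to a keyword based on its type. The sketch below illustrates that dispatch order with the standard library only. The helper name `classify_source` is hypothetical, the NumPy, PIL, and `Image` branches are left out, and URLs and data URIs are recognized by prefix, which is an assumption of this sketch rather than the library's check.

```python
from pathlib import Path


def classify_source(arg) -> str:
    """Return the keyword a positional source would be routed to (hypothetical helper)."""
    if isinstance(arg, bytes):
        return "bytes"
    if isinstance(arg, str):
        # Assumption: detect URLs and data URIs by prefix before touching the filesystem.
        if arg.startswith(("http://", "https://")):
            return "url"
        if arg.startswith("data:image"):
            return "base64"
        if Path(arg).exists():
            return "path"
        return "base64"
    if isinstance(arg, Path):
        return "path"
    if isinstance(arg, tuple) and len(arg) == 2:
        return "size"
    raise ValueError(f"Unsupported argument type '{type(arg)}'.")
```

Checking the prefixes first keeps long base64 strings away from `Path.exists()`, which is only meaningful for short, path-like strings.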

__repr__()

Return a string representation of the image.

Source code in mbodied/types/sense/vision.py
def __repr__(self):
    """Return a string representation of the image."""
    if self.base64 is None:
        return f"Image(encoding={self.encoding}, size={self.size})"
    return f"Image(base64={self.base64[:10]}..., encoding={self.encoding}, size={self.size})"

__str__()

Return a string representation of the image.

Source code in mbodied/types/sense/vision.py
def __str__(self):
    """Return a string representation of the image."""
    return f"Image(base64={self.base64[:10]}..., encoding={self.encoding}, size={self.size})"
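
`__repr__` checks for a missing `base64` value before slicing it, while `__str__` slices unconditionally. A small sketch of one formatter that handles both cases is shown below. The function name `preview` and its `width` parameter are hypothetical, and it works on plain values rather than on an `Image` instance.

```python
def preview(base64_str, encoding, size, width=10):
    """Build a short description, truncating the base64 payload if present."""
    if base64_str is None:
        # Nothing to slice, so leave the base64 part out entirely.
        return f"Image(encoding={encoding}, size={size})"
    return f"Image(base64={base64_str[:width]}..., encoding={encoding}, size={size})"
```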

bytes_to_data(bytes_data, encoding='jpeg', size=None) staticmethod

Decodes a bytes object and converts it into a dict of Image fields.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| bytes_data | bytes | The bytes object to convert to an image. | required |
| encoding | str | The format used for encoding the image when converting to base64. | 'jpeg' |
| size | Optional[Tuple[int, int]] | The size of the image as a (width, height) tuple. | None |

Returns:

| Type | Description |
| --- | --- |
| dict | The populated fields: array, base64, pil, size, url, and encoding. |

Source code in mbodied/types/sense/vision.py
@staticmethod
def bytes_to_data(bytes_data: bytes, encoding: str = "jpeg", size=None) -> dict:
    """Decodes a bytes object and converts it into a dict of Image fields.

    Args:
        bytes_data (bytes): The bytes object to convert to an image.
        encoding (str): The format used for encoding the image when converting to base64.
        size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

    Returns:
        dict: The populated fields: array, base64, pil, size, url, and encoding.
    """
    image = PILModule.open(io.BytesIO(bytes_data)).convert("RGB")
    return Image.pil_to_data(image, encoding, size)
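
Part of what the returned dict carries is a `data:` URL built from the base64 payload. The sketch below shows only that step, with the `jpg` to `jpeg` normalization seen in the class listing. The helper name `bytes_to_data_url` is hypothetical, and the sketch assumes the bytes are already encoded in the named format, so it skips the PIL decode and re-encode that the library performs.

```python
import base64


def bytes_to_data_url(image_bytes: bytes, encoding: str = "jpeg") -> str:
    """Wrap already-encoded image bytes in a base64 data URL (hypothetical helper)."""
    encoding = encoding.lower()
    if encoding == "jpg":
        # "jpg" is an alias; the media type is image/jpeg.
        encoding = "jpeg"
    payload = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:image/{encoding};base64,{payload}"
```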

dump(*args, as_field=None, **kwargs)

Return a dict or a field of the image.

Source code in mbodied/types/sense/vision.py
def dump(self, *args, as_field: str | None = None, **kwargs) -> dict | Any:
    """Return a dict or a field of the image."""
    if as_field is not None:
        return getattr(self, as_field)
    return {
        "array": self.array,
        "base64": self.base64,
        "size": self.size,
        "url": self.url,
        "encoding": self.encoding,
    }
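
The `as_field` switch can be illustrated without pydantic. The dataclass below is a hypothetical stand-in, not the library's `Image` class, and it holds only a few of the fields that `dump` returns.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ImageFields:
    """Hypothetical stand-in holding some of the fields that dump() returns."""

    base64: str
    size: tuple
    url: str
    encoding: str = "jpeg"

    def dump(self, as_field: Optional[str] = None) -> Any:
        # A named field is returned as-is; otherwise all fields come back as a dict.
        if as_field is not None:
            return getattr(self, as_field)
        return {
            "base64": self.base64,
            "size": self.size,
            "url": self.url,
            "encoding": self.encoding,
        }


fields = ImageFields(base64="YWJj", size=(2, 2), url="data:image/jpeg;base64,YWJj")
```

Because the lookup goes through `getattr`, an unknown field name raises `AttributeError` rather than returning `None`.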

exclude_pil()

Convert the image to a base64 encoded string.

Source code in mbodied/types/sense/vision.py
@model_serializer(mode="plain", when_used="json")
def exclude_pil(self) -> dict:
    """Convert the image to a base64 encoded string."""
    if self.base64 in self.url:
        return {"size": self.size, "url": self.url, "encoding": self.encoding}
    return {"base64": self.base64, "size": self.size, "url": self.url, "encoding": self.encoding}
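
The serializer above omits the `base64` field when the same payload is already embedded in `url`, which avoids writing the image twice for data URLs. A minimal sketch of that rule on plain values follows. The function name `serialize_image_fields` is hypothetical, and the guard against a missing `url` or `base64` is a choice of this sketch.

```python
def serialize_image_fields(base64_str, url, size, encoding) -> dict:
    """Build a JSON-friendly dict, dropping base64 when the URL already embeds it."""
    payload = {"size": size, "url": url, "encoding": encoding}
    # Assumption of this sketch: treat a missing url or base64 as "not embedded".
    embedded = bool(base64_str) and isinstance(url, str) and base64_str in url
    if not embedded:
        payload["base64"] = base64_str
    return payload
```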

from_base64(base64_str, encoding, size=None) staticmethod

Decodes a base64 string to create an Image instance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| base64_str | str | The base64 string to decode. | required |
| encoding | str | The format used for encoding the image when converting to base64. | required |
| size | Optional[Tuple[int, int]] | The size of the image as a (width, height) tuple. | None |

Returns:

| Type | Description |
| --- | --- |
| Image | An instance of the Image class with populated fields. |

Source code in mbodied/types/sense/vision.py
@staticmethod
def from_base64(base64_str: str, encoding: str, size=None) -> "Image":
    """Decodes a base64 string to create an Image instance.

    Args:
        base64_str (str): The base64 string to decode.
        encoding (str): The format used for encoding the image when converting to base64.
        size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

    Returns:
        Image: An instance of the Image class with populated fields.
    """
    image_data = base64lib.b64decode(base64_str)
    image = PILModule.open(io.BytesIO(image_data)).convert("RGB")
    return Image(image, encoding, size)
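
One detail of the call `Image(image, encoding, size)` is worth checking against the constructor signature: the second and third positional parameters of `__init__` are `url` and `path`, not `encoding` and `size`. The sketch below uses a hypothetical stand-in function, `init_like`, which copies only the parameter order shown for `__init__`, and lets `inspect.signature` report how the arguments bind. It suggests that passing `encoding=` and `size=` by keyword is the safer form.

```python
import inspect


def init_like(arg=None, url=None, path=None, base64=None, array=None, pil=None,
              encoding="jpeg", size=None, bytes_obj=None, **kwargs):
    """Stand-in with the same parameter order as Image.__init__ (no behavior)."""


signature = inspect.signature(init_like)

# Positional call, as written in the listing: values land on `url` and `path`.
positional = signature.bind("<pil image>", "png", (224, 224))

# Keyword call: values land on the intended parameters.
keyword = signature.bind("<pil image>", encoding="png", size=(224, 224))
```

Inspecting `positional.arguments` and `keyword.arguments` shows which parameter each value was assigned to.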

from_bytes(bytes_data, encoding='jpeg', size=None) classmethod

Creates an Image instance from a bytes object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| bytes_data | bytes | The bytes object to convert to an image. | required |
| encoding | str | The format used for encoding the image when converting to base64. | 'jpeg' |
| size | Optional[Tuple[int, int]] | The size of the image as a (width, height) tuple. | None |

Returns:

| Type | Description |
| --- | --- |
| Image | An instance of the Image class with populated fields. |

Source code in mbodied/types/sense/vision.py
@classmethod
def from_bytes(cls, bytes_data: bytes, encoding: str = "jpeg", size=None) -> "Image":
    """Creates an Image instance from a bytes object.

    Args:
        bytes_data (bytes): The bytes object to convert to an image.
        encoding (str): The format used for encoding the image when converting to base64.
        size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

    Returns:
        Image: An instance of the Image class with populated fields.
    """
    image = PILModule.open(io.BytesIO(bytes_data)).convert("RGB")
    return cls(image, encoding, size)

infer_features_dict()

Infer features of the image.

Source code in mbodied/types/sense/vision.py
def infer_features_dict(self) -> Features:
    """Infer features of the image."""
    return HFImage()

load_url(url, download=False) staticmethod

Downloads an image from a URL or decodes it from a base64 data URI.

Parameters:

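| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | The URL of the image to download, or a base64 data URI. | required |
| download | bool | If True, ask for confirmation on standard input before downloading. | False |

Returns:

| Type | Description |
| --- | --- |
| Optional[PILImage] | The decoded image as an RGB PIL Image object, or None if the download was declined or failed. |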

Source code in mbodied/types/sense/vision.py
@staticmethod
def load_url(url: str, download=False) -> PILImage | None:
    """Downloads an image from a URL or decodes it from a base64 data URI.

    Args:
        url (str): The URL of the image to download, or a base64 data URI.

    Returns:
        PIL.Image.Image: The downloaded and decoded image as a PIL Image object.
    """
    if url.startswith("data:image"):
        # Extract the base64 part of the data URI
        base64_str = url.split(";base64", 1)[1]
        image_data = base64lib.b64decode(base64_str)
        return PILModule.open(io.BytesIO(image_data)).convert("RGB")

    try:
        # Open the URL and read the image data
        import urllib.request

        user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7"
        headers = {
            "User-Agent": user_agent,
        }
        if download:
            accept = input("Do you want to download the image? (y/n): ")
            if "y" not in accept.lower():
                return None
        if not url.startswith("http"):
            raise ValueError("URL must start with 'http' or 'https'.")
        request = urllib.request.Request(url, None, headers)  # The assembled request
        response = urllib.request.urlopen(request)
        data = response.read()  # The raw image bytes
        return PILModule.open(io.BytesIO(data)).convert("RGB")
    except Exception as e:
        logging.warning(f"Failed to load image from URL: {url}. {e}")
        logging.warning("Not validating the Image data")
        return None
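
The data URI branch of `load_url` can be sketched on its own with the standard library. The helper name `decode_data_uri` is hypothetical. Unlike the listing, which splits on `;base64` and relies on `b64decode` discarding the leftover comma, this sketch splits on `;base64,` and also returns the media type.

```python
import base64


def decode_data_uri(uri: str) -> tuple:
    """Split a base64 data URI into (media_type, raw_bytes) (hypothetical helper)."""
    if not uri.startswith("data:"):
        raise ValueError("Not a data URI.")
    header, separator, payload = uri.partition(";base64,")
    if not separator:
        raise ValueError("Only base64-encoded data URIs are supported.")
    media_type = header[len("data:"):]
    return media_type, base64.b64decode(payload)
```

The returned bytes are still an encoded image file, so a decoder such as PIL would be needed to turn them into pixels.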

open(path, encoding='jpeg', size=None) staticmethod

Opens an image from a file path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | The path to the image file. | required |
| encoding | str | The format used for encoding the image when converting to base64. | 'jpeg' |
| size | Optional[Tuple[int, int]] | The size of the image as a (width, height) tuple. | None |

Returns:

| Type | Description |
| --- | --- |
| Image | An instance of the Image class with populated fields. |

Source code in mbodied/types/sense/vision.py
@staticmethod
def open(path: str, encoding: str = "jpeg", size=None) -> "Image":
    """Opens an image from a file path.

    Args:
        path (str): The path to the image file.
        encoding (str): The format used for encoding the image when converting to base64.
        size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

    Returns:
        Image: An instance of the Image class with populated fields.
    """
    image = PILModule.open(path).convert("RGB")
    return Image(image, encoding, size)

pil_to_data(image, encoding, size=None) staticmethod

Builds the field data for an Image instance from a PIL image.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| image | Image | The source PIL image from which to build the Image fields. | required |
| encoding | str | The format used for encoding the image when converting to base64. | required |
| size | Optional[Tuple[int, int]] | The size of the image as a (width, height) tuple. | None |

Returns:

| Type | Description |
| --- | --- |
| dict | The array, base64, pil, size, url, and encoding fields used to populate an Image. |

Source code in mbodied/types/sense/vision.py
@staticmethod
def pil_to_data(image: PILImage, encoding: str, size=None) -> dict:
    """Builds the field data for an Image instance from a PIL image.

    Args:
        image (PIL.Image.Image): The source PIL image from which to build the Image fields.
        encoding (str): The format used for encoding the image when converting to base64.
        size (Optional[Tuple[int, int]]): The size of the image as a (width, height) tuple.

    Returns:
        dict: The array, base64, pil, size, url, and encoding fields used to populate an Image.
    """
    encoding = encoding.lower()
    if encoding == "jpg":
        encoding = "jpeg"
    # Resize before encoding so the base64 payload matches the returned array and size
    if size is not None:
        image = image.resize(size)
    else:
        size = image.size
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format=encoding.upper())
    base64_encoded = base64lib.b64encode(buffer.getvalue()).decode("utf-8")
    data_url = f"data:image/{encoding};base64,{base64_encoded}"
    return {
        "array": np.array(image),
        "base64": base64_encoded,
        "pil": image,
        "size": size,
        "url": data_url,
        "encoding": encoding.lower(),
    }
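
The `url` field returned above is a base64 data URI built from the encoded image bytes, with `jpg` normalized to `jpeg`. A minimal standard-library sketch of that step follows; `to_data_url` is a hypothetical helper for illustration, not an mbodied function, and the three input bytes are only the JPEG start-of-image marker rather than a full image.

```python
import base64


def to_data_url(image_bytes: bytes, encoding: str) -> str:
    """Build a base64 data URI from already-encoded image bytes.

    Hypothetical helper for illustration; not part of mbodied.
    """
    encoding = encoding.lower()
    if encoding == "jpg":
        encoding = "jpeg"  # "jpg" is an alias; the media type is image/jpeg
    payload = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:image/{encoding};base64,{payload}"


# The JPEG start-of-image marker encodes to the familiar "/9j/" prefix.
url = to_data_url(b"\xff\xd8\xff", "JPG")
```

A URI produced this way starts with the `data:image` prefix that `load_url` checks for, so it can be decoded by the data URI branch of that method.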

save(path, encoding=None, quality=10)

Save the image to the specified path.

The quality parameter only applies to JPEG output. A quality below 10 forces JPEG encoding, and combining it with encoding="png" raises a ValueError. The path attribute of the image is updated to the new file path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | The path to save the image to. | required |
| encoding | Optional[str] | The encoding to use for saving the image. | None |
| quality | int | The quality to use for saving the image. | 10 |

Source code in mbodied/types/sense/vision.py
def save(self, path: str, encoding: str | None = None, quality: int = 10) -> None:
    """Save the image to the specified path.

    The quality parameter only applies to JPEG output. A quality below 10 forces JPEG
    encoding, and combining it with encoding="png" raises a ValueError.
    The path attribute of the image is updated to the new file path.

    Args:
        path (str): The path to save the image to.
        encoding (Optional[str]): The encoding to use for saving the image.
            Defaults to the image's current encoding.
        quality (int): The quality to use for saving the image.

    Raises:
        ValueError: If encoding is "png" and quality is below 10.
    """
    if encoding == "png" and quality < 10:
        raise ValueError("Quality can only be set for JPEG images.")

    encoding = encoding or self.encoding
    if quality < 10:
        encoding = "jpeg"

    pil_image = self.pil
    if encoding != self.encoding:
        pil_image = Image(self.array, encoding=encoding).pil

    pil_image.save(path, encoding, quality=quality)
    self.path = path  # Update the path attribute to the new file path
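
The way `save` chooses the output encoding can be isolated from the file I/O. The sketch below mirrors the checks in the source above as a standalone function; `resolve_save_encoding` is a hypothetical name introduced for illustration and does not exist in mbodied.

```python
from typing import Optional


def resolve_save_encoding(requested: Optional[str], current: str, quality: int = 10) -> str:
    """Return the encoding that save() would write with.

    Hypothetical helper mirroring the checks shown in save(); not part of mbodied.
    """
    if requested == "png" and quality < 10:
        raise ValueError("Quality can only be set for JPEG images.")
    encoding = requested or current  # fall back to the image's current encoding
    if quality < 10:
        encoding = "jpeg"  # a reduced quality forces JPEG output
    return encoding


kept = resolve_save_encoding(None, "png")       # no override, default quality
forced = resolve_save_encoding(None, "png", 5)  # low quality switches to JPEG
```

Note that only an explicit `encoding="png"` triggers the error. An image whose current encoding is PNG is converted to JPEG when a quality below 10 is passed without an explicit encoding.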

space()

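Returns the space of the image: a Box of uint8 values in [0, 255] with shape (*size, 3). Raises a ValueError if the image size is not defined.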

Source code in mbodied/types/sense/vision.py
def space(self) -> spaces.Box:
    """Returns the space of the image."""
    if self.size is None:
        raise ValueError("Image size is not defined.")
    return spaces.Box(low=0, high=255, shape=(*self.size, 3), dtype=np.uint8)
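
Because `size` is documented as a (width, height) tuple, the Box shape above is (width, height, 3), whereas a NumPy array created from a PIL image is laid out as (height, width, channels). The pure-Python sketch below compares the two orderings; both helpers are hypothetical and exist only to illustrate the difference.

```python
from typing import Tuple


def space_shape(size: Tuple[int, int]) -> Tuple[int, int, int]:
    """Shape passed to spaces.Box in space(): (*size, 3)."""
    return (*size, 3)


def array_shape(size: Tuple[int, int]) -> Tuple[int, int, int]:
    """Shape of np.array(pil_image) for an RGB image of the given (width, height)."""
    width, height = size
    return (height, width, 3)


square = (224, 224)
wide = (640, 480)
square_matches = space_shape(square) == array_shape(square)
wide_matches = space_shape(wide) == array_shape(wide)
```

Under these assumptions the two shapes agree for square images and differ otherwise, which is worth keeping in mind when checking an image array against the returned space.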