Name
cx:metadata-extractor — Extracts metadata from images and other files.
Synopsis
Input port | Primary | Sequence | Content types |
---|---|---|---|
source | ✔ | any |
Output port | Primary | Sequence | Content types |
---|---|---|---|
result | ✔ | ✔ | xml |
Option name | Type | Default value |
---|---|---|
assert-metadata | xs:boolean | false() |
properties | map(xs:QName,item()*)? | () |
Declaration
1 |<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
| <p:input port="source" content-types="any"/>
| <p:output port="result" content-types="xml" sequence="true"/>
| <p:option name="assert-metadata" as="xs:boolean" select="false()"/>
5 | <p:option name="properties" as="map(xs:QName,item()*)?"/>
|</p:declare-step>
Description
The cx:metadata-extractor
step returns an XML description of
the metadata associated with an image. For example, the metadata associated with
the image in
Figure 1, “Amaryllis” is shown in
Example 1, “Amaryllis metadata”.
1 |<c:metadata xmlns:c="http://www.w3.org/ns/xproc-step"
| href="amaryllis.jpg">
| <c:tag dir="Jpeg" type="0x0000" name="Data Precision">8 bits</c:tag>
| <c:tag dir="Jpeg" type="0x0001" name="Image Height">336 pixels</c:tag>
5 | <c:tag dir="Jpeg" type="0x0003" name="Image Width">500 pixels</c:tag>
| <c:tag dir="Jpeg" type="0x0005" name="Number of Components">3</c:tag>
| <c:tag dir="Jpeg" type="0x0006" name="Component 1">Y component: Quantization table 0, Sampling factors 1 horiz/1 vert</c:tag>
| <c:tag dir="Jpeg" type="0x0007" name="Component 2">Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
| <c:tag dir="Jpeg" type="0x0008" name="Component 3">Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
10 |</c:metadata>
In practice, the amount of metadata associated with an image is almost arbitrarily large. Example 2, “Digital Camera metadata”, for example, shows the metadata associated with an image from a digital camera.
1 |<c:metadata xmlns:c="http://www.w3.org/ns/xproc-step"
| href="file:/Volumes/Data/Pictures/2012/01/14/20120114-085508.jpg">
| <c:tag dir="Exif" type="0x010f" name="Make">Panasonic</c:tag>
| <c:tag dir="Exif" type="0x0110" name="Model">DMC-FS25</c:tag>
5 | <c:tag dir="Exif" type="0x0112" name="Orientation">Right side, top (Rotate 90 CW)</c:tag>
| <c:tag dir="Exif" type="0x011a" name="X Resolution">180 dots per inch</c:tag>
| <c:tag dir="Exif" type="0x011b" name="Y Resolution">180 dots per inch</c:tag>
| <c:tag dir="Exif" type="0x0128" name="Resolution Unit">Inch</c:tag>
| <c:tag dir="Exif" type="0x0131" name="Software">Ver.1.1 </c:tag>
10 | <c:tag dir="Exif" type="0x0132" name="Date/Time">2012-01-14T08:55:08</c:tag>
| <c:tag dir="Exif" type="0x0213" name="YCbCr Positioning">Datum point</c:tag>
| <c:tag dir="Exif" type="0x829a" name="Exposure Time">1/40 sec</c:tag>
| <c:tag dir="Exif" type="0x829d" name="F-Number">F5.6</c:tag>
| <c:tag dir="Exif" type="0x8822" name="Exposure Program">Program normal</c:tag>
15 | <c:tag dir="Exif" type="0x8827" name="ISO Speed Ratings">160</c:tag>
| <c:tag dir="Exif" type="0x9000" name="Exif Version">2.21</c:tag>
| <c:tag dir="Exif" type="0x9003" name="Date/Time Original">2012-01-14T08:55:08</c:tag>
| <c:tag dir="Exif" type="0x9004" name="Date/Time Digitized">2012-01-14T08:55:08</c:tag>
| <c:tag dir="Exif" type="0x9101" name="Components Configuration">YCbCr</c:tag>
20 | <c:tag dir="Exif" type="0x9102" name="Compressed Bits Per Pixel">4 bits/pixel</c:tag>
| <c:tag dir="Exif" type="0x9204" name="Exposure Bias Value">0 EV</c:tag>
| <c:tag dir="Exif" type="0x9205" name="Max Aperture Value">F3.3</c:tag>
| <c:tag dir="Exif" type="0x9207" name="Metering Mode">Multi-segment</c:tag>
| <c:tag dir="Exif" type="0x9208" name="Light Source">Unknown (4)</c:tag>
25 | <c:tag dir="Exif" type="0x9209" name="Flash">Flash fired, auto</c:tag>
| <c:tag dir="Exif" type="0x920a" name="Focal Length">14.2 mm</c:tag>
| <c:tag dir="Exif" type="0xa000" name="FlashPix Version">1.00</c:tag>
| <c:tag dir="Exif" type="0xa001" name="Color Space">sRGB</c:tag>
| <c:tag dir="Exif" type="0xa002" name="Exif Image Width">4000 pixels</c:tag>
30 | <c:tag dir="Exif" type="0xa003" name="Exif Image Height">3000 pixels</c:tag>
| <c:tag dir="Exif" type="0xa217" name="Sensing Method">One-chip color area sensor</c:tag>
| <c:tag dir="Exif" type="0xa300" name="File Source">Digital Still Camera (DSC)</c:tag>
| <c:tag dir="Exif" type="0xa301" name="Scene Type">Directly photographed image</c:tag>
| <c:tag dir="Exif" type="0xa401" name="Custom Rendered">Normal process</c:tag>
35 | <c:tag dir="Exif" type="0xa402" name="Exposure Mode">Auto exposure</c:tag>
| <c:tag dir="Exif" type="0xa403" name="White Balance">Auto white balance</c:tag>
| <c:tag dir="Exif" type="0xa404" name="Digital Zoom Ratio">Digital zoom not used.</c:tag>
| <c:tag dir="Exif" type="0xa405" name="Focal Length 35">80mm</c:tag>
| <c:tag dir="Exif" type="0xa406" name="Scene Capture Type">Standard</c:tag>
40 | <c:tag dir="Exif" type="0xa407" name="Gain Control">Low gain up</c:tag>
| <c:tag dir="Exif" type="0xa408" name="Contrast">None</c:tag>
| <c:tag dir="Exif" type="0xa409" name="Saturation">None</c:tag>
| <c:tag dir="Exif" type="0xa40a" name="Sharpness">None</c:tag>
| <c:tag dir="Exif" type="0xc4a5" name="Unknown tag (0xc4a5)">...</c:tag>
45 | <c:tag dir="Exif" type="0x0103" name="Compression">JPEG (old-style)</c:tag>
| <c:tag dir="Exif" type="0x0201" name="Thumbnail Offset">10740 bytes</c:tag>
| <c:tag dir="Exif" type="0x0202" name="Thumbnail Length">4153 bytes</c:tag>
| <c:tag dir="Exif" type="0xf001" name="Thumbnail Data">[4153 bytes of thumbnail data]</c:tag>
| <c:tag dir="Panasonic Makernote" type="0x0001" name="Quality Mode">2</c:tag>
50 | <c:tag dir="Panasonic Makernote" type="0x0002" name="Version">0 1 1 0</c:tag>
| <c:tag dir="Panasonic Makernote" type="0x0003" name="Unknown tag (0x0003)">1</c:tag>
| ...
| <c:tag dir="Panasonic Makernote" type="0x001c" name="Macro Mode">Off</c:tag>
| <c:tag dir="Panasonic Makernote" type="0x001f" name="Record Mode">Unknown (37)</c:tag>
55 | <c:tag dir="Interoperability" type="0x0001" name="Interoperability Index">Recommended Exif Interoperability Rules (ExifR98)</c:tag>
| <c:tag dir="Interoperability" type="0x0002" name="Interoperability Version">1.00</c:tag>
| <c:tag dir="Jpeg" type="0x0000" name="Data Precision">8 bits</c:tag>
| <c:tag dir="Jpeg" type="0x0001" name="Image Height">3000 pixels</c:tag>
| <c:tag dir="Jpeg" type="0x0003" name="Image Width">4000 pixels</c:tag>
60 | <c:tag dir="Jpeg" type="0x0005" name="Number of Components">3</c:tag>
| <c:tag dir="Jpeg" type="0x0006" name="Component 1">Y component: Quantization table 0, Sampling factors 1 horiz/2 vert</c:tag>
| <c:tag dir="Jpeg" type="0x0007" name="Component 2">Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
| <c:tag dir="Jpeg" type="0x0008" name="Component 3">Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
|</c:metadata>
The cx:metadata-extractor
step relies on
Drew Noakes’
Metadata Extractor
libraries. If the image identified is not a JPEG image, then the underlying Java
Image
properties are returned, if possible. For example, a PNG image
might return results such as those shown in Example 3, “PNG metadata”.
1 |<c:metadata xmlns:c="http://www.w3.org/ns/xproc-step"
| href="cover.png">
| <c:tag dir="Exif" type="0x9000" name="Exif Version">0</c:tag>
| <c:tag dir="Jpeg" type="0x0001" name="Image Height">480 pixels</c:tag>
5 | <c:tag dir="Jpeg" type="0x0003" name="Image Width">364 pixels</c:tag>
|</c:metadata>
The “Exif Version” of “0” is returned to identify this case.
Additional dependencies
In addition to the standard dependencies, the
cx:metadata-extractor
step relies on these libraries:
- metadata-extractor, version 2.19.0
To read metadata from images and other files.
- jaxb-api, version 2.3.1
Used when attempting to parse PDF bounding boxes.
- pdfbox, version 2.0.32
Used when attempting to parse PDF files.
- xmpbox, version 2.0.32
Used when attempting to parse XMP files.