Abstract
The POICE Image Processing Platform was developed to help researchers in the POICE research project, most of whom belong to the Department of Biosciences, process images collected with the CPICS or Planktoscope imaging tools. Our task was to create a platform that seamlessly chains the separate processing steps for each of the two tools while ensuring that the metadata for the processed images (stored in CSV files) complies with Ecotaxa standards. An optional final step uploads the organized data to the Ecotaxa service. The platform also gives a specified set of users access to the entire processing workflow and the corresponding data.
Background
Marine biology has traditionally relied on net sampling followed by laboratory-based sample processing with microscopy. However, automated imaging systems are increasingly being adopted in this field. Beyond their application in remote observations via satellites, these systems facilitate the processing of net samples to determine plankton community composition and abundance. This approach significantly reduces the time spent on sample preparation, manual counting, and species identification using conventional microscopes.
Another category of instruments, deployed in situ directly in the marine environment, eliminates the need for net sampling entirely. These systems continuously image a small volume of water, enabling the detection of the temporal dynamics of specific taxa. For instance, the underwater microscope CPICS, employed at the AQUA section (Project POICE), captures five images per second of an area of approximately 1.5 cm². Any object meeting predefined quality criteria, such as sharpness and size, is captured and subsequently analyzed using a combination of manual classification and machine learning.
One limitation of automated imaging systems is that they generate substantial amounts of data that require processing, analysis, and storage. While image processing is feasible on consumer laptops and Planktoscopes, it often takes a significant amount of time and is frequently limited by storage constraints. Consequently, there is a pressing need to organize and securely store large volumes of data and to process the captured images. The tool developed by dSAG (Data Science Analytics Group at dScience) streamlines a substantial portion of this procedure and makes the processing pipeline usable by individuals with limited technical expertise. Figure 1 shows an example of data uploaded to the Ecotaxa server using the POICE platform.

At UiO, the POICE project employs high-throughput imaging methods to investigate the spatio-temporal dynamics of zooplankton and the structuring role of parasitic disease in zooplankton communities in coastal ecosystems. For more background information on this project, visit the project website.
Methodology
The project was implemented as an open-source standalone application in Python. Some of the initial processing scripts were adapted from existing resources for the Planktoscope and CPICS tools. The Planktoscope imaging system comes with an internal interface and system to process the images it takes. Its Python scripts can also be run on regular computers, which speeds up the image processing. The CPICS scripts were developed at UiO and make extensive use of the computer vision library OpenCV to extract particle characteristics from the stored CPICS images. To upload the processed data and the accompanying metadata (also generated in the pipeline) to the Ecotaxa server, we used the Ecotaxa API, which requires users to authenticate against an Ecotaxa project they have access to.
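Since the generated metadata has to satisfy Ecotaxa's requirements before upload, a small pre-upload check can catch problems early. The following is a minimal sketch using only the Python standard library, not the platform's actual validation code. The function name check_metadata and the two required columns are assumptions made for illustration; the authoritative column list is the one defined by Ecotaxa.

```python
import csv
import io

# Assumed for illustration only; the authoritative list is defined by Ecotaxa.
REQUIRED_COLUMNS = ("img_file_name", "object_id")


def check_metadata(handle, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in a metadata CSV."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        # Without the required columns, row-level checks are meaningless.
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name in required:
            # A short row yields None for missing cells, so guard against it.
            if not (row[name] or "").strip():
                problems.append(f"line {line_no}: empty {name}")
    return problems


# Usage with an in-memory file; the second data row lacks an image file name.
sample = io.StringIO("img_file_name,object_id\nimg_001.png,obj_001\n,obj_002\n")
print(check_metadata(sample))  # ['line 3: empty img_file_name']
```

Returning a list of problems rather than raising on the first one lets a user fix all issues in a single pass before attempting an upload.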
To integrate these separate processes and scripts (written in Python), we first built a command-line interface (CLI) using the argparse library. However, using the CLI requires familiarity with shell scripting. To improve accessibility, a simple graphical user interface (GUI) was developed using the Streamlit framework. Figure 2 shows a snapshot of the POICE platform. For more seamless integration, initial scripts that were written in R had to be ported to Python. As shown in Figure 3, the platform allows for direct connection and upload of data to Ecotaxa using the Ecotaxa API in the backend.
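To illustrate how a single argparse-based CLI can expose both tools, here is a minimal sketch with one subcommand per imaging tool. It is not the platform's actual interface: the program name poice, the subcommand names, and the --input, --output, and --upload options are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per imaging tool (hypothetical names)."""
    parser = argparse.ArgumentParser(
        prog="poice",  # hypothetical program name
        description="Process CPICS or Planktoscope images (illustrative sketch).",
    )
    subparsers = parser.add_subparsers(dest="tool", required=True)

    for tool in ("cpics", "planktoscope"):
        sub = subparsers.add_parser(tool, help=f"run the {tool} pipeline")
        sub.add_argument("--input", required=True, help="folder with raw images")
        sub.add_argument("--output", required=True, help="folder for processed data")
        sub.add_argument(
            "--upload",
            action="store_true",
            help="upload the results to Ecotaxa as an optional final step",
        )
    return parser


def main(argv=None) -> int:
    """Parse the arguments and report the selected configuration."""
    args = build_parser().parse_args(argv)
    # A real pipeline would call the processing steps here; this sketch only reports.
    print(f"tool={args.tool} input={args.input} output={args.output} upload={args.upload}")
    return 0
```

Calling main(["cpics", "--input", "raw", "--output", "processed", "--upload"]) exercises the CPICS branch. In a script, the usual entry point would pass the process arguments by calling main() without an argument list.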

A dedicated project on the Educloud Research platform was established to make both raw and processed data, as well as the service, available to selected users. The platform is containerized and runs on a separate persistent virtual machine accessible from inside Educloud. The interface is available via a designated URL, with access to two designated folders (one for CPICS and one for Planktoscope) and their subfolders (for input and output). In principle, the whole service (GUI and CLI) was developed so that it can run on any machine.
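The folder arrangement described above, one folder per tool with input and output subfolders, could be set up on another machine along these lines. This is a sketch under stated assumptions: the folder names and the create_layout helper are hypothetical and do not reflect the actual paths used on Educloud.

```python
import tempfile
from pathlib import Path

TOOLS = ("cpics", "planktoscope")  # folder names are assumptions
STAGES = ("input", "output")


def create_layout(root):
    """Create <root>/<tool>/<stage> folders and return them as a sorted list."""
    created = []
    for tool in TOOLS:
        for stage in STAGES:
            folder = Path(root) / tool / stage
            # exist_ok makes the call safe to repeat on an existing layout.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return sorted(created)


# Usage in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_layout(tmp):
        print(folder.relative_to(tmp))
```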
