Multiplicity in the Partitioning of Signed Graphs

Nejat Arinik

Résumé

According to the structural balance theory, a signed graph is considered structurally balanced when it can be partitioned into a number of modules such that positive edges are located inside the modules and negatives ones are in-between them. In practice, real-world networks are rarely perfectly balanced. When it is not the case, one wants to measure the magnitude of the imbalance and to identify the set of edges related to the network imbalance. The Correlation Clustering (CC) problem is precisely defined as finding the partition with minimal imbalance. Signed graph partitioning is an important task, which has many applications, as finding a balanced partition helps understanding the system modeled by the graph. However, the standard approach used in the literature is to find a single partition and focus the rest of the analysis on it, as if it was sufficient to fully characterize the studied system. Yet, it may not reflect the meso-structure of the network, and one may need to seek for other partitions to build a better picture. Although this need to look for multiplicity is extremely important from the end user's perspective, only a very few works took it into consideration in their analysis, up to now. In this thesis, we want to relax this traditional single-partition assumption to allow searching for multiple partitions in two separate situations. The first one arises in the context of signed multiplex networks. All traditional approaches proposed to partition multiplex networks in general are based on the single-partition assumption. To overcome this limitation, we propose a new partitioning method that integrates a meta-clustering process before merging the partitions of individual layers, which allows identifying structurally similar layers. The second situation is specific to the CC problem. When solving an instance of such problem, several or even many optimal partitions may coexist. If multiple optimal partitions coexist, one can then wonder how different/diverse they are. Put differently, we want to know what we loose when considering only one partition, while there might be multiple ones. In order to answer these questions, one should ideally enumerate completely the space of optimal partitions, and perform its analysis. To this end, we propose a new efficient solution space enumeration method and a cluster analysis-based framework in order to first enumerate the space of optimal partitions and then empirically study such space. Lastly, each of these previous situations requires to compute the similarity between partitions. In the context of graph partitioning, this task can be done through a so-called external evaluation measure. However, there exist many such measures, each having different characteristics. This makes it challenging to select the most appropriate for a given situation for the end user. To this end, we propose a new empirical evaluation framework in order to produce results that end users can easily interpret. For a collection of candidate measures, it first consists in describing their behavior by computing them for a generated dataset of parametric partitions, obtained by applying a set of predefined parametric partition transformations. Second, our framework characterizes the measures in terms of how they are affected by these parameters and transformations.

Selon la théorie de l'équilibre structural, un graphe signé est structurellement équilibré s’il peut être partitionné en sous-groupes mutuellement hostiles (i.e. reliés seulement par des liens négatifs) tout en exhibant une solidarité interne (i.e. contenant uniquement des liens positifs). Mais un réseau réel (i.e. un graphe représentant un système du monde réel) est rarement parfaitement équilibré : on trouvera quelques liens positifs entre les groupes et/ou quelques liens négatifs à l’intérieur de certains groupes. L’un des défis du domaine est de quantifier le niveau de déséquilibre d’un tel réseau et d'identifier les liens qui causent ce déséquilibre. Le problème Correlation Clustering (CC) se définit précisément par l'obtention d'une partition possédant un déséquilibre minimal. Le partitionnement de graphes signés constitue une tâche importante du point de vue applicatif, étant donné que trouver une partition équilibrée aide à comprendre le système modélisé par le graphe signé. Cependant, l'approche standard dans la littérature se contente de chercher une seule partition, comme si elle caractérisait suffisamment le système étudié. Or, on peut avoir besoin de plusieurs partitions pour construire une image plus juste du système étudié. Même si cette notion de la multiplicité est extrêmement important du point de vue des utilisateurs finaux, elle a été très peu abordée dans la littérature. Dans cette thèse, on veut relaxer l'hypothèse de partition unique pour en chercher plusieurs, et ce dans deux situations séparées. La première concerne les graphes multiplexes signés. Toutes les approches traditionnelles proposées pour les graphes multiplexes produisent une seule partition à la fin. Pour pallier ce problème, nous proposons une nouvelle approche qui intègre un processus de clustering avant de fusionner les couches individuelles, ce qui permet de regrouper les couches structurellement similaires. La deuxième situation est spécifique au problème CC. Quand on résout une instance de ce problème, plusieurs partitions optimales peuvent coexister. La question qui se pose est de savoir ce qu'on perd, si on considère une seule partition optimale, alors qu'il en existe plusieurs. Idéalement, il faut les énumérer toutes avant de faire une analyse concluante. Pour ce faire, on propose une nouvelle méthode d'énumération et un framework basé sur l'analyse de clustering afin de d'abord complètement énumérer l'espace des partitions optimales, puis étudier empiriquement un tel espace. Dans les deux situations décrites ci-dessus, mesurer la similarité entre deux partitions est un point essentiel. Pourtant, il n'existe pas une définition de la meilleure mesure pour cela, étant donné que toute définition de la similarité est subjective et dépend notamment de l'application considérée. En conséquence, de nombreuses mesures ont été décrites dans la littérature. Cependant, l'abondance de mesures de similarité est une limitation plutôt qu'un avantage, car la sélection de la mesure la plus appropriée pour une application donnée devient un défi pour les utilisateurs finaux. Pour pallier ce problème, nous proposons un nouveau framework. Celui-ci prend en compte plusieurs paramètres liés au partitionnement, et évalue statistiquement les mesures dans plusieurs scénarios de partitionnement à travers ces paramètres, via l'analyse de régression.

Multiplicity in the Partitioning of Signed Graphs

Multiplicité dans le partitionnement de graphes signés

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager