SPAA 2023 Paper #92 Reviews and Comments
===========================================================================
Paper #92: Parallel Order-Based Core Maintenance in Dynamic Graphs


Review #92A
===========================================================================

Overall merit
-------------
2. Weak reject

Reviewer expertise
------------------
4. Expert

Paper summary
-------------
The paper presents a parallel CPU implementation of the simplified
ORDER-based core maintenance algorithm for dynamic graphs discussed in [12].
The main contributions of the paper are:

- Locking the vertices in the searching set instead of locking accessed edges
- Smarter locking mechanisms to avoid deadlocks
- Better speedups compared to state-of-the-art traversal-based algorithms

Comments for authors
--------------------
Summary: The paper addresses an interesting and important problem: updating
core numbers in parallel as a dynamic graph changes. In general, however, the
contributions made over [11, 12] are incremental and shallow. The scalability
of the proposed solution is not promising, as the solution starts to slow
down beyond 16 workers. Most importantly, the significant speedup presented
in this paper comes from using the sequential simplified ORDER-based solution
of [12] as the baseline, not from the improvement contributed by the proposed
parallel solution.

Strengths:
1. The paper gives a good summary of core maintenance algorithms for dynamic
   graphs.
2. The paper shows significant speedups compared to state-of-the-art
   traversal-based algorithms.

Weaknesses:
1. Most of the material is already discussed in greater detail in [11, 12].
2. Missing algorithms, such as Forward and Backward in Algorithm 1, should be
   referred back to [12].
3. The only differences between the sequential and parallel insert/remove
   procedures are the vertex locks and atomic operations, which seems a thin
   contribution.
4. The design is not fully parallel: each batch is either all insertions or
   all removals. This is an assumption made by the authors.
5. The graphs used in the experiments are fairly small (only one graph has
   more than 100M edges).
6. The scalability of the solution is questionable, since it slows down after
   16 workers.
7. For the largest graph (wiki-links, 130M edges), the traversal algorithm
   performs almost the same with 1 worker as with multiple workers.

Typos:
1. [Abstract] "When a burst of a large number of inserted or removed edges
   come" --> "comes"
2. [Sec 3.3] "a condition-lock as in Algorithm 5" --> Algorithm 3
3. [Figure 4] "The x-axis is running time (millisecond) and the y-axis is
   the number of workers." --> x and y are swapped


Review #92B
===========================================================================

Overall merit
-------------
2. Weak reject

Reviewer expertise
------------------
3. Knowledgeable

Paper summary
-------------
This paper studies the problem of maintaining the coreness value of every
vertex in a dynamic graph. It gives order-based algorithms for k-core where
each worker handles a single edge insertion or deletion and uses locks to
perform its update work. This approach allows it to potentially process
batches of edge updates, but it only works well in practice when the edge
updates are non-conflicting (i.e., they increase the core numbers of
disjoint sets of vertices).

The order-based algorithm is simple and elegant. Consider edge insertion.
Adding a single edge can increase the core number of any vertex in the graph
by at most one. The idea of the order-based algorithm is to store vertices
in the order they are peeled by a sequential peeling algorithm. For each
vertex v, direct the edges incident to v according to this ordering to get a
DAG. The main idea is to use the directed out-degree of v to determine
whether v can increase its core number after receiving an insertion. If,
after inserting an edge, the directed out-degree of v is still
<= coreness(v), then coreness(v) will not change. On the other hand, if it
exceeds coreness(v), v will jump up to the next highest core and potentially
pull some other vertices with the same (previous) coreness up with it. The
order-based algorithm also uses some pruning to prevent vertices that cannot
possibly increase their core number from being processed. The algorithm for
removing edges works similarly to the forward algorithm.
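For concreteness, here is a minimal sketch of the insertion test described
above, under simplifying assumptions: `order`, `core`, and `out_nbrs` are
illustrative names (not the paper's identifiers), and the Forward/Backward
search on the slow path is deliberately omitted.

```python
# Minimal sketch of the order-based insertion test described above.
# `order` maps each vertex to its position in the k-order, `core` maps
# each vertex to its current core number, and `out_nbrs[v]` holds v's
# out-neighbors in the DAG induced by the k-order. Illustrative only.

def insert_edge(u, v, order, core, out_nbrs):
    """Insert edge (u, v); return True if core numbers may change."""
    # Direct the new edge from the endpoint peeled earlier to the one
    # peeled later, so the directed graph stays acyclic.
    src, dst = (u, v) if order[u] < order[v] else (v, u)
    out_nbrs[src].add(dst)

    # Fast path: if the out-degree of src still does not exceed its
    # core number, no core number in the graph can change.
    if len(out_nbrs[src]) <= core[src]:
        return False

    # Slow path: src may move up to the next core. A Forward/Backward
    # search (omitted here) finds the affected set V* and repairs the
    # k-order among vertices with src's previous core number.
    return True
```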
This paper parallelizes the algorithm above. The main idea, as I understand
it, is to replace a sequential order-maintenance data structure with a
recently developed concurrent order-maintenance structure, allowing multiple
threads to individually process separate edge insertions (deletions); see
the sketch after the comments below.

I had a hard time understanding the parallel algorithm in this paper, which
is a major reason for my recommending rejection. One confusing aspect is
that the algorithms all use locks, and yet the analysis is done in the
work-depth model under best-case assumptions. In particular, the analysis
assumes workers are not blocked by other workers in order to derive depth
bounds. The analysis should be done using parallel primitives that do not
require locking, or that have provably low contention, e.g., filtering,
sorting, semisorting, etc. In the worst case, the algorithm could offer no
parallelism, which may not be surprising given that the problem is
P-complete statically.

Another concern is about the experiments. Overall, for an experimental paper
the experimental section is too short and could have investigated the
proposed algorithm more carefully. Some suggestions are to study the effect
of batching, to study the effect of hammer insertions/deletions (i.e.,
updates incident to a single vertex or a small set of vertices), and to
evaluate the scalability with respect to fast static baselines (see, e.g.,
the Julienne paper [1] and the ParK paper [2]).

[1] https://dl.acm.org/doi/10.1145/3087556.3087580
[2] https://ieeexplore.ieee.org/abstract/document/7965211

Comments for authors
--------------------
- V+ is not defined in the introduction. The definition given on page 2 is
  also not precise: if V+ is a minimal set of vertices needed to find V*,
  won't we have V+ = V*?
- Comparison with a static parallel k-core algorithm?
- DAG (page 3) should be "directed acyclic graph".
- The caption for Table 2 should provide more details.
- Are insertions and deletions chosen with equal probability (page 8)?
- When using multiple threads, do different workers process different edges,
  or are they parallelizing the update work for a single edge update?
- Definition 3.7 would be easier to understand without a double negation.
- Page 3: it should say "we increment w'.d^*_{in} by 1".
- In Figure 1, should "cycle" be "circle"?
- I didn't find Figure 2 to be very helpful, as it just shows the core
  numbers before and after the update, not how the update algorithm actually
  works.
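To make the per-edge worker scheme concrete, below is a simplified threading
sketch. It is an assumption-laden illustration, not the paper's protocol:
the paper locks the dynamically growing searching set of vertices, whereas
this sketch only locks the two endpoints of each edge, in a fixed global
order, to avoid deadlock. All names (`process_updates`, `apply_update`,
`vertex_locks`) are hypothetical.

```python
# Simplified illustration (not the paper's code) of workers that each
# claim one edge update at a time and guard the touched vertices with
# per-vertex locks, so non-conflicting updates can proceed in parallel.
import threading

def process_updates(edges, apply_update, vertex_locks, num_workers=8):
    """Run apply_update(u, v) for every edge, spread over workers."""
    it = iter(edges)
    claim = threading.Lock()

    def worker():
        while True:
            with claim:                 # atomically claim the next edge
                edge = next(it, None)
            if edge is None:
                return                  # no work left
            u, v = edge
            # Acquire the two endpoint locks in a fixed global order so
            # that two workers can never deadlock on each other.
            a, b = (u, v) if u < v else (v, u)
            with vertex_locks[a], vertex_locks[b]:
                apply_update(u, v)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The real algorithm must additionally lock every vertex its Forward/Backward
search visits, which is where the paper's deadlock-avoidance machinery comes
in.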
Review #92C
===========================================================================

Overall merit
-------------
2. Weak reject

Reviewer expertise
------------------
4. Expert

Paper summary
-------------
This paper studies the exact k-core decomposition problem under batches of
edge updates. The authors propose a new parallel algorithm, based on the
sequential ORDER algorithm, that shows runtime improvements over previous
state-of-the-art exact parallel k-core decomposition algorithms. They
demonstrate their improvements via experiments on real-world graphs.

Overall, the paper proposes an interesting parallel batch-dynamic algorithm
for exact k-core decomposition that improves upon previous methods
experimentally. My main concerns about the paper are as follows: 1) the
paper should include proofs of correctness for the ORDER algorithm (Alg. 1
and 2), since it is central to the parallel algorithm; 2) some definitions
are imprecise, and the paper is sometimes difficult to parse due to
inaccuracies and typos; 3) the experimental results are sometimes mixed
(i.e., not always better than the previous state of the art). Such mixed
results should be explained in more detail.

Comments for authors
--------------------
1. pg. 1: "sequentially algorithms" -> "sequential algorithms"
2. pg. 1: "traversing a possibly larger cope of vertices" -> what is a
   "cope" of vertices? Should it be "core" instead?
3. pg. 1: "after for dynamic graphs" -> "for dynamic graphs"
4. pg. 1: "where the work is its sequential running time and the depth is
   its running time on an infinite number of processors": I wouldn't
   necessarily describe "work" and "depth" this way. Instead, I would define
   work as the total amount of computation performed by the algorithm, and
   depth as the longest chain of sequential dependencies.
5. pg. 1: "Many multi-core parallel batch algorithms for core maintenance
   have been proposed in [13, 14, 29]", and under "Core Maintenance" in
   Related Work: there is another highly relevant work that gives a
   batch-dynamic algorithm for approximate k-core decomposition and should
   be cited: Quanquan C. Liu, Jessica Shi, Shangdi Yu, Laxman Dhulipala, and
   Julian Shun. 2022. Parallel Batch-Dynamic Algorithms for k-Core
   Decomposition and Related Graph Problems. In Proceedings of the 34th ACM
   Symposium on Parallelism in Algorithms and Architectures (SPAA '22).
6. pg. 2: "in which a hyteredge may contain one or more participating
   vertices" -> "hyperedge"
7. pg. 2: Definition of k-core. The k-core is an *induced* subgraph, not
   just a "subgraph". This is a very important part of the definition.
8. pg. 2: "propose a liner time" -> "linear"
9. pg. 2: Note that the linear-time algorithm computes not only a k-core but
   the entire k-core decomposition (a minimal sketch of such a peeling
   procedure appears after this list).
10. pg. 3: "the V^* only located in the k-subcore, where k is the lower core
    number of two vertices that the inserted or removed edge connect": it is
    not clear what this sentence means.
11. Definition 3.5 is not well defined, because the same core decomposition
    can be produced by different peeling orders. In other words, one
    sequence can cause u <= v while another causes v <= u. One needs to
    specify how tie-breaking is done deterministically.
12. It is unclear what this sentence means: "A k-order ⪯ is an instance of
    all the possible vertex sequences produced by the BZ algorithm." As far
    as I know, ⪯ is only well defined for one particular vertex sequence.
13. Definition 3.7 is difficult to parse. Namely, V^+ is not defined, and
    "v.d^+_{out} is the total number of v's successors without the ones that
    are confirmed not in V^*" is not equivalent to the expression at the end
    of Definition 3.7. There could be successors that are not in V^+ *and*
    are not in V^*; these successors are not shown in the expression at the
    end of Definition 3.7.
14. Algorithm 1: K is never updated. The description says K is updated in
    Line 2, but it is not updated in Line 2 of the pseudocode for
    Algorithm 1.
15. It is not clear to me why Algorithm 1 is correct; a proof of correctness
    included in the appendix would be helpful. In particular, it is not
    clear why the case w.d^*_{in} + w.d^*_{out} > K results in the core
    number of w being incremented by 1. It would also be helpful to include
    pseudocode for the Forward and Backward procedures.
16. Definition 3.8 is an overly complicated way of stating the induced
    degree of v in the core given by v's core number.
17. I do not understand this sentence: "When removing an edge (u, v) such
    that v.core >= u.core, we have v.mcd off by 1." If this is intended to
    say that v.mcd decreases by 1 whenever such an edge (u, v) is removed,
    then it is not necessarily true: if u.core < v.core, then v.mcd is not
    affected.
18. pg. 4: Under the rebalance operation, why is the difference compared to
    j^2 instead of j?
19. The parallel algorithms rely on many uses of locking, which impacts
    scalability. It looks from Fig. 4 like scalability is indeed impacted
    (i.e., at 64 cores, the program sometimes runs *slower* than at fewer
    cores).
20. It looks like for a non-trivial number of graphs your algorithms are not
    faster than JER and JEI (according to Table 2). This should be noted in
    the summary in Section 5.1, and an explanation for why this is so should
    be included. (The current explanation, "The reason is that the special
    properties of graphs may affect the performance of different
    algorithms", is a bit vague.)
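As context for comments 7-12 above, here is a minimal sketch of bucket-based
peeling in the style of the BZ algorithm: repeatedly remove a minimum-degree
vertex; the running maximum of removal degrees gives the core numbers, and
the removal sequence is one k-order. This is an illustrative reconstruction,
not the paper's pseudocode, and it omits the array tricks that make BZ run
in linear time.

```python
# Sketch of BZ-style peeling: computes every core number (the full
# decomposition) plus one k-order, not just a single k-core.
from collections import defaultdict

def core_decomposition(adj):
    """adj: dict mapping each vertex to the set of its neighbors."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = defaultdict(set)            # degree -> vertices of that degree
    for v, d in deg.items():
        buckets[d].add(v)
    core, order, k = {}, [], 0
    removed = set()
    for _ in range(len(adj)):
        d = min(b for b in buckets if buckets[b])   # current minimum degree
        v = buckets[d].pop()
        k = max(k, d)                     # coreness is the running max of
        core[v] = k                       # degrees seen at removal time
        order.append(v)                   # removal sequence: one k-order
        removed.add(v)
        for w in adj[v]:                  # decrement live neighbors' degrees
            if w not in removed:
                buckets[deg[w]].discard(w)
                deg[w] -= 1
                buckets[deg[w]].add(w)
    return core, order
```

The k-core is then the subgraph *induced* by {v : core[v] >= k}, which is
exactly the point of comment 7.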
Review B, " to evaluate the scalability with respect to fast static baselines (see e.g., the Julienne paper [1] and the ParK paper [2]).": All these two approaches focus on parallel core decomposition, not core maintenance. These two are different problems. In my work, I suppose that all the core numbers are initialized by core decomposition algorithms before core maintenance, which should not be evaluated in this work. Review B, "V+ = V*?": For edge inserton, we have |V+| > |V*|. But for edge removal, we have V+=V*. Review B, "When using multiple threads, do different workers process different edges, or are they parallelizing the update work for a single edge update?": First, each worker fetches a batch of edges from all target edges. Second, each worker handles its own edges. That means, different workers process different edges, as shown in Algorithm 5. Review C, "the paper should include proofs of correctness for the ORDER algorithm (Alg. 1 and 2) since it is central to their parallel algorithm": The correctness of the sequential ORDER algorithm is already well proved in [12]. So we omit it in our paper and mainly focus on the correctness of the parallel part. But we can add the complete ORDER algorithm with the correctness proof part in the appendices. Review C, "Definition 3.7 is difficult to parse.": We omit the condition of vertices in V*. Review C, "pg. 4: Under the rebalance operation why is the difference compared to j^2 instead of j?": j^2 is to ensure that the OM data structure has amortized O(1) time for insertion, while j will destroy such O(1) time. Review C, " It looks like from Fig. 4 that indeed scalability is impacted (i.e. at 64 cores, sometimes the program runs slower than at fewer cores).": Yes, the contention will increase when using more workers.