- Sketching Meets Differential Privacy: Fast Algorithm for Dynamic Kronecker Projection Maintenance Projection maintenance is one of the core data structure tasks. Efficient data structures for projection maintenance have led to recent breakthroughs in many convex programming algorithms. In this work, we further extend this framework to the Kronecker product structure. Given a constraint matrix $\mathsf{A}$ and a positive semi-definite matrix $W \in \mathbb{R}^{n \times n}$ with a sparse eigenbasis, we consider the task of maintaining the projection in the form of $\mathsf{B}^\top(\mathsf{B}\mathsf{B}^\top)^{-1}\mathsf{B}$, where $\mathsf{B}=\mathsf{A}(W \otimes I)$ or $\mathsf{B}=\mathsf{A}(W^{1/2} \otimes W^{1/2})$. At each iteration, the weight matrix $W$ receives a low rank change and we receive a new vector $h$. The goal is to maintain the projection matrix and answer the query $\mathsf{B}^\top(\mathsf{B}\mathsf{B}^\top)^{-1}\mathsf{B}h$ with good approximation guarantees. We design a fast dynamic data structure for this task and it is robust against an adaptive adversary. Following the beautiful and pioneering work of [Beimel, Kaplan, Mansour, Nissim, Saranurak and Stemmer, STOC'22], we use tools from differential privacy to reduce the randomness required by the data structure and further improve the running time. 4 authors · Oct 20, 2022
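As a rough illustration of the query described in the entry above, the following sketch forms B = A(W ⊗ I) explicitly and evaluates Bᵀ(BBᵀ)⁻¹Bh from scratch using exact rational arithmetic. This is only the naive recompute-everything baseline that a dynamic data structure would try to avoid, not the paper's algorithm. The helper names (`kron`, `solve`, `project`) and the small example matrices are assumptions made for this example, and B is assumed to have full row rank.

```python
from fractions import Fraction

def matmul(X, Y):
    # Plain dense matrix product of two lists-of-rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def kron(X, Y):
    # Kronecker product: block (i, j) of the result is X[i][j] * Y.
    return [[x * y for x in xrow for y in yrow] for xrow in X for yrow in Y]

def solve(M, b):
    # Gauss-Jordan elimination over exact fractions; assumes M is invertible.
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def project(B, h):
    # Returns B^T (B B^T)^{-1} B h by solving (B B^T) y = B h, then forming B^T y.
    Bt = transpose(B)
    Bh = [sum(b * x for b, x in zip(row, h)) for row in B]
    y = solve(matmul(B, Bt), Bh)
    return [sum(b * yi for b, yi in zip(row, y)) for row in Bt]

# Hypothetical small instance: W is 2x2 positive definite, A is 2x4.
W = [[2, 1], [1, 2]]
I = [[1, 0], [0, 1]]
A = [[1, 0, 2, 1], [0, 1, 1, 3]]
B = matmul(A, kron(W, I))
h = [1, 2, 3, 4]

Ph = project(B, h)
residual = [hi - pi for hi, pi in zip(h, Ph)]
```

Two properties make for a quick sanity check: projecting `Ph` again leaves it unchanged, and `residual` is orthogonal to every row of `B`.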
1 RUL forecasting for wind turbine predictive maintenance based on deep learning Predictive maintenance (PdM) is increasingly pursued to reduce wind farm operation and maintenance costs by accurately predicting the remaining useful life (RUL) and strategically scheduling maintenance. However, the remoteness of wind farms often renders current methodologies ineffective, as they fail to provide a sufficiently reliable advance time window for maintenance planning, limiting PdM's practicality. This study introduces a novel deep learning (DL) methodology for future RUL forecasting. Two models, ForeNet-2d and ForeNet-3d, are proposed; both employ a multi-parametric attention-based DL approach that bypasses feature engineering, thereby minimizing the risk of human error. These models successfully forecast the RUL for seven multifaceted wind turbine (WT) failures with a 2-week forecast window. The most precise forecast deviated by only 10 minutes from the actual RUL, while the least accurate prediction deviated by 1.8 days, with most predictions being off by only a few hours. This methodology offers a substantial time frame to access remote WTs and perform necessary maintenance, thereby enabling the practical implementation of PdM. 3 authors · Dec 9, 2024
1 Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when starting to train on a new task. Motivated by this 'stability gap', we propose that continual learning strategies should focus not only on the optimization objective, but also on the way this objective is optimized. While there is some continual learning work that alters the optimization trajectory (e.g., using gradient projection techniques), this line of research is positioned as an alternative to improving the optimization objective, whereas we argue it should be complementary. To evaluate the merits of our proposition, we plan to combine replay-approximated joint objectives with gradient projection-based optimization routines to test whether the addition of the latter provides benefits in terms of (1) alleviating the stability gap, (2) increasing the learning efficiency and (3) improving the final learning outcome. 3 authors · Nov 8, 2023
26 Idempotent Generative Network We propose a new approach for generative modeling based on training a neural network to be idempotent. An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application, namely f(f(z))=f(z). The proposed model f is trained to map a source distribution (e.g., Gaussian noise) to a target distribution (e.g., realistic images) using the following objectives: (1) Instances from the target distribution should map to themselves, namely f(x)=x. We define the target manifold as the set of all instances that f maps to themselves. (2) Instances that form the source distribution should map onto the defined target manifold. This is achieved by optimizing the idempotence term, f(f(z))=f(z), which encourages the range of f(z) to be on the target manifold. Under ideal assumptions such a process provably converges to the target distribution. This strategy results in a model capable of generating an output in one step, maintaining a consistent latent space, while also allowing sequential applications for refinement. Additionally, we find that by processing inputs from both target and source distributions, the model adeptly projects corrupted or modified data back to the target manifold. This work is a first step towards a "global projector" that enables projecting any input into a target data distribution. 6 authors · Nov 2, 2023 4
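To make the two objectives in the entry above concrete, here is a toy sketch in which a hand-written clamp to the interval [0, 1] stands in for the network f, with that interval playing the role of the target manifold. The clamp, the loss function names, and the sample values are all assumptions made for illustration; the actual model is a trained neural network, not a fixed function.

```python
def f(z):
    """Toy idempotent map: clamp a number to the interval [0, 1]."""
    return min(max(z, 0.0), 1.0)

def reconstruction_loss(xs):
    # Objective (1): target instances should map to themselves, f(x) = x.
    return sum(abs(f(x) - x) for x in xs)

def idempotence_loss(zs):
    # Objective (2): applying f twice should equal applying it once.
    return sum(abs(f(f(z)) - f(z)) for z in zs)

targets = [0.0, 0.25, 0.9]   # points already on the toy "manifold" [0, 1]
sources = [-3.0, 0.5, 7.2]   # arbitrary source samples

print(reconstruction_loss(targets))
print(idempotence_loss(sources))
print([f(z) for z in sources])
```

Because the clamp is exactly idempotent, both loss terms are zero here; a trained network would only drive them towards zero.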
- VG-Mapping: Variation-Aware 3D Gaussians for Online Semi-static Scene Mapping Maintaining an up-to-date map that accurately reflects recent changes in the environment is crucial, especially for robots that repeatedly traverse the same space. Failing to promptly update the changed regions can degrade map quality, resulting in poor localization, inefficient operations, and even lost robots. 3D Gaussian Splatting (3DGS) has recently seen widespread adoption in online map reconstruction due to its dense, differentiable, and photorealistic properties, yet accurately and efficiently updating the regions of change remains a challenge. In this paper, we propose VG-Mapping, a novel online 3DGS-based mapping system tailored for such semi-static scenes. Our approach introduces a hybrid representation that augments 3DGS with a TSDF-based voxel map to efficiently identify changed regions in a scene, along with a variation-aware density control strategy that inserts or deletes Gaussian primitives in regions undergoing change. Furthermore, to address the absence of public benchmarks for this task, we construct an RGB-D dataset comprising both synthetic and real-world semi-static environments. Experimental results demonstrate that our method substantially improves the rendering quality and map update efficiency in semi-static scenes. The code and dataset are available at https://github.com/heyicheng-never/VG-Mapping. 4 authors · Oct 10
- DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds Point clouds are naturally sparse, while image pixels are dense. This inconsistency limits feature fusion from both modalities for point-wise scene flow estimation. Previous methods rarely predict scene flow from the entire point cloud of a scene in a single inference pass, due to the memory inefficiency and heavy overhead from distance calculation and sorting involved in commonly used farthest point sampling, KNN, and ball query algorithms for local feature aggregation. To mitigate these issues in scene flow learning, we regularize raw points to a dense format by storing 3D coordinates in 2D grids. Unlike the sampling operation commonly used in existing works, the dense 2D representation 1) preserves most points in the given scene, 2) brings a significant boost in efficiency, and 3) eliminates the density gap between points and pixels, allowing us to perform effective feature fusion. We also present a novel warping projection technique to alleviate the information loss that arises because multiple points can be mapped into one grid cell during projection when computing the cost volume. Extensive experiments demonstrate the efficiency and effectiveness of our method, which outperforms prior art on the FlyingThings3D and KITTI datasets. 8 authors · Aug 8, 2023
1 Latent Space Factorisation and Manipulation via Matrix Subspace Projection We tackle the problem of disentangling the latent space of an autoencoder in order to separate labelled attribute information from other characteristic information. This then allows us to change selected attributes while preserving other information. Our method, matrix subspace projection, is much simpler than previous approaches to latent space factorisation, for example not requiring multiple discriminators or a careful weighting among their loss functions. Furthermore, our new model can be applied to autoencoders as a plugin, and works across diverse domains such as images or text. We demonstrate the utility of our method for attribute manipulation in autoencoders trained across varied domains, using both human evaluation and automated methods. The quality of generation of our new model (e.g. reconstruction, conditional generation) is highly competitive with a number of strong baselines. 5 authors · Jul 26, 2019
2 Preserving Linear Separability in Continual Learning by Backward Feature Projection Catastrophic forgetting has been a major challenge in continual learning, where the model needs to learn new tasks with limited or no access to data from previously seen tasks. To tackle this challenge, methods based on knowledge distillation in feature space have been proposed and shown to reduce forgetting. However, most feature distillation methods directly constrain the new features to match the old ones, overlooking the need for plasticity. To achieve a better stability-plasticity trade-off, we propose Backward Feature Projection (BFP), a method for continual learning that allows the new features to change up to a learnable linear transformation of the old features. BFP preserves the linear separability of the old classes while allowing the emergence of new feature directions to accommodate new classes. BFP can be integrated with existing experience replay methods and boost performance by a significant margin. We also demonstrate that BFP helps learn a better representation space, in which linear separability is well preserved during continual learning and linear probing achieves high classification accuracy. The code can be found at https://github.com/rvl-lab-utoronto/BFP 3 authors · Mar 25, 2023
4 GANeRF: Leveraging Discriminators to Optimize Neural Radiance Fields Neural Radiance Fields (NeRF) have shown impressive novel view synthesis results; nonetheless, even thorough recordings yield imperfections in reconstructions, for instance due to poorly observed areas or minor lighting changes. Our goal is to mitigate these imperfections from various sources with a joint solution: we take advantage of the ability of generative adversarial networks (GANs) to produce realistic images and use them to enhance realism in 3D scene reconstruction with NeRFs. To this end, we learn the patch distribution of a scene using an adversarial discriminator, which provides feedback to the radiance field reconstruction, thus improving realism in a 3D-consistent fashion. Thereby, rendering artifacts are repaired directly in the underlying 3D representation by imposing multi-view path rendering constraints. In addition, we condition a generator with multi-resolution NeRF renderings which is adversarially trained to further improve rendering quality. We demonstrate that our approach significantly improves rendering quality, e.g., nearly halving LPIPS scores compared to Nerfacto while at the same time improving PSNR by 1.4dB on the advanced indoor scenes of Tanks and Temples. 6 authors · Jun 9, 2023
- PanoLora: Bridging Perspective and Panoramic Video Generation with LoRA Adaptation Generating high-quality 360° panoramic videos remains a significant challenge due to the fundamental differences between panoramic and traditional perspective-view projections. While perspective videos rely on a single viewpoint with a limited field of view, panoramic content requires rendering the full surrounding environment, making it difficult for standard video generation models to adapt. Existing solutions often introduce complex architectures or large-scale training, leading to inefficiency and suboptimal results. Motivated by the success of Low-Rank Adaptation (LoRA) in style transfer tasks, we propose treating panoramic video generation as an adaptation problem from perspective views. Through theoretical analysis, we demonstrate that LoRA can effectively model the transformation between these projections when its rank exceeds the degrees of freedom in the task. Our approach efficiently fine-tunes a pretrained video diffusion model using only approximately 1,000 videos while achieving high-quality panoramic generation. Experimental results demonstrate that our method maintains proper projection geometry and surpasses previous state-of-the-art approaches in visual quality, left-right consistency, and motion diversity. 6 authors · Sep 14
- ZeroScene: A Zero-Shot Framework for 3D Scene Generation from a Single Image and Controllable Texture Editing In the field of 3D content generation, single image scene reconstruction methods still struggle to simultaneously ensure the quality of individual assets and the coherence of the overall scene in complex environments, while texture editing techniques often fail to maintain both local continuity and multi-view consistency. In this paper, we propose a novel system ZeroScene, which leverages the prior knowledge of large vision models to accomplish both single image-to-3D scene reconstruction and texture editing in a zero-shot manner. ZeroScene extracts object-level 2D segmentation and depth information from input images to infer spatial relationships within the scene. It then jointly optimizes 3D and 2D projection losses of the point cloud to update object poses for precise scene alignment, ultimately constructing a coherent and complete 3D scene that encompasses both foreground and background. Moreover, ZeroScene supports texture editing of objects in the scene. By imposing constraints on the diffusion model and introducing a mask-guided progressive image generation strategy, we effectively maintain texture consistency across multiple viewpoints and further enhance the realism of rendered results through Physically Based Rendering (PBR) material estimation. Experimental results demonstrate that our framework not only ensures the geometric and appearance accuracy of generated assets, but also faithfully reconstructs scene layouts and produces highly detailed textures that closely align with text prompts. 3 authors · Sep 27
- Project and Forget: Solving Large-Scale Metric Constrained Problems Given a set of dissimilarity measurements amongst data points, determining what metric representation is most "consistent" with the input measurements or the metric that best captures the relevant geometric features of the data is a key step in many machine learning algorithms. Existing methods are restricted to specific kinds of metrics or small problem sizes because of the large number of metric constraints in such problems. In this paper, we provide an active set algorithm, Project and Forget, that uses Bregman projections to solve metric constrained problems with many (possibly exponentially many) inequality constraints. We provide a theoretical analysis of Project and Forget and prove that our algorithm converges to the global optimal solution and that the L_2 distance of the current iterate to the optimal solution decays asymptotically at an exponential rate. We demonstrate that using our method we can solve large problem instances of three types of metric constrained problems: general weight correlation clustering, metric nearness, and metric learning; in each case, out-performing the state of the art methods with respect to CPU times and problem sizes. 2 authors · May 8, 2020
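For intuition about the projection step mentioned in the entry above, the sketch below shows the Euclidean projection of three pairwise distances onto a single triangle-inequality constraint d_ij ≤ d_ik + d_kj. Euclidean projection is the Bregman projection induced by the squared Euclidean distance, so this is one simple special case. The function name and the sample distances are assumptions for this example, and the sketch deliberately omits the active-set bookkeeping and any dual corrections that a full method would need.

```python
from fractions import Fraction

def project_triangle(d_ij, d_ik, d_kj):
    """Project (d_ij, d_ik, d_kj) onto the half-space d_ij <= d_ik + d_kj.

    The constraint normal is (1, -1, -1), whose squared norm is 3, so a
    violation v is removed by moving v/3 along each coordinate.
    """
    violation = d_ij - d_ik - d_kj
    if violation <= 0:
        # Constraint already satisfied: the projection is the point itself.
        return d_ij, d_ik, d_kj
    step = Fraction(violation) / 3
    return d_ij - step, d_ik + step, d_kj + step

# Hypothetical violated triple: 10 > 2 + 2.
fixed = project_triangle(10, 2, 2)
# Hypothetical satisfied triple: 3 <= 2 + 2.
unchanged = project_triangle(3, 2, 2)
```

After the projection the violated triple sits exactly on the constraint boundary, while a triple that already satisfies the inequality is returned untouched.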
- Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh We present a pipeline for generating defurnished replicas of indoor spaces represented as textured meshes and corresponding multi-view panoramic images. To achieve this, we first segment and remove furniture from the mesh representation, extend planes, and fill holes, obtaining a simplified defurnished mesh (SDM). This SDM acts as an "X-ray" of the scene's underlying structure, guiding the defurnishing process. We extract Canny edges from depth and normal images rendered from the SDM. We then use these as a guide to remove the furniture from panorama images via ControlNet inpainting. This control signal ensures the availability of global geometric information that may be hidden from a particular panoramic view by the furniture being removed. The inpainted panoramas are used to texture the mesh. We show that our approach produces higher quality assets than methods that rely on neural radiance fields, which tend to produce blurry low-resolution images, or RGB-D inpainting, which is highly susceptible to hallucinations. 10 authors · Jun 5