Future vehicular networks must ensure ultra-reliable low-latency communication (URLLC) for the timely delivery of safety-critical information. Previously proposed resource allocation schemes for URLLC mostly rely on centralized optimization-based algorithms and cannot guarantee the reliability and latency requirements of vehicle-to-vehicle (V2V) communications. This paper investigates joint power and blocklength allocation to minimize the worst-case decoding-error probability in the finite blocklength (FBL) regime for a URLLC-based V2V communication network. We formulate the problem as a non-convex mixed-integer nonlinear programming (MINLP) problem. We first develop a centralized optimization-theory-based algorithm that builds on our derivation of the joint convexity of the decoding-error probability in the blocklength and transmit power variables within the region of interest. Next, we propose a two-layered multi-agent deep reinforcement learning framework with centralized training and distributed execution. The first layer establishes multiple deep Q-networks (DQNs) at the central trainer to train the local DQNs for blocklength optimization. The second layer uses an actor-critic network and a deep deterministic policy gradient (DDPG)-based algorithm to train the local actor network for each V2V link. Simulation results demonstrate that the proposed distributed scheme achieves close-to-optimal solutions with much lower computational complexity than the centralized optimization-based solution.
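
For reference, the decoding-error probability in the FBL regime is commonly modeled with the normal approximation for an AWGN channel. The expression below is a sketch under that assumption. The symbols $\gamma$ (received SNR), $m$ (blocklength in channel uses), and $D$ (information bits per packet) are illustrative and do not come from the abstract, and the exact model used in the paper may differ.

\[
\varepsilon(\gamma, m) \approx Q\!\left( \sqrt{\frac{m}{V(\gamma)}} \left( C(\gamma) - \frac{D}{m} \right) \right),
\qquad
C(\gamma) = \log_2(1+\gamma),
\qquad
V(\gamma) = \left( 1 - \frac{1}{(1+\gamma)^2} \right) (\log_2 e)^2,
\]

where $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$ is the Gaussian tail function. Since $\gamma$ scales with the transmit power, $\varepsilon$ depends jointly on power and blocklength, which are the two variables the allocation problem optimizes.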